
Mediated and Deliberately Diminished Reality

Mediated Reality is created when virtual, or computer-generated, information is mixed with what the user would otherwise normally see. Eye Tap is suitable for creating a mediated reality because it is able to absorb, quantify, and resynthesize the light the user would normally see. When the light is resynthesized under computer control, information can be added to or removed from the scene before it is presented to the user. The virtual information, or light, as seen through the display must be properly registered and aligned within the user's field of view. To achieve this, a method of camera-based head-tracking is now described.

Why camera-based head-tracking?

A goal of personal imaging is to make Personal Imaging [1] systems usable in ordinary everyday situations, not just on a factory assembly line "workcell" or in some other restricted space. Thus the apparatus should have a head-tracker that does not rely on any special apparatus being installed in the environment.

Therefore, a new method of head-tracking that uses the apparatus's own camera is needed [1]; it is based on the VideoOrbits algorithm [2]. The VideoOrbits algorithm performs head-tracking visually, using the natural environment, and works without the need for object recognition. Instead, it is based on algebraic projective geometry and a featureless means of estimating the change in spatial coordinates arising from movement of the wearer's head, as illustrated in Figure 1.

Figure 1: The 'VideoOrbits' head-tracking algorithm. The new head-tracking algorithm requires no special devices installed in the environment: the camera in the Personal Imaging system simply tracks itself based on its view of objects in the environment. The algorithm is based on algebraic projective geometry and provides an estimate of the true projective coordinate transformation, which, for successive image pairs, is composed using the projective group [2]. Successive pairs of images may be estimated in the neighbourhood of the identity coordinate transformation of the group, while absolute head tracking is done using the exact group by relating the approximate parameters q to the exact parameters p in the innermost loop of the process. The algorithm typically runs at 5-10 frames per second on a general-purpose computer, but its simple structure makes it easy to implement in hardware for the higher frame rates needed for full-motion video.
Flowchart of Video Orbits Algorithm
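The composition of successive frame-to-frame estimates mentioned in Figure 1 can be sketched with 3x3 matrices. This is a minimal, hypothetical illustration (pure Python, helper names invented here) that assumes each pairwise estimate has already been converted to exact parameters p = [A, b; c, 1] and stored as the matrix [[a11, a12, b1], [a21, a22, b2], [c1, c2, 1]]. It estimates nothing itself and omits the q-to-p relation used inside the actual algorithm.

```python
# Hypothetical helpers for illustration only; plain Python lists, no libraries.
Matrix = list[list[float]]

IDENTITY: Matrix = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

def matmul3(m: Matrix, n: Matrix) -> Matrix:
    """Product of two 3x3 matrices."""
    return [[sum(m[i][k] * n[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def normalize(h: Matrix) -> Matrix:
    """Rescale so the bottom-right entry is 1, matching p = [A, b; c, 1]."""
    s = h[2][2]
    return [[v / s for v in row] for row in h]

def compose_chain(pairwise: list[Matrix]) -> Matrix:
    """Compose frame-to-frame transforms H(0->1), H(1->2), ... into H(0->n).

    Later transforms act after earlier ones, so each is multiplied on the left.
    """
    total = [row[:] for row in IDENTITY]
    for h in pairwise:
        total = normalize(matmul3(h, total))
    return total

def map_point(h: Matrix, x: float, y: float) -> tuple[float, float]:
    """Map a point with x' = (A x + b) / (c^T x + 1)."""
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)

# Two small, made-up frame-to-frame motions.
h01 = [[1.0, 0.01, 0.5], [0.0, 1.0, -0.2], [0.001, 0.0, 1.0]]
h12 = [[0.99, 0.0, 0.1], [0.02, 1.0, 0.3], [0.0, 0.002, 1.0]]
h02 = compose_chain([h01, h12])

# Mapping through the composed matrix agrees with mapping through each step.
print(map_point(h02, 10.0, 5.0))
print(map_point(h12, *map_point(h01, 10.0, 5.0)))
```

Because the projective group is closed under composition, a long run of small frame-to-frame motions collapses into one 3x3 matrix relating the current frame to the first; renormalizing after each product keeps the bottom-right entry at 1.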

Algebraic projective geometry

Direct featureless methods are used for estimating the 8 parameters of an "exact" projective (homographic) coordinate transformation to register pairs of images or scene content. The approach is "exact" for two cases of static scenes: (1) images taken from the same location of an arbitrary 3-D scene, with a camera that is free to pan, tilt, rotate about its optical axis, and zoom; or (2) images of a flat scene taken from arbitrary locations. The featureless projective approach generalizes inter-frame camera motion estimation methods that have previously used an affine model and/or relied upon finding points of correspondence between the image frames. The affine model lacks the degrees of freedom to "exactly" characterize camera pan and tilt, and therefore cannot properly express the "keystoning" and "chirping" we see in the real world. "Chirping" refers to the effect of increasing or decreasing spatial frequency with respect to spatial location, as illustrated in Figure 2.
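To make the affine-versus-projective point concrete, the following minimal sketch (pure Python; helper names and parameter values are hypothetical) warps a unit square with x' = (Ax + b)/(c^T x + 1). With c = 0 the map is affine and opposite edges stay parallel; a nonzero c, of the kind camera pan or tilt produces, makes one pair of edges converge, which is the keystoning described above.

```python
def warp(q, x, y):
    """Apply x' = (A x + b) / (c^T x + 1), q = [a11, a12, b1, a21, a22, b2, c1, c2]."""
    a11, a12, b1, a21, a22, b2, c1, c2 = q
    w = c1 * x + c2 * y + 1.0
    return ((a11 * x + a12 * y + b1) / w, (a21 * x + a22 * y + b2) / w)

def edge(p, r):
    """Direction vector from point p to point r."""
    return (r[0] - p[0], r[1] - p[1])

def cross(u, v):
    """z-component of the 2-D cross product; zero when u and v are parallel."""
    return u[0] * v[1] - u[1] * v[0]

def stays_parallelogram(q, tol=1e-12):
    """Warp the unit square and report whether both pairs of opposite edges stay parallel."""
    p00, p10, p01, p11 = (warp(q, x, y) for x, y in [(0, 0), (1, 0), (0, 1), (1, 1)])
    bottom, top = edge(p00, p10), edge(p01, p11)
    left, right = edge(p00, p01), edge(p10, p11)
    return abs(cross(bottom, top)) < tol and abs(cross(left, right)) < tol

affine = [1.2, 0.3, 0.5, -0.1, 0.9, 0.2, 0.0, 0.0]      # c = 0: affine
projective = [1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.4]   # c2 != 0: tilt-like
print(stays_parallelogram(affine))      # True
print(stays_parallelogram(projective))  # False: left and right edges converge
```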

Figure 2: The 'projective chirping' phenomenon. (a) A real-world object that exhibits periodicity generates a projection (image) with "chirping", i.e. 'periodicity-in-perspective'. (b) Center raster of image. (c) Best-fit projective chirp of form $\sin(2\pi((ax+b)/(cx+1)))$. (d) Graphical depiction of exemplar 1-D projective coordinate transformation of $\sin(2\pi x_1)$ into a 'projective chirp' function, $\sin(2\pi x_2) = \sin(2\pi((2x_1-2)/(x_1+1)))$. The range coordinate as a function of the domain coordinate forms a rectangular hyperbola with asymptotes shifted to center at the vanishing point $x_1=-1/c=-1$ and 'exploding point' $x_2=a/c=2$, and with 'chirpiness' $c'=c^2/(bc-a)=-1/4$.
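The numbers in Figure 2(d) can be checked directly. This short sketch (pure Python; function names are hypothetical) evaluates the 1-D projective map x2 = (a x1 + b)/(c x1 + 1) with a = 2, b = -2, c = 1, the caption's vanishing point, exploding point, and chirpiness, and the local stretch dx2/dx1 = (a - bc)/(c x1 + 1)^2, which is the instantaneous spatial frequency of sin(2π x2) as a function of x1.

```python
def proj1d(a, b, c, x1):
    """1-D projective coordinate transformation x2 = (a x1 + b) / (c x1 + 1)."""
    return (a * x1 + b) / (c * x1 + 1.0)

def local_frequency(a, b, c, x1):
    """dx2/dx1 = (a - b c) / (c x1 + 1)^2.

    sin(2 pi x2(x1)) has instantaneous frequency dx2/dx1 cycles per unit of x1,
    so a value that changes with x1 is the 'chirping' of Figure 2.
    """
    return (a - b * c) / (c * x1 + 1.0) ** 2

a, b, c = 2.0, -2.0, 1.0             # parameters from Figure 2(d)
vanishing_point = -1.0 / c           # x1 where the denominator is zero
exploding_point = a / c              # limit of x2 as x1 -> +/- infinity
chirpiness = c ** 2 / (b * c - a)    # c' as defined in the caption
print(vanishing_point, exploding_point, chirpiness)   # -1.0 2.0 -0.25
for x1 in (0.0, 1.0, 3.0):
    print(x1, proj1d(a, b, c, x1), local_frequency(a, b, c, x1))
# 0.0 -2.0 4.0
# 1.0 0.0 1.0
# 3.0 1.0 0.25
```

Moving away from the vanishing point, the local frequency falls from 4 to 1 to 0.25 cycles per unit of x1, even though the original signal sin(2π x1) has constant frequency.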

This chirping phenomenon is implicit in the proposed system, whether or not there is periodicity in the subject matter; the only requirement is that there be some distinct texture upon a flat surface in the scene. The featureless projective approach, which operates directly on the image pixels, has been shown to be superior in accuracy, and in its ability to enhance resolution, to the affine and correspondence-based methods it generalizes. The proposed methods work well on image data collected from both good-quality and poor-quality video under a wide variety of conditions (sunny, cloudy, day, night). These fully automatic methods have also been shown to be robust to deviations from the assumptions of a static scene and no parallax, although the primary application is in filtering out or modifying subject matter appearing on flat surfaces within a scene (e.g. rigid planar patches such as advertising billboards).

Video orbits

Tsai and Huang [3] pointed out that the elements of the projective group give the true camera motions with respect to a planar surface. They explored the group structure associated with images of a 3-D rigid planar patch, as well as the associated Lie algebra, although they assumed that the correspondence problem had been solved. The solution presented here, which does not require prior solution of correspondence, also relies on projective group theory.

Planetracker in 2-D

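The brightness constancy constraint equation for 2-D images [4], which relates the flow velocity components $u_f$ and $v_f$ in the $x$ and $y$ directions to the image derivatives, is:

$\displaystyle {\bf u_f}^T{\bf E_x} + E_t \approx 0$

where ${\bf u_f} = [u_f, v_f]^T$, ${\bf E_x} = [E_x, E_y]^T$ is the spatial gradient of the image brightness $E$, and $E_t$ is its temporal derivative.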

As is well known [4], the optical flow field in 2-D is underconstrained: the model of pure translation at every point has two parameters, but there is only one constraint equation per point. It is therefore common practice to compute the optical flow over some neighborhood, which must contain at least two pixels but is generally taken to be a small block ($3\times 3$, $5\times 5$, or sometimes larger, e.g. the entire patch of subject matter to be filtered out, such as a billboard or sign).
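The block-based practice described above can be sketched as a least-squares fit of one translation (u, v) to every pixel in a block, minimizing the sum of (u E_x + v E_y + E_t)^2. This is a hypothetical helper that assumes the derivatives E_x, E_y, E_t have already been computed for each pixel of the block (derivative estimation is omitted). It is the familiar Lucas-Kanade style block fit, named here for orientation rather than as the method of [4].

```python
def block_translation(ex, ey, et):
    """Least-squares (u, v) minimizing sum (u*Ex + v*Ey + Et)^2 over one block.

    ex, ey, et: equal-length lists of per-pixel derivatives in the block.
    Solves the 2x2 normal equations by Cramer's rule.
    """
    sxx = sum(gx * gx for gx in ex)
    sxy = sum(gx * gy for gx, gy in zip(ex, ey))
    syy = sum(gy * gy for gy in ey)
    sxt = sum(gx * gt for gx, gt in zip(ex, et))
    syt = sum(gy * gt for gy, gt in zip(ey, et))
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:
        raise ValueError("gradients in the block are degenerate (aperture problem)")
    u = (-sxt * syy + sxy * syt) / det
    v = (-sxx * syt + sxy * sxt) / det
    return u, v

# Made-up 3x3 block: gradients chosen by hand, E_t synthesized from a known motion.
ex = [1.0, 0.0, 2.0, 1.0, -1.0, 0.0, 3.0, 1.0, 2.0]
ey = [0.0, 1.0, 1.0, -1.0, 2.0, 1.0, 0.0, 2.0, -1.0]
true_u, true_v = 0.5, -0.25
et = [-(true_u * gx + true_v * gy) for gx, gy in zip(ex, ey)]
print(block_translation(ex, ey, et))
```

When every gradient in the block points the same way, the 2x2 normal matrix is singular and the helper raises an error instead of returning an arbitrary answer; this is the underconstrained case in its simplest form.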

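Our task is not to deal with 2-D translational flow but with 2-D projective flow, estimating the eight parameters of the coordinate transformation:

$\displaystyle {\bf x'} = \frac{{\bf Ax}+{\bf b}}{{\bf c}^T{\bf x}+1}$

where ${\bf x} = [x, y]^T$ and ${\bf x'}$ is the corresponding point in the other image.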

The desired eight scalar parameters are denoted by ${\bf p} = [{\bf A}, {\bf b}; {\bf c}^T, 1]$, with ${\bf A} \in \mathbb{R}^{2\times 2}$, ${\bf b} \in \mathbb{R}^{2\times 1}$, and ${\bf c} \in \mathbb{R}^{2\times 1}$.

In the 2-D case, with model velocity ${\bf u_m} = \frac{{\bf Ax}+{\bf b}}{{\bf c}^T{\bf x}+1} - {\bf x}$, the flow error is:

$\displaystyle \varepsilon_{flow}=\sum\left({\bf u_m}^T{\bf E_x} + E_t\right)^2 = \sum\left(\left(\frac{{\bf Ax}+{\bf b}}{{\bf c}^T{\bf x}+1}-{\bf x}\right)^T{\bf E_x}+E_t\right)^2$

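The sum can be weighted, as in the 1-D case, by multiplying the expression inside each squared term by the denominator ${\bf c}^T{\bf x}+1$, which makes it linear in the parameters:

$\displaystyle \varepsilon_{w}=\sum\left(\left({\bf Ax}+{\bf b}-({\bf c}^T{\bf x}+1){\bf x}\right)^T{\bf E_x}+({\bf c}^T{\bf x}+1)E_t\right)^2$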

Differentiating the weighted sum with respect to the free parameters ${\bf A}$, ${\bf b}$, and ${\bf c}$, and setting the result to zero, gives a linear solution:

$\displaystyle \left( \sum \phi \phi^T \right) [a_{11},a_{12},b_1,a_{21},a_{22},b_2,c_1,c_2]^T = \sum({\bf x}^T{\bf E_x}-E_t)\phi$

where $\phi^T=[E_x(x,y,1),E_y(x,y,1),xE_t-x^2E_x-xyE_y,yE_t-xyE_x-y^2E_y]$, and $E_x(x,y,1)$ stands for the three entries $xE_x, yE_x, E_x$ (likewise for $E_y$).
For a more in-depth treatment of projective flow, the reader is invited to refer to [5].
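The linear solution above can be sketched directly. This hypothetical helper (pure Python, no image library) accumulates sum(phi phi^T) and sum((x^T E_x - E_t) phi) from per-pixel samples (x, y, E_x, E_y, E_t) and solves the 8x8 system by Gaussian elimination. It covers only this one linear step: computing the derivatives, warping and iterating, any multiscale scheme, and relating the result to exact parameters are all left out. The usage check builds synthetic E_t values that satisfy the weighted constraint exactly for chosen parameters, so the solver should recover them.

```python
import random

def solve_linear(m, rhs):
    """Solve m @ sol = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("normal equations are singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= f * aug[col][k]
    sol = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = aug[r][n] - sum(aug[r][k] * sol[k] for k in range(r + 1, n))
        sol[r] = s / aug[r][r]
    return sol

def projective_flow(samples):
    """Solve (sum phi phi^T) q = sum (x^T E_x - E_t) phi for one set of derivatives.

    samples: iterable of (x, y, Ex, Ey, Et), one tuple per pixel.
    Returns q = [a11, a12, b1, a21, a22, b2, c1, c2].
    """
    mat = [[0.0] * 8 for _ in range(8)]
    rhs = [0.0] * 8
    for x, y, ex, ey, et in samples:
        phi = [x * ex, y * ex, ex,                 # E_x (x, y, 1)
               x * ey, y * ey, ey,                 # E_y (x, y, 1)
               x * et - x * x * ex - x * y * ey,
               y * et - x * y * ex - y * y * ey]
        target = x * ex + y * ey - et              # x^T E_x - E_t
        for i in range(8):
            rhs[i] += target * phi[i]
            for j in range(8):
                mat[i][j] += phi[i] * phi[j]
    return solve_linear(mat, rhs)

# Synthetic check with made-up parameters and random gradients: E_t is chosen so
# that (Ax + b - (c^T x + 1) x)^T E_x + (c^T x + 1) E_t = 0 holds exactly.
true_q = [1.02, 0.01, 0.3, -0.02, 0.98, -0.1, 0.01, -0.005]
a11, a12, b1, a21, a22, b2, c1, c2 = true_q
rng = random.Random(0)
samples = []
for x in range(-5, 6):
    for y in range(-5, 6):
        ex, ey = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
        w = c1 * x + c2 * y + 1.0
        ux = a11 * x + a12 * y + b1 - w * x
        uy = a21 * x + a22 * y + b2 - w * y
        samples.append((x, y, ex, ey, -(ux * ex + uy * ey) / w))
q = projective_flow(samples)
print([round(v, 6) for v in q])
```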

Mediated reality as a form of communication

The mathematical framework for mediated reality arose through the process of marking a reference frame [2] with text or simple graphics, where it was noted that by calculating and matching homographies of the plane, an illusory rigid planar patch appeared to hover upon objects in the real world, giving rise to a form of computer-mediated collaboration [1].

This collaborative capability was suggested as an application of HI (Humanistic Intelligence) for people who are visually challenged or who have a visual memory disability [2]. In this application, a computer program or remote expert (be it human or machine) may assist in wayfinding, or provide a photographic/videographic memory, such as the ability to never forget a face (see Figure 3).

Figure 3: Mediated reality as a photographic/videographic memory prosthesis. (a) Wearable face-recognizer with virtual "name tag" (and grocery list) appears to stay attached to the cashier (b), even when the cashier is no longer within the field of view of the tapped eye and transmitter (c).

Figure 4 shows images taken with an Eye Tap system that depict the use of the reality mediator as a form of communication. Figure 4(a) is an unmediated image of a roadside advertisement. Figures 4(b), (c), and (d) show images from the same sequence as seen through the reality mediator. The motion of the planar patch of the advertisement has been tracked by the VideoOrbits algorithm, and the advertisement is replaced by a message intended to help guide the user to their destination.
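The replacement step can be sketched as inverse warping. This is a minimal sketch, assuming the tracker supplies a homography h that maps reference-frame coordinates (where a hypothetical message image is defined, one pixel per unit) into the current frame: each frame pixel is mapped back through the inverse of h and, if it lands on the message, takes the message's value. Nearest-neighbour sampling and nested Python lists are simplifications chosen here; the source does not describe how the Eye Tap system renders the replacement.

```python
def inv3(h):
    """Inverse of a 3x3 matrix via the adjugate (cofactor) formula."""
    (a, b, c), (d, e, f), (g, hh, i) = h
    det = a * (e * i - f * hh) - b * (d * i - f * g) + c * (d * hh - e * g)
    if abs(det) < 1e-12:
        raise ValueError("homography is singular")
    adj = [[e * i - f * hh, c * hh - b * i, b * f - c * e],
           [f * g - d * i, a * i - c * g, c * d - a * f],
           [d * hh - e * g, b * g - a * hh, a * e - b * d]]
    return [[v / det for v in row] for row in adj]

def overlay(frame, message, h_ref_to_frame):
    """Return a copy of frame with the tracked patch replaced by message pixels.

    Message pixel (u, v) sits at reference-frame coordinate (u, v); each frame
    pixel (x, y) is mapped back through the inverse homography and sampled with
    nearest-neighbour rounding.
    """
    hinv = inv3(h_ref_to_frame)
    rows, cols = len(message), len(message[0])
    out = [row[:] for row in frame]
    for y in range(len(frame)):
        for x in range(len(frame[0])):
            w = hinv[2][0] * x + hinv[2][1] * y + hinv[2][2]
            u = (hinv[0][0] * x + hinv[0][1] * y + hinv[0][2]) / w
            v = (hinv[1][0] * x + hinv[1][1] * y + hinv[1][2]) / w
            iu, iv = round(u), round(v)
            if 0 <= iv < rows and 0 <= iu < cols:
                out[y][x] = message[iv][iu]
    return out

frame = [["."] * 6 for _ in range(5)]
message = [["H", "I"], ["Y", "O"]]
h = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0], [0.0, 0.0, 1.0]]  # pure translation by (2, 1)
result = overlay(frame, message, h)
for row in result:
    print("".join(row))
```

Mapping backwards from the output, rather than forwards from the message, gives every covered frame pixel exactly one value, so no holes appear where perspective enlarges the patch.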

Figure 4: (a) Billboards, advertising, and other visual detritus form annoying and sometimes dangerous clutter at the sides of busy roadways and highways. This advertisement, made in the shape of an octagon, painted red, and placed at the side of a busy road, is the visual equivalent of yelling "fire" in a crowded theatre in order to get everyone's attention to tell them you have something for sale. (b), (c), (d) Successive frames of video processed by the Eye Tap system using the VideoOrbits planetracker. The advertisement is filtered out to reduce visual clutter in the scene. In its place is a useful message that can help the user of the Eye Tap system keep their attention on the road and avoid getting lost. When the rigid planar patch is not sufficiently within the field of view, approximate tracking still works based on other planar patches present in the scene.

Bibliography

1
Steve Mann.
Wearable computing: A first step toward personal imaging.
IEEE Computer, 30(2):25-32, Feb 1997; http://wearcam.org/ieeecomputer.htm.

2
S. Mann and R. W. Picard.
Video orbits of the projective group: a simple approach to featureless estimation of parameters.
TR 338, Massachusetts Institute of Technology, Cambridge, Massachusetts, 1995; http://hi.eecg.toronto.edu/tip.html.
Also appears in IEEE Trans. Image Proc., 6(9):1281-1295, Sept 1997.

3
R. Y. Tsai and T. S. Huang.
Estimating Three-Dimensional Motion Parameters of a Rigid Planar Patch I.
IEEE Trans. Acoust., Speech, and Sig. Proc., ASSP-29:1147-1152, December 1981.

4
B. Horn and B. Schunck.
Determining Optical Flow.
Artificial Intelligence, 17:185-203, 1981.

5
Steve Mann.
Humanistic intelligence/humanistic computing: `wearcomp' as a new framework for intelligent signal processing.
Proceedings of the IEEE, 86(11):2123-2151+cover, Nov 1998; http://wearcam.org/procieee.htm.

6
Steve Mann.
Wearable, tetherless computer-mediated reality: WearCam as a wearable face-recognizer, and other applications for the disabled.
TR 361, M.I.T. Media Lab Perceptual Computing Section, Cambridge, Massachusetts, February 2 1996; also appears in AAAI Fall Symposium on Developing Assistive Technology for People with Disabilities, MIT, 9-11 November 1996; http://wearcam.org/vmp.htm.


University of Toronto ECE1766 Web Productions  