Mediated and Deliberately Diminished Reality
Mediated Reality is created when virtual, or computer generated, information is
mixed with what the user would otherwise normally see. Eye Tap is
suitable for creating a mediated reality because it is able to absorb,
quantify, and resynthesize the light the user would normally see. When the
light is resynthesized under computer control, information can be added or
removed from the scene before it is presented to the user. The virtual
information or light as seen through the display must be properly registered
and aligned within the user's field of view. To achieve this, a method of
camera-based head-tracking is now described.
Why camera-based head-tracking?
A goal of personal imaging [1] is to facilitate the use of such
systems in ordinary everyday situations, not just on a factory assembly
line ``workcell'' or other restricted space. Thus it is desired that the
apparatus have a head-tracker that need not rely on any special apparatus
being installed in the environment.
Therefore, a new method of head-tracking based on the camera capability
of the apparatus is needed [1]. The method described here is based on the
VideoOrbits algorithm [2].
The VideoOrbits algorithm performs head-tracking, visually, based on a natural
environment, and works without the need for object recognition. Instead it is
based on algebraic projective geometry, and a featureless means of estimating
the change in spatial coordinates arising from movement of the wearer's
head, as illustrated in Figure 1.
Figure 1:
The `VideoOrbits' head-tracking algorithm:
The new head-tracking algorithm requires no special
devices installed in the environment. The camera in the
Personal Imaging system simply tracks itself based on its
view of objects in the environment. The algorithm is based
on algebraic projective geometry, and provides an estimate of
the true projective coordinate transformation, which, for
successive image pairs is composed using the projective
group [2].
Successive pairs of images may be estimated in the neighbourhood
of the identity coordinate transformation
of the group, while absolute head tracking is done using the
exact group by relating the approximate parameters q to the
exact parameters p in the innermost loop of the process.
The algorithm typically runs at 5-10 frames per second on a
general-purpose computer but the simple structure of the
algorithm makes it easy to implement in hardware for
the higher frame-rates needed for full-motion video.
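The composition of successive image-pair estimates can be sketched as a product of 3x3 matrices, since a projective coordinate transformation with eight free parameters can be written as a 3x3 matrix whose lower-right element is normalized to 1. The following is a minimal illustration only, not the VideoOrbits implementation: the function names and the two frame-to-frame matrices in `FRAME_PAIRS` are hypothetical.

```python
# Sketch: accumulate frame-to-frame projective transformations into an
# absolute one by composing them with the group operation (matrix product).
# All names and numeric values here are illustrative assumptions.

IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

def compose(h2, h1):
    """Return the 3x3 product h2 * h1 (apply h1 first, then h2)."""
    return [[sum(h2[i][k] * h1[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def normalize(h):
    """Scale so the lower-right element is 1, leaving eight free parameters."""
    s = h[2][2]
    return [[v / s for v in row] for row in h]

def transform_point(h, x, y):
    """Map (x, y) through the projective coordinate transformation h."""
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)

def accumulate(pairwise):
    """Compose a list of successive frame-to-frame transformations."""
    total = IDENTITY
    for h in pairwise:
        total = normalize(compose(h, total))
    return total

# Two hypothetical near-identity estimates for successive image pairs.
FRAME_PAIRS = [
    [[1.00, 0.02, 3.0], [-0.01, 1.00, -2.0], [1e-4, 2e-4, 1.0]],
    [[0.99, -0.01, 1.5], [0.02, 1.01, 0.5], [-2e-4, 1e-4, 1.0]],
]

absolute = accumulate(FRAME_PAIRS)
print(transform_point(absolute, 12.0, 7.0))
```

Mapping a point through the accumulated transformation gives the same result as mapping it through each pairwise transformation in turn, which is the property that lets small inter-frame estimates be chained into absolute head tracking.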
Algebraic projective geometry
Direct featureless methods are used for estimating the 8 parameters of an
``exact'' projective (homographic) coordinate transformation to
register pairs of images or scene content.
The approach is ``exact'' for two cases of static scenes:
(1) images taken from the same location of an arbitrary 3-D scene,
with a camera that is free to pan, tilt, rotate about its optical axis,
and zoom; or
(2) images of a flat scene taken from arbitrary locations.
The featureless projective
approach generalizes inter-frame camera motion estimation methods
which have previously used an affine model
(which lacks the degrees of freedom
to ``exactly'' characterize such phenomena as camera pan and tilt)
and/or which have
relied upon finding points of correspondence between the image frames.
For instance, the affine model cannot capture camera pan and tilt,
and therefore cannot properly
express the ``keystoning'' and ``chirping''
we see in the real world. ``Chirping'' refers to the effect of increasing
or decreasing spatial frequency with respect to spatial location,
as illustrated in Fig 2.
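The difference between the two models can be checked numerically. The sketch below, with hypothetical matrices chosen only for illustration, maps the corners of a unit square through an affine and a projective transformation. Under the affine map, opposite sides stay parallel; under the projective map they do not, which is the ``keystoning'' effect.

```python
# Sketch: an affine map keeps opposite sides of a square parallel,
# a projective map does not ("keystoning"). Matrix values are assumptions.

SQUARE = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]

# Same upper two rows; only the projective map has a non-trivial last row.
AFFINE = [[1.2, 0.3, 0.5], [-0.1, 0.9, 0.2], [0.0, 0.0, 1.0]]
PROJECTIVE = [[1.2, 0.3, 0.5], [-0.1, 0.9, 0.2], [0.0, 0.4, 1.0]]

def warp(h, x, y):
    """Map (x, y) through the 3x3 coordinate transformation h."""
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)

def cross(p, q, r, s):
    """2-D cross product of edge p->q with edge r->s (zero when parallel)."""
    return (q[0] - p[0]) * (s[1] - r[1]) - (q[1] - p[1]) * (s[0] - r[0])

def is_parallelogram(h, tol=1e-9):
    """True if both pairs of opposite sides of the warped square are parallel."""
    a, b, c, d = (warp(h, x, y) for x, y in SQUARE)
    return abs(cross(a, b, d, c)) < tol and abs(cross(a, d, b, c)) < tol

print("affine keeps parallelism:", is_parallelogram(AFFINE))
print("projective keeps parallelism:", is_parallelogram(PROJECTIVE))
```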
Figure 2:
The `projective chirping' phenomenon.
(a) A real-world object that exhibits periodicity
generates a projection (image) with ``chirping'' --
`periodicity-in-perspective'.
(b) Center raster of image.
(c) Best-fit projective chirp of form
$\sin(2\pi((ax+b)/(cx+1)))$.
(d) Graphical depiction of exemplar 1-D projective
coordinate transformation
of $\sin(2\pi x_1)$
into a `projective chirp' function,
$\sin(2\pi x_2) = \sin(2\pi((a x_1 + b)/(c x_1 + 1)))$.
The range coordinate as a function
of the domain coordinate forms a rectangular hyperbola
with asymptotes shifted to center at the vanishing point,
$x_1 = -1/c$, and `exploding point',
$x_2 = a/c$, and with
`chirpiness'
$c' = c^2/(bc - a)$.
This chirping phenomenon is implicit in the proposed system,
whether or not there is periodicity in the subject matter.
The only requirement is that there be some distinct texture upon a
flat surface in the scene.
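As a small worked illustration of chirping, assume a 1-D projective coordinate transformation of the form x2 = (a x1 + b)/(c x1 + 1) applied to a unit-frequency sinusoid. The local spatial frequency of the result is the derivative of the transformation, (a - bc)/(c x1 + 1)^2, which varies with position whenever c is nonzero. The parameter values below are hypothetical.

```python
import math

# Sketch: a sinusoid seen through a 1-D projective coordinate transformation
# has a spatial frequency that changes with location ("chirping").
# The parameter values a, b, c are illustrative assumptions.

def warp_1d(x, a, b, c):
    """1-D projective coordinate transformation x -> (a x + b) / (c x + 1)."""
    return (a * x + b) / (c * x + 1.0)

def projective_chirp(x, a, b, c):
    """Unit-frequency sinusoid evaluated at the transformed coordinate."""
    return math.sin(2.0 * math.pi * warp_1d(x, a, b, c))

def local_frequency(x, a, b, c):
    """Derivative of warp_1d: cycles per unit length at position x."""
    return (a - b * c) / (c * x + 1.0) ** 2

a, b, c = 2.0, 0.0, 0.5
for x in (0.0, 0.5, 1.0, 2.0):
    print(x, round(projective_chirp(x, a, b, c), 4),
          round(local_frequency(x, a, b, c), 4))
```

With c = 0 the local frequency is the constant a, which is all that a 1-D affine model can express.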
The featureless projective approach,
which operates directly on the image
pixels, is shown to be superior in accuracy and ability to enhance
resolution.
The proposed methods work well on image data collected
from both good-quality and poor-quality video under a wide variety of
conditions (sunny, cloudy, day, night).
These fully-automatic
methods are also shown to be robust to deviations from the
assumptions of static scene and no parallax, although the primary
application is in filtering out or modifying subject matter appearing
on flat surfaces within a scene (e.g. rigid planar patches such
as advertising billboards).
Tsai and Huang [3] pointed out that the elements of the
projective group give the true camera motions with respect to a
planar surface. They explored the group structure associated with
images of a 3-D rigid planar patch, as well as the associated Lie
algebra, although they assume that the correspondence problem has
been solved. The solution presented in this paper (which does not
require prior solution of correspondence) also relies on projective
group theory.
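The brightness constancy constraint equation for 2-D
images [4], which relates the flow velocity components in
the x and y directions to the image derivatives, is:
$$u_f E_x + v_f E_y + E_t \approx 0$$
where $u_f$ and $v_f$ are the flow velocity components in the x and y
directions, $E_x$ and $E_y$ are the spatial derivatives of the image
brightness $E$, and $E_t$ is its temporal derivative.
As is well-known [4],
the optical flow field in 2-D is underconstrained. The model of pure
translation at every point has two parameters, but there is only one
equation to solve at each point. Thus
it is common practice to compute the optical flow over some
neighborhood, which must be at least two pixels, but is generally
taken over a small block,
$3 \times 3$,
$5 \times 5$, or
sometimes larger (e.g. the entire patch of subject matter to be
filtered out, such as a billboard or sign).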
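To illustrate why a neighborhood is needed, the following sketch solves for a single translational flow vector (u, v) over a small block by least squares, summing the squared brightness constancy residual u E_x + v E_y + E_t over the block. It is only a minimal example under stated assumptions: the two frames are synthetic analytic functions, the shift `TRUE_SHIFT` is hypothetical, and derivatives are simple central differences averaged over both frames.

```python
import math

# Sketch: least-squares translational optical flow over one block.
# The frames, the shift, and all names are illustrative assumptions.

TRUE_SHIFT = (0.3, -0.2)  # sub-pixel motion between the two frames

def frame1(x, y):
    """Smooth synthetic brightness pattern."""
    return math.sin(0.3 * x) + math.cos(0.2 * y)

def frame2(x, y):
    """frame1 translated by TRUE_SHIFT."""
    return frame1(x - TRUE_SHIFT[0], y - TRUE_SHIFT[1])

def translational_flow(e1, e2, x0, y0, size):
    """Estimate (u, v) minimizing sum (u*Ex + v*Ey + Et)^2 over a block."""
    sxx = sxy = syy = sxt = syt = 0.0
    for y in range(y0, y0 + size):
        for x in range(x0, x0 + size):
            # Central differences, averaged over both frames.
            ex = 0.25 * (e1(x + 1, y) - e1(x - 1, y)
                         + e2(x + 1, y) - e2(x - 1, y))
            ey = 0.25 * (e1(x, y + 1) - e1(x, y - 1)
                         + e2(x, y + 1) - e2(x, y - 1))
            et = e2(x, y) - e1(x, y)
            sxx += ex * ex
            sxy += ex * ey
            syy += ey * ey
            sxt += ex * et
            syt += ey * et
    # Solve the 2x2 normal equations by Cramer's rule.
    det = sxx * syy - sxy * sxy
    u = (-sxt * syy + syt * sxy) / det
    v = (-syt * sxx + sxt * sxy) / det
    return u, v

print(translational_flow(frame1, frame2, 2, 2, 5))
```

A single pixel contributes one equation in two unknowns, so the determinant above would vanish; summing over a block with varying gradient direction makes the system solvable.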
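Our task is not to deal with the 2-D translational flow, but with the 2-D
projective flow, estimating the eight parameters in the coordinate
transformation:
$$\mathbf{x}' = \begin{bmatrix} x' \\ y' \end{bmatrix} = \frac{\mathbf{A}\mathbf{x} + \mathbf{b}}{\mathbf{c}^T\mathbf{x} + 1}, \qquad \mathbf{x} = [x, y]^T$$
The desired eight scalar
parameters are denoted by
$\mathbf{p} = [\mathbf{A}, \mathbf{b}; \mathbf{c}, 1]$,
$\mathbf{A} \in \mathbb{R}^{2 \times 2}$,
$\mathbf{b} \in \mathbb{R}^{2 \times 1}$, and
$\mathbf{c} \in \mathbb{R}^{2 \times 1}$.
We have, in the 2-D case:
$$\varepsilon_{flow} = \sum \left( \mathbf{u}_m^T \mathbf{E}_\mathbf{x} + E_t \right)^2 = \sum \left( \left( \frac{\mathbf{A}\mathbf{x} + \mathbf{b}}{\mathbf{c}^T\mathbf{x} + 1} - \mathbf{x} \right)^T \mathbf{E}_\mathbf{x} + E_t \right)^2$$
where $\mathbf{u}_m = \mathbf{x}' - \mathbf{x}$ is the model flow,
$\mathbf{E}_\mathbf{x} = [E_x, E_y]^T$ is the spatial gradient of the image
brightness, and $E_t$ is its temporal derivative.
The sum can be weighted by $(\mathbf{c}^T\mathbf{x} + 1)$, as in the 1-D case:
$$\varepsilon_w = \sum \left( \left( \mathbf{A}\mathbf{x} + \mathbf{b} - (\mathbf{c}^T\mathbf{x} + 1)\mathbf{x} \right)^T \mathbf{E}_\mathbf{x} + (\mathbf{c}^T\mathbf{x} + 1) E_t \right)^2$$
Differentiating with respect to the free
parameters
$\mathbf{A}$, $\mathbf{b}$, and $\mathbf{c}$,
and setting the result to zero gives a linear solution:
$$\left( \sum \phi \phi^T \right) [a_{11}, a_{12}, b_1, a_{21}, a_{22}, b_2, c_1, c_2]^T = \sum \left( \mathbf{x}^T \mathbf{E}_\mathbf{x} - E_t \right) \phi$$
where
$$\phi^T = [E_x(x, y, 1),\; E_y(x, y, 1),\; x E_t - x^2 E_x - x y E_y,\; y E_t - x y E_x - y^2 E_y]$$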
For a more in depth treatment of projective flow, the reader is invited
to refer to [5].
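A weighted linear estimate of this kind can be sketched in a few lines. The example below is an illustration under stated assumptions, not the implementation of [2] or [5]: the parameter order is taken to be [a11, a12, b1, a21, a22, b2, c1, c2], the regressor `phi` is obtained by expanding the weighted residual (A x + b - (c^T x + 1) x)^T E_x + (c^T x + 1) E_t in those parameters, the two frames are synthetic analytic functions, and `P_TRUE` is a hypothetical near-identity transformation. A single pass like this is only an approximation in the neighbourhood of the identity.

```python
import math

# Sketch: one-pass weighted linear estimate of 2-D projective flow.
# Frames, P_TRUE, and all names are illustrative assumptions.

P_TRUE = [1.01, 0.005, 0.2, -0.004, 0.99, -0.1, 2e-4, -1e-4]

def texture(x, y):
    """Smooth synthetic brightness pattern with varied gradient directions."""
    return (math.sin(0.2 * x) + math.cos(0.15 * y)
            + math.sin(0.1 * x + 0.2 * y) + math.cos(0.18 * x - 0.12 * y))

def project(p, x, y):
    """Apply x' = (A x + b) / (c^T x + 1) for p = [a11,a12,b1,a21,a22,b2,c1,c2]."""
    a11, a12, b1, a21, a22, b2, c1, c2 = p
    w = c1 * x + c2 * y + 1.0
    return (a11 * x + a12 * y + b1) / w, (a21 * x + a22 * y + b2) / w

def frame2(x, y):
    return texture(x, y)

def frame1(x, y):
    # Chosen so that frame2(project(P_TRUE, x, y)) == frame1(x, y).
    return texture(*project(P_TRUE, x, y))

def solve(m, rhs):
    """Solve m z = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [r] for row, r in zip(m, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= f * aug[col][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][k] * z[k] for k in range(r + 1, n))
        z[r] = (aug[r][n] - s) / aug[r][r]
    return z

def estimate_projective_flow(e1, e2, size):
    """Accumulate sum(phi phi^T) and sum((x^T E_x - E_t) phi), then solve."""
    m = [[0.0] * 8 for _ in range(8)]
    rhs = [0.0] * 8
    for y in range(1, size - 1):
        for x in range(1, size - 1):
            # Central differences, averaged over both frames.
            ex = 0.25 * (e1(x + 1, y) - e1(x - 1, y)
                         + e2(x + 1, y) - e2(x - 1, y))
            ey = 0.25 * (e1(x, y + 1) - e1(x, y - 1)
                         + e2(x, y + 1) - e2(x, y - 1))
            et = e2(x, y) - e1(x, y)
            phi = [ex * x, ex * y, ex, ey * x, ey * y, ey,
                   x * et - x * x * ex - x * y * ey,
                   y * et - x * y * ex - y * y * ey]
            g = x * ex + y * ey - et
            for i in range(8):
                rhs[i] += g * phi[i]
                for j in range(8):
                    m[i][j] += phi[i] * phi[j]
    return solve(m, rhs)

p_est = estimate_projective_flow(frame1, frame2, 40)
print([round(v, 5) for v in p_est])
```

No feature points are located or matched at any stage: every pixel contributes its spatial and temporal derivatives directly to the 8x8 system.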
The mathematical framework for mediated reality arose through the process
of marking a reference frame [2] with text or simple
graphics, where it was noted that by calculating and matching homographies
of the plane, an illusory rigid planar patch appeared to hover upon
objects in the real world, giving rise to a form of computer-mediated
collaboration [1].
This collaborative capability was suggested as an application of
Humanistic Intelligence (HI) to the
visually challenged, or those with a visual memory
disability [2]. In this application, a computer program or
remote expert (be it human or machine) may assist in wayfinding, or by
providing a photographic/videographic memory, such as the ability to never
forget a face. (See Fig 3.)
Figure 3:
Mediated reality as a photographic/videographic memory
prosthesis:
(a) Wearable face-recognizer with virtual ``name tag'' (and
grocery list) appears to stay attached to the cashier
(b), even when the cashier is no longer within the field of view
of the tapped eye and transmitter (c).
Figure 4 shows images taken with an Eye Tap system and depicts the use of
the reality mediator as a form of communication. Figure 4(a) is an unmediated
image of a roadside advertisement. Figures 4(b), (c), and (d) show images from
the same sequence as seen through the reality mediator.
The motion of the planar patch
of the advertisement has been tracked by the VideoOrbits algorithm, and the
advertisement is replaced by a message intended to help guide the user to their
destination.
Figure 4:
(a) Billboards, advertising,
and other visual detritus form annoying, and sometimes
dangerous clutter at the sides of busy roadways and highways.
This advertisement, made in the shape of an octagon, and painted red,
and placed at the side of a busy road, is the visual equivalent of
yelling ``fire'' in a crowded theatre in order to get everyone's
attention to tell them you have something for sale.
(b),(c),(d) Successive frames of video processed by the Eye Tap system
using the VideoOrbits plane tracker. The advertisement is filtered out,
to reduce visual clutter in the scene. In its place is a useful
message that can help the user of the Eye Tap system keep their
attention on the road, and on not getting lost.
When the rigid planar patch is not sufficiently within the
visual field of view, approximate tracking still works based on other
planar patches present in the scene.
[1] Steve Mann.
Wearable computing: A first step toward personal imaging.
IEEE Computer, 30(2):25-32, Feb 1997.
http://wearcam.org/ieeecomputer.htm.
[2] S. Mann and R. W. Picard.
Video orbits of the projective group; a simple approach to
featureless estimation of parameters.
TR 338, Massachusetts Institute of Technology, Cambridge,
Massachusetts, 1995. See http://hi.eecg.toronto.edu/tip.html.
Also appears in IEEE Trans. Image Proc., Sept 1997, Vol. 6 No. 9,
p. 1281-1295.
[3] R. Y. Tsai and T. S. Huang.
Estimating Three-Dimensional Motion Parameters of a Rigid Planar
Patch I.
IEEE Trans. Acoust., Speech, and Sig. Proc.,
ASSP-29:1147-1152, December 1981.
[4] B. Horn and B. Schunck.
Determining Optical Flow.
Artificial Intelligence, 17:185-203, 1981.
[5] Steve Mann.
Humanistic intelligence/humanistic computing: `wearcomp' as a new
framework for intelligent signal processing.
Proceedings of the IEEE, 86(11):2123-2151+cover, Nov 1998.
http://wearcam.org/procieee.htm.
[6] Steve Mann.
Wearable, tetherless computer-mediated reality: WearCam as a
wearable face-recognizer, and other applications for the disabled.
TR 361, M.I.T. Media Lab Perceptual Computing Section,
Cambridge, Massachusetts, February 2 1996. Also appears
in AAAI Fall Symposium on Developing Assistive Technology for People
with Disabilities, 9-11 November 1996, MIT.
http://wearcam.org/vmp.htm.
University of Toronto ECE1766 Web Productions