Compositing, or Modularity of Media

["The Language of New Media" is a new book by Lev Manovich,
Rhizome regional editor and a frequent Rhizome contributor. The
book will appear from The MIT Press in the second part of 2000.
Over the next few months, Rhizome will present selections from
the book, including this excerpt entitled "Compositing, or
Modularity of Media."]

The movie Wag the Dog (Barry Levinson, 1997) contains a scene in which
a Washington spin doctor and a Hollywood producer are editing fake
news footage designed to win public support for a non-existent war.
The footage shows a girl, a cat in her arms, running through a
destroyed village. If a few decades earlier putting together such a
shot required staging and then filming the whole thing on location,
today's computer tools make it possible to create it in real time. Now, the only
live element is the girl, played by a professional actress. The actress
is videotaped against a blue screen. The other two elements in the shot,
the destroyed village and the cat, come from a database of stock
footage. Scanning through the database, the producers try different
versions of these elements; a computer updates the composite scene in
real time. The logic of this shot is typical of the new media production
process, regardless of whether the object being put together is a video
or film shot, as in the Wag the Dog example; a 2D still image; a sound
track; a 3D virtual environment; or a computer game scene. In the
course of production, some elements are created
specifically for the project; others are selected from databases of
stock material. Once all the elements are ready, they are composited
together into a single object. That is, they are fitted together and
adjusted in such a way that their separate identities become
invisible. The fact that they come from diverse sources and were created
by different people at different times is hidden. The result is a single
seamless image, sound, space or scene.

As used in the new media field, the term digital compositing has a
particular and well-defined meaning. It refers to the process of
combining a number of moving image sequences and possibly stills into a
single sequence with the help of special compositing software such as
After Effects (Adobe), Compositor (Alias|Wavefront), or Cineon (Kodak).
Compositing was formally defined in a paper published in 1984 by two
scientists working for Lucasfilm. In describing compositing they make a
significant analogy with computer programming:

"Experience has taught us to break down large bodies of source code into
separate modules in order to save compilation time. An error in one
routine forces only the recompilation of its module and the relatively
quick reloading of the entire program. Similarly, small errors in
coloration or design in one object should not force 'recompilation' of
the entire image."

Separating the image into elements which can be independently rendered
saves enormous time. Each element has an associated matte, coverage
information which designates the shape of the element. The compositing
of those elements makes use of the mattes to accumulate the final image.
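
The role of the matte can be illustrated with a minimal sketch. It is
not taken from the 1984 paper; it only assumes an "over"-style blend in
which each element's color is weighted by its coverage. The names
Pixel, over and composite are hypothetical, colors are assumed to be
stored without premultiplication, and a single pixel stands in for a
whole image.

```python
from dataclasses import dataclass


@dataclass
class Pixel:
    color: tuple   # (r, g, b), each 0.0-1.0, not premultiplied (assumption)
    matte: float   # coverage: 0.0 = element absent, 1.0 = fully covers pixel


def over(fg: Pixel, bg: Pixel) -> Pixel:
    """Place fg over bg, weighting each color by its coverage."""
    out_matte = fg.matte + bg.matte * (1.0 - fg.matte)
    if out_matte == 0.0:
        return Pixel((0.0, 0.0, 0.0), 0.0)
    color = tuple(
        (f * fg.matte + b * bg.matte * (1.0 - fg.matte)) / out_matte
        for f, b in zip(fg.color, bg.color)
    )
    return Pixel(color, out_matte)


def composite(layers):
    """Accumulate layers, listed back to front, into one final pixel."""
    result = Pixel((0.0, 0.0, 0.0), 0.0)  # start from an empty image
    for layer in layers:
        result = over(layer, result)
    return result


# Hypothetical elements: an opaque background and a half-covered edge pixel.
village = Pixel((0.4, 0.3, 0.2), 1.0)
girl = Pixel((0.9, 0.7, 0.6), 0.5)
final = composite([village, girl])
```

Because each element carries its own matte, changing the girl's color
means recomputing only the blend, not re-rendering the village, which
is the point of the analogy with recompiling a single module.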

Digital compositing exemplifies a more general operation of computer
culture: assembling together a number of elements to create a single
seamless object. Thus we can distinguish between compositing in a wider
sense (i.e., the general operation) and compositing in a narrow sense
(assembling movie image elements to create a photorealistic shot). The
latter meaning corresponds to the accepted usage of the term
compositing. For me, compositing in a narrow sense is a particular case
of a more general operation of compositing, a typical operation in
assembling any new media object.

As a general operation, compositing is a counterpart of selection from a
menu or a database. Since a typical new media object is put together
from elements which come from different sources, these elements need to
be coordinated and adjusted to fit together. Although the logic of these
two operations (selection and compositing) may suggest that they
always follow one another (first select, then composite), in practice
their relationship is more interactive. Once an object is partially
assembled, new elements may need to be added; existing elements may need
to be re-worked. This interactivity is made possible by the modular
organization of a new media object on different scales. Throughout the
production process, the elements retain their separate identity and
therefore they can be easily modified, substituted or deleted. When the
object is complete, it can be "output" as a single "stream" in which
separate elements are no longer accessible. An example of an operation
which "collapses" all elements together is the "flatten image" command
in Adobe Photoshop 5.0. Another example of "collapsing" elements into a
single stream is recording a digitally composited moving image sequence
on film, which was a typical procedure in Hollywood film production in
the 1980s and 1990s.
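
The difference between a modular work-in-progress and a collapsed
output can be sketched as follows. This is an illustration only, not a
description of how Photoshop stores layers: the LayeredImage class and
its methods are hypothetical, a one-row "image" of single characters
stands in for real pixels, and None marks a transparent position.

```python
class LayeredImage:
    """A work-in-progress image whose elements stay separate by name."""

    def __init__(self, width):
        self.width = width
        self.layers = []  # list of (name, pixels), ordered back to front

    def add(self, name, pixels):
        assert len(pixels) == self.width
        self.layers.append((name, pixels))

    def replace(self, name, pixels):
        """Swap one element without touching the others."""
        assert len(pixels) == self.width
        self.layers = [(n, pixels if n == name else p) for n, p in self.layers]

    def flatten(self):
        """Collapse all layers into one row; later layers cover earlier ones."""
        flat = [None] * self.width
        for _, pixels in self.layers:
            for i, value in enumerate(pixels):
                if value is not None:
                    flat[i] = value
        return flat


doc = LayeredImage(width=4)
doc.add("village", ["v", "v", "v", "v"])
doc.add("girl", [None, "g", "g", None])
doc.add("animal", [None, None, "d", None])

# While the object is modular, one element can be substituted on its own.
doc.replace("animal", [None, None, "c", None])

# After flattening, the row no longer records which layer a value came from.
flat = doc.flatten()
```

Here flat is a plain list of values; substituting the animal again
would require going back to the layered version.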

Alternatively, the completed object may retain the modular structure
when it is distributed. For instance, in computer games the player can
interactively control characters, moving them in space. In some games,
the user moves 2D images of characters, called sprites, over the
background image; in others, everything is represented as 3D objects,
including the characters. In either case, during production the elements
are adjusted to form a single whole, stylistically, spatially and
semantically; during play the user can move the elements within the
programmed limits.

In general, 3D computer graphics representation is more "progressive"
than a 2D image because it allows true independence of elements;
therefore it may gradually replace image streams, still used by our
culture: photographs, 2D drawings, films, video. In other words, 3D
computer graphics representation is more modular than a 2D still image
or a 2D moving image stream. This modularity makes it easier for a designer
to modify the scene at any time. It also gives the scene additional
functionality. For instance, the user may "control" the character,
moving him or her around the 3D space. Scene elements can be later
reused for new productions. Finally, modularity also allows for more
efficient storage and transmission of a media object. For example, to
transmit a video clip over a network all the pixels which make up this
clip have to be sent; but transmitting a 3D scene only requires sending
the coordinates of the objects in it. This is how online virtual
worlds, online computer games and networked military simulators work:
first the copies of all objects making up a world are downloaded to a
user's computer, and after this the server only has to keep sending their
new 3D coordinates.
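
A rough back-of-the-envelope comparison makes the difference concrete.
All figures below are illustrative assumptions, not measurements: the
video is treated as uncompressed, each object is assumed to need only
three 4-byte coordinates per update, and the one-time download of the
objects themselves is left out.

```python
def video_bytes_per_second(width, height, bytes_per_pixel, fps):
    """Every pixel of every frame is sent (no compression assumed)."""
    return width * height * bytes_per_pixel * fps


def scene_bytes_per_second(num_objects, floats_per_object,
                           bytes_per_float, updates_per_second):
    """Only each object's new coordinates are sent on every update."""
    return num_objects * floats_per_object * bytes_per_float * updates_per_second


# Assumed figures: a 320x240 clip, 3 bytes per pixel, 15 frames per second,
# versus a world of 50 objects with x, y, z positions updated 15 times a second.
video = video_bytes_per_second(320, 240, 3, 15)
scene = scene_bytes_per_second(50, 3, 4, 15)
ratio = video / scene
```

Under these assumptions the pixel stream needs 3,456,000 bytes per
second and the coordinate updates 9,000, a ratio of 384 to 1. Real
video compression narrows the gap considerably, but the asymmetry the
paragraph describes remains.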

If the general trajectory of computer culture is from 2D images towards
3D computer graphics representations, digital compositing represents an
intermediary historical step between the two. A composited space which
consists of a number of moving image layers is more modular than a
single shot of a physical space. The layers can be repositioned against
each other and adjusted separately. Yet such a representation is not as
modular as a true 3D virtual space, because each of the layers retains
its own perspective. When and where moving image "streams" will be
replaced by 100% 3D computer generated scenes will depend not only on
cultural acceptance of a computer scene's look but also on economics. A 3D
scene is much more functional than a film or video shot of the same
scene but, if it is to contain a similar level of detail, it may be much
more expensive to generate.

The general evolution of all media types towards becoming more and more
modular, and the particular evolution of the moving image in the same
direction, can be traced through the history of popular media file
formats. QuickTime developers early on specified that a single
QuickTime movie may consist of a number of separate tracks, just as a
still Photoshop image consists of a number of layers. The QuickTime 4
format (1999) included 11 different track types, including video track,
sound track, text track and sprite track (graphic objects which can be
moved independently of video). By placing different
media on different tracks which can be edited and exported
independently, QuickTime encourages the designers to think in modular
terms. In addition, a movie may contain a number of video tracks which
can act as layers in a digital composite. By using alpha channels (masks
saved with video tracks) and different modes of track interaction (such
as partial transparency), a QuickTime user can create complex compositing
effects within a single QuickTime movie, without having to resort to any
special compositing software. In effect, QuickTime architects embedded
the practice of digital compositing in the media format itself. What
previously required special software now can be done by simply using the
features of the QuickTime format itself.

Another example of a media format evolving towards more and more data
modularity is MPEG. An early version of the format, MPEG-1 (1992), was
defined as "the standard for storage and retrieval of moving
pictures and audio on storage media." The format specified a compression
scheme for video and/or audio data conceptualized in a traditional
way. In contrast, MPEG-7 (to be approved in 2001) is defined as "the
content representation standard for multimedia information search,
filtering, management and processing." It is based on a different concept
of a media composition, one which consists of a number of media objects of
various types, from video and audio to 3D models and facial expressions,
and the information on how these objects are combined. MPEG-7 provides
an abstract language to describe such a scene. The evolution of MPEG
thus allows us to trace the conceptual evolution in how we understand
new media–from a traditional "stream" to a modular composition, more
similar in its logic to a structural computer program than a traditional
image or a film.

It is ironic that while today "media streaming" is a hot topic, the
actual historical trajectory of computer media is away from image
streams towards modular media. But we will have to wait and see when
this new logic of media will affect not only media formats (such as
QuickTime and MPEG) but the aesthetics of media culture as well.