On display at the New Museum until September 30th is the exhibition Ghosts in the Machine. Curated by Massimiliano Gioni and Gary Carrion-Murayari, the exhibition is described as having been “conceived as an encyclopedic cabinet of wonders: bringing together an array of artworks and non-art objects to create an unsystematic archive of man’s attempt to reconcile the organic and the mechanical.”1 Of the myriad works presented in the exhibition, one humble object embodies in many ways the complex history of technical abstraction and the externalization of that which is inherently human. This object is called the VODER.
Short for Voice Operating DEmonstratoR, the VODER was an instrument that gave its operator the ability to synthesize human speech. It easily predates the first cases of computerized speech synthesis, and represents the distinct end of an era for a particular type of metonymic device, along with the beginning of a whole other era of synthesized speech. The year was 1929. As the story goes, Bell Labs researcher Homer Dudley experienced an epiphanic moment while lying in a hospital bed.
A pioneering researcher of voice communications technologies, Dudley was working to develop more efficient methods of voice transmission that could make better use of the Bell System’s bandwidth. His eureka moment was the realization that the human mechanisms of speech (the vocal cords, mouth, teeth, tongue and lips) resembled the mechanics of radio transmission2: the vocal cords create high-frequency vibrations that serve essentially as a carrier wave for the data encoded by the articulations of the mouth. He would go on to spearhead the development of the technology that enabled the invention of a device called the Vocoder3. By breaking speech down into ten low-frequency bands, the Vocoder was able to send transmissions requiring far less bandwidth than the full spectral information produced by the telephone. By the mid-30s the team at Bell Labs had developed these technologies to successful ends, but the technologies would not see implementation outside of the lab for another decade or so.
It was this initial work on the Vocoder that led Dudley down a winding path toward the VODER. The key distinction between the two is that while the Vocoder was a tool through which to process speech, the VODER was an instrument with which one could synthesize speech. The Vocoder required its operator only to turn a few knobs and speak into a microphone. The VODER was an instrument in a wholly other sense, providing fourteen keys, a bar controlled by the operator's wrist, and a foot pedal. The VODER was not spoken to – it was performed, or played. The operator's speech impulses would bypass their usual destination of the vocal cords and mouth, instead manifesting themselves through the hands, wrist and foot, and finally through the manipulation of the VODER’s controls. Complex combinations of keys would produce the requisite components of speech of which a given letter, word, or sentence is composed. The foot pedal controlled pitch, providing the essential subtle variations of intonation. The resultant sounds approached those of modern speech synthesis. Computers would not match the expressive abilities of the VODER for another twenty years.
“…in producing the word ‘concentration’ on the VODER, I have to form thirteen different sounds in succession, make five up and down movements of the wrist bar, and vary the position of the foot pedal three to five times, according to what expression I want the VODER to give the word, and of course all this must be done with exactly correct timing.”4
The history embedded in the VODER is truly a crossroads. It represents a moment in time when we were learning what machines sounded like when they spoke. The VODER certainly has precedent in inventions such as the Euphonia (1835, Joseph Faber). The Euphonia was in many ways similar to the VODER, in that it was a human-operated instrument that attempted to reproduce the sounds of human speech. Due, however, to the culture in which the Euphonia was deployed, its reception was one of horror at best. The apparatus produced speech that was inexpressive and raspy, closer to a death rattle, and emanated from a prosthetic face. Pre-dating the public debut of Bell’s telephone by forty-one years, the Euphonia (much like the telephone) was perceived as inhuman, soulless, and downright creepy5. Nearly a century later, with a world accustomed to disembodied speech, it is unsurprising that a speaking machine would receive a warmer welcome.
The significance of the VODER’s debut at the 1939 World’s Fair cannot be overstated. Here was the modern world hearing the voice of the machine for the first time. Prior to this point, any portrayal of speaking machines, automatons, or robots was pure speculation, with scant evidence as an aural basis. Incredibly, Flushing Meadows-Corona Park bore witness to the collision, or passing, of these two moments: pre-synthetic speech and post-synthetic speech. The Westinghouse company held a demonstration of their ELEKTRO robot. A hulking, hardly functional novelty act, this automaton carried on a conversation with great wit, and even enjoyed a cigarette on stage. There was no VODER / ELEKTRO collaboration, and with the exhibition predating formant speech synthesis6 by over ten years, it comes as no surprise that the voice of this machine was merely a man with a microphone, speaking awkwardly (a charade with precedent tracing back to ancient Greece7). This strongly highlights the metonymic crossroads embodied by the VODER. Were ELEKTRO to have debuted post-VODER, it is arguable that it would have been voiced differently.
Once the true voice of the machine had entered the public consciousness, its place and form in fictional portrayal would never be the same. After that day in 1939, we knew specifically how inhuman machined speech should sound. As the years pass beyond 1939, we see advancements in speech synthesis, eventually leading to the first speaking computers. As technologies bring realities into being, they dictate the boundaries of what must be imagined. By 1961 the first speaking and singing computer, the IBM 704, had emerged. The circuitry that enabled the IBM machine to sing was a form of Vocoder technology.
While in decades prior the mere notion of a seemingly sentient computer would have been sufficiently disturbing, this boundary too needed renegotiation. It follows naturally that in 1968, when crafting the cadence of HAL, Kubrick came to the decision that flawless, impeccable speech was a more stirring possibility than the stiff-sounding computers of the day. It is no coincidence, either, that HAL sings the same song that was first sung by the IBM 704, as Arthur C. Clarke witnessed the machine performing the song while visiting Bell Labs in 19628.
The VODER truly ushered in the golden age of speech synthesis, and expanded the imagination of an era, yet it never saw any practical use. What exactly drove Homer Dudley to develop it? Perhaps it is best seen as an artifact of a perspective that stands in contrast to the critique, caution, and even paranoia toward technology embodied by some of the works in Ghosts in the Machine. It represents exploration for mere innovation – the curiosity of what may lie beyond the boundary of our current reality. It represents a desire to speak to our inventions, and the hope that they might talk back.
1. newmuseum.org/exhibitions/view/ghosts-in-the-machine
2. Fully articulated eleven years later in The Bell System Technical Journal, Volume 19, No. 4, 1940, p. 495.
3. In his book “How to Wreck a Nice Beach”, author Dave Tompkins traces the history of the Vocoder, tracking its evolution from cryptographic speech transmission, to its implementation in musical instruments, and subsequent adoption by generations of musicians.
4. Excerpt from VODER demonstration.
6. http://en.wikipedia.org/wiki/Speech_synthesis#Formant_synthesis
7. Hankins, Thomas L. and Robert J. Silverman. "Vox Mechanica: The History of the Speaking Machine." Instruments and the Imagination. pp. 178-220.