Dan Ellis
Sensing & Imaging
The Sound of Science

Go shopping with a friend and you will be bombarded by a cacophony of sounds: your shared dialogue, the chatter of other shoppers, the rhythmic hum of in-store music, and even the buzzes and beeps of cell phone alerts. These represent just a handful of acoustic signals your brain automatically processes for you. Is it possible that machines could learn this intuitive process?

That is a question for Dan Ellis, professor of electrical engineering and founder of the Laboratory for Recognition and Organization of Speech and Audio (LabROSA) at Columbia Engineering. As the leader of the nation’s only lab to combine research in speech recognition, music processing, signal separation, and content-based retrieval for sound processing in machines, Ellis is making a lot of noise. His work in soundtrack classification pioneered the use of statistical classification of audio data to categorize videos by their soundtracks. Now he is leading a group of researchers investigating how to create an intelligent machine listener able to interpret live or recorded sound of any type in terms of the descriptions and abstractions that would make sense to a human listener.

Log-frequency spectrograms of speech give the computer a view of sound modeled on what we know about the information carried from the ear to the brain. It’s also a good domain for manipulating and modifying sounds in ways that make sense to listeners.
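For readers curious about what such a representation involves, here is a minimal sketch in Python with NumPy. It is an illustration built on our own assumptions (a Hann-windowed short-time Fourier transform, semitone-wide bands starting at 55 Hz, and simple summation of linear-frequency bins into each band), not LabROSA’s code; the function names and parameter values are hypothetical.

```python
import numpy as np

def stft_magnitude(x, n_fft=1024, hop=256):
    """Magnitude short-time Fourier transform with a Hann window (illustrative)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop  # assumes len(x) >= n_fft
    frames = np.stack([x[i * hop : i * hop + n_fft] * window for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)).T  # shape: (n_fft // 2 + 1, n_frames)

def log_frequency_spectrogram(x, sr, n_fft=1024, hop=256, fmin=55.0, bins_per_octave=12):
    """Pool linear STFT bins into log-spaced (here semitone-wide) bands, in dB.

    Hypothetical sketch: low bands narrower than the STFT bin spacing stay
    empty and come out near -200 dB; real systems use smarter mappings.
    """
    mag = stft_magnitude(x, n_fft, hop)
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    # Number of whole bands between fmin and the Nyquist frequency.
    n_bands = int(np.floor(bins_per_octave * np.log2((sr / 2) / fmin)))
    # Band edges are spaced geometrically, like the ear's frequency resolution.
    edges = fmin * 2.0 ** (np.arange(n_bands + 1) / bins_per_octave)
    out = np.zeros((n_bands, mag.shape[1]))
    for b in range(n_bands):
        in_band = (freqs >= edges[b]) & (freqs < edges[b + 1])
        if in_band.any():
            out[b] = mag[in_band].sum(axis=0)
    # Return levels in dB plus each band's lower edge in Hz.
    return 20 * np.log10(out + 1e-10), edges[:-1]
```

For example, a one-second 500 Hz tone sampled at 16 kHz should put most of its energy in the band whose lower edge lies just below 500 Hz. The geometric spacing is the design choice that matters: each octave gets the same number of rows, so a melody or a voice shifted up in pitch moves by a constant distance in the image rather than stretching apart.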

“We’ve performed work in supporting speech recognition in noisy environments, which has obvious commercial applications in things like better voice-control systems or searching soundtrack databases for particular utterances,” Ellis says. “But we’re also very interested in other kinds of sounds, which, in general, have been neglected by research in favor of speech.”

Recent research at LabROSA led by Ellis has included the classification of videos based on their soundtracks. Rather than relying on speech, this approach lets machines extract information from whatever sounds are present, which is useful for organizing large collections of consumer-style videos into categories.

“This means we’ll be able to search for unspoken audio similarly to how we search for spoken audio,” Ellis explains. “As society gathers more and more raw media recordings and demands easier, more effective retrieval, I see a lot of potential for commercial applications of such technology.”

The development of very powerful machine learning techniques such as deep neural networks (sets of algorithms used to model complex abstractions in data) has made access to significant volumes of data essential. That is why Ellis and his team at LabROSA are developing new techniques to speed up the classification of those data troves.

Rock N' Roll Engineer
Robert Moog

Before Robert Moog MS’56 (1934–2005) revolutionized electronic music in the 1960s, synthesizers were bulky, expensive, and intimidating machines. Largely dependent on vacuum tubes, early synthesizers required extensive customization to generate different sounds and were used primarily by experts.

A lifelong enthusiast of the theremin, an early and eerie-toned electronic instrument controlled by moving one’s hands within the electromagnetic fields around metal rods, Moog (rhymes with “vogue”) developed his own designs while attending the Bronx High School of Science. He was still a teenager, studying physics at Queens College, when he founded his first company to sell his advanced theremins.

At Columbia Engineering, Moog studied electrical engineering and saw the potential of transistors, then still an emerging technology, to transform and popularize electronic music. He graduated shortly before the Columbia-Princeton Electronic Music Center (today known as the Computer Music Center) installed an enormous state-of-the-art RCA Mark II Synthesizer on campus, but he kept close tabs on his alma mater’s progress in the field.

In the early 1960s, Moog worked with composer Herbert Deutsch to develop the first modular voltage-controlled subtractive synthesizer played via keyboard. He demonstrated prototypes at the October 1964 convention of the Audio Engineering Society, stunning attendees with his instruments’ unprecedented portability and ease of use. Within a few short years, Moog synthesizers could be heard in music ranging from The Beatles’ Abbey Road and Simon & Garfunkel’s Bookends to the best-selling Switched-On Bach album by Wendy Carlos ’65GSAS, which reimagined the music of Johann Sebastian Bach on Moog’s creation.

Working closely with musicians, for whom he considered himself a “toolmaker,” Moog never stopped refining his instruments, including the even more accessible Minimoog and a series of influential guitar pedals. Although the popularity of his analog synthesizers dipped in the ’80s with the mass emergence of digital alternatives, they are now celebrated and coveted for their classic sound. By the time of his death in 2005, Moog’s legend and legacy were stronger than ever.

Motivated by an intrinsic interest in improving our understanding of audio, especially music, and how it varies, Ellis is focused largely on developing new software libraries and annotated datasets that are expected to have a significant impact on the music audio research community. “We’re excited by the prospect of helping people organize and manage their personal music collections, and helping people discover new music based on their listening preferences,” Ellis explains.

The origin of Ellis’s passion for music and electronics can be traced back to his childhood experiences. “While attending a music-centric school in England, I took lessons in piano, harp, bassoon, and percussion. I also took up electronics as a hobby, and I was particularly fascinated when a friend showed me his electronic synthesizer,” Ellis recalls. “I remember it very clearly, and for me, it presented the ideal intersection between the musical sounds that I loved and electronics technology.”

Although music and engineering may seem like an odd pairing, for Ellis, they have more in common than meets the eye, or the ear.

“I think the whole notion of technical research as being independent from the kind of creative exploration we expect from artists is a serious mistake,” he asserts. “The essence of research is identifying and exploring new ideas that have been overlooked, or coming up with novel and more powerful solutions.” To do that, Ellis takes an interdisciplinary approach.

“Before coming to Columbia, I hadn’t considered the value of departments beyond engineering,” he admits. “But I find myself regularly collaborating with colleagues from Columbia’s Sound Arts program and with researchers in the College of Physicians and Surgeons. Plus, being in New York has afforded me opportunities to interact with organizations like Google and Spotify. Together, we are contributing to the city’s positioning as a mecca for new sound-processing technology.”

—by Dave Meyers