Big Data, Big Ideas

Apr 11 2013 | By Melanie A. Farmer | Photos: Ryan Lee

Columbia faculty and leading data scientists and researchers underscored the ever-increasing challenges and opportunities tied to big data at a symposium held April 5 by Columbia’s new Institute for Data Sciences and Engineering. The all-day event featured keynotes by Eric Horvitz, managing co-director at Microsoft Research Redmond, and Lawrence D. Burns, former longtime vice president at General Motors and director of the Program on Sustainable Mobility at the Earth Institute.

The industry panel featured speakers from leading technology companies. From left to right: Ben Fried, chief information officer at Google; Jennifer Tour Chayes, distinguished scientist and managing director of Microsoft Research New England and Microsoft Research NYC; Shawn Edwards, chief technology officer at Bloomberg; and Justin Moore, engineer at Facebook. Sharing the stage was panel moderator Kyle Kimball, executive director of the New York City Economic Development Corporation.

Institute Director Kathleen McKeown announced four key industry partnerships with Bloomberg LP, Google, Mediaocean, and Microsoft, as well as its new certification program for data sciences, now in the review process. The new program, which comprises four core courses in data sciences, is the first step in the Institute’s goal to create a master’s degree program and, ultimately, a PhD program in data sciences.

More than 350 guests attended the event in Low Rotunda and some 400 tuned in via webcast. Topics discussed at the “Big Data to Big Ideas” symposium included the potential for technology innovation in data sciences, a boom in job opportunities for those with data-science skills, and the University’s interdisciplinary approach in addressing this exciting field.

“We were thrilled that the symposium captured such a large audience, both with in-person attendees and those who watched the live webcast and commented via Twitter and Facebook,” said McKeown, who is also the Henry and Gertrude Rothschild Professor of Computer Science at the Engineering School. “The keynotes and panelists really brought data sciences to life, covering the fascinating facets of this important field in far-reaching discussions of its current and future challenges and solutions. The q-and-a sessions were very lively and it was exciting to see faculty, industry leaders, students, and our partners come together to focus on the urgent need to understand data sciences now, and how we, at Columbia, are taking a broad interdisciplinary approach to developing solutions that will have an impact on our daily lives. We look forward to working with our industry partners as we move ahead.”

In opening remarks, G. Michael Purdy, executive vice president for research at Columbia, said that big data “will dramatically change the way we view the world around us,” including the way new tools and technologies will be developed going forward and how people will engage with one another. These ideas of change in conducting business, in technology, in education, and in research were constant themes mentioned throughout the day’s program.

In his keynote, Horvitz echoed the excitement surrounding big data, and particularly new opportunities in technology innovation. He presented several examples of what Microsoft Research is doing in the area, including analyzing mobile communications and building predictive models based on user data. He also explained the value of citizen science—nonprofessional scientists contributing to scientific research—and its potential to work hand in hand with machine learning techniques to better analyze large datasets.

The data collecting and modeling that is being done now “is still very much in their infancy,” said Horvitz. “Three or four decades from now, we’ll reflect on earlier work [and think] wasn’t that quaint.”

Lawrence Burns, who is also a professor of engineering practice at the University of Michigan’s School of Engineering, focused his talk on mobility, centering on innovations that are transforming the automotive sector. Before joining Columbia, he spent over a decade as vice president of research and development for General Motors and has long championed the reinvention of the automobile. He gave a thoughtful overview of the major transformations made to “personal mobility,” such as self-driving vehicles, shared cars, and crash avoidance, all technologies and innovations “driven by data sciences and engineering.”

The symposium also featured two panel discussions focused on the Institute’s industry partners and the topic of data visualization. During the data visualization panel, Shih-Fu Chang, Richard Dicker Professor of Telecommunications and senior vice dean at the Engineering School, showed photos and images posted on social media sites like Facebook, demonstrating how one image can convey emotion. There is a need, he explained, to make sense of visual sentiments from social media, but he asked, “How do we organize this in a meaningful structure?” This is just one of several key areas of research the Institute’s New Media Center will cover.

In his presentation, Mark Hansen, professor of journalism at the School of Journalism and chair of the New Media Center, showed how “data can be a tremendous source of creativity and a tremendous source of story-telling.” One example he gave was a public art installation he helped create at Walter Cronkite Plaza at the University of Texas, Austin. A live feed of text extracted from local television news is projected across the square onto a University building. Hansen said such new media artwork reminds us “that data can function in a creative fashion as well.”

Patricia Culligan, associate director of the Institute and professor of civil engineering and engineering mechanics, introduced the Institute’s Center chairs, who gave brief presentations. She also invited the audience to attend a poster session that gave attendees an overview of exciting research currently being conducted in the six centers: Cybersecurity, Financial Analytics, Foundations of Data Science, Health Analytics, New Media, and Smart Cities.

In the industry panel session featuring Bloomberg, Facebook, Google, and Microsoft Research, panelists stressed the value of the interdisciplinary nature of the University-wide Institute and its centers as well as its advantage in being located in New York City, where the technology start-up community is rapidly growing.

“Columbia is uniquely positioned to provide the cross pollination that our industry needs,” said Shawn Edwards BS’90, MS’95, chief technology officer at Bloomberg, who stressed that data science is not just a computer science issue but reaches across many different disciplines.

Jennifer Tour Chayes of Microsoft Research agreed. “Really what Columbia is bringing in is their understanding of so many different domains.”

Panelists agreed the field of data science is burgeoning, and Columbia’s new Institute is in a good position to bring leadership to data sciences and the City’s tech start-up world.

Google CIO Ben Fried shared his envy of the Institute’s future graduates. “They are moving into a world where there’s so much more need than availability,” said Fried, who stressed that he’s lucky to work at a company that values data, noting that Google already employs data scientists in sales, in finance, and in human resources. “Data scientists are already contributing to the company … To those future graduates, your skills will be much valued.”

Institute Director Kathleen McKeown chats at the poster session with Shawn Edwards, an Engineering alumnus and CTO at Bloomberg, one of the Institute's latest industry partners.

At the symposium's poster session, Raimondo Betti (right), chair of the Department of Civil Engineering and Engineering Mechanics, represents the Institute's Smart Cities Center and talks about some of the exciting research already underway. Also pictured, Orin Herskowitz (left), executive director and vice president of Columbia Technology Ventures.

Seats filled up fast in Low Rotunda for the symposium's all-day program on big data.