Data for Good: Q&A with Professor Jeannette M. Wing

Jeannette M. Wing
(Photo by Jeffrey Schifman)

Jeannette M. Wing joined Columbia University this spring as the Avanessians Director of the Data Science Institute and Professor of Computer Science. The Data Science Institute, launched five years ago at Columbia Engineering, was recently elevated to a University-level research center. Wing brings corporate, academic, and government perspectives to her new leadership role. Formerly a corporate vice president with Microsoft Research, where she oversaw a global network of research labs; the head of the computer science department at Carnegie Mellon; and the assistant director for computer and information science and engineering at the National Science Foundation, Wing is an acknowledged leader in computer science. She is known for her research contributions in cybersecurity and privacy, formal methods, distributed and concurrent systems, programming languages, and software engineering. She is also a prominent voice for computational thinking, the application of techniques used by computer scientists to all disciplines, which today influences college and K–12 curricula worldwide. This summer, as Wing settled into her new post, Columbia Engineering spoke with her about her plans for the Data Science Institute and how it felt to return to the campus she knew as a child.

Q. What attracted you to this new position as director of the Data Science Institute?

A. I am an academic at heart. At Microsoft Research, I ran five labs for four years and supported research not just in computer science, but also in biology, economics, and the social sciences. Most recently, Microsoft’s push in AI (artificial intelligence) let us take state-of-the-art AI and use it at scale to solve practical problems. To compete, we were running so fast that we didn’t have time to reflect on the underlying science for this technology. For example, we don’t understand why deep learning works, but its applicability to many human-level tasks continues to astound us. Academia’s role is to understand the why. The opportunity to return to academia, whose role and responsibility is to do long-term fundamental research, is one reason I joined Columbia. Another key reason is the Columbia Data Science Institute, which has had an impressive five years. Columbia uniquely has strengths in computer science, statistics, and optimization and thus is well-positioned to lay the foundations of the new field of data science. Moreover, at a full-fledged university like Columbia, with 12 schools participating in DSI, we can pursue the breadth of domains to which data science applies. Data is the new oil. Data will fuel the engine of creativity in every field. Every profession can benefit from data science.

Q. What perspective do you bring from your work at Microsoft?

A. At Microsoft, enterprise customers are going through a digital transformation. Every company in every sector would come to us asking how to use big data (especially machine learning) and big compute (the cloud) to make their company better and smarter. At Columbia, we train the next-generation workforce and leaders in these companies and sectors: pharmaceutical, transportation, manufacturing, finance, government, and education. I saw this great demand for talent in data science. It is a leading indicator of what skills are needed by the next generation.

Patricia Culligan (left), founding associate director of the Data Science Institute and Robert A. W. and Christine S. Carleton Professor of Civil Engineering, and Kathleen McKeown, founding director of the Data Science Institute and Henry and Gertrude Rothschild Professor of Computer Science. (Photo by Timothy Lee Photographers)

Building a Reputation in Data Science

The Data Science Institute, launched at Columbia Engineering, was established in July 2012 as part of New York City’s initiative to expand the City’s top-tier applied science and engineering campuses. Under the leadership of its founding director, Kathleen McKeown, the Henry and Gertrude Rothschild Professor of Computer Science, and founding associate director, Patricia Culligan, the Robert A. W. and Christine S. Carleton Professor of Civil Engineering, the Institute rapidly emerged as an intellectual leader in both foundational and interdisciplinary data science, emphasizing the disruptive power of data science to solve some of society’s pressing challenges.

In the past five years, Columbia has established its reputation as a leader in data science and the DSI as a world-class, interdisciplinary institute that is recognized both nationally and internationally among the top research centers in this field. The Institute has also emerged as a vibrant forum for discussions on important issues such as data privacy, transparency, and security.

More than 30 new faculty members, including 20 in Engineering, have been hired in cross-cutting data science research areas, while more than 200 faculty members and researchers from 11 schools across Columbia’s campus are collaborating to exploit data in areas ranging from business to medicine, social work to literature, history to policy, and engineering to the natural sciences.

The DSI has also launched a Master’s program in Data Science; a MOOC on Data Science and Analytics in Context; and a Universitywide program, Collaboratory@Columbia, created in partnership with Columbia Entrepreneurship, that brings the impact of data science education to Columbia students in a number of fields of study. Through its thriving Industry Affiliates program, the DSI has helped launch many successful startups and has deepened and extended relationships with New York City’s government.

Q. What exactly is data science?

A. Since data science is still a new, emerging field, we don’t yet have a crisp definition. At its heart, data science draws on inductive reasoning, which embraces uncertainty in its premises. Statistical modeling lets us quantify and reason about uncertainties systematically. Huge computational power enables automating such types of reasoning at scale. So, data science fundamentally draws on computer science and statistics. Moreover, many data science problems can be formulated as optimization problems. Hence, the strength at Columbia in operations research gives Columbia an edge in defining data science.

Since data is everywhere, data science is broadly applicable. All fields of science and engineering are collecting and interpreting data, and there is an increasing prevalence of data science in nontechnical fields—law, business, journalism, social sciences, humanities, art. We are at the tip of the iceberg. As big data and big compute become more commonplace as tools, there is more and more opportunity to use data for action, prediction, and decision-making.

Data science is not just about the analysis of data. It involves the whole data life cycle: generating, collecting, storing, managing, analyzing, visualizing, and interpreting data through storytelling.

Q. What are your ambitions for the Data Science Institute?

A. The DSI is a unique organization with a mission that includes both research and education: advance the state of the art in data science; transform all fields, professions, and sectors through the application of data science; and ensure the responsible use of data for the benefit of society. I like to use the tagline “Data for Good” to emphasize that we want both to tackle problems relevant to society and to promote the use of data in fair and ethical ways. It’s better to take into consideration legal, social, cultural, and philosophical questions as—not after—we invent new technology. DSI has the framework to draw on the strengths at Columbia to ask big questions, beyond just doing and applying the science.

The DSI is a gem, but a hidden gem. Top priorities for me now are to enhance the visibility of our research portfolio and to foster more multidisciplinary, collaborative research. On the academic side, I’d like to launch a PhD program in data science, to create a postdoc fellows program, and to involve more undergraduates in data science, through coursework and research.

While at Microsoft, I traveled to many universities and colleges—big, small, state, private. Columbia has a five-year lead over other academic competitors. My job is to maintain the excellence that has been established in light of keen and growing competition from other schools, especially those investing heavily in data science.

In ten years, looking back, I would like to be able to say that Columbia led the way in defining the field of data science.

Q. What are some of the things you have learned in your “listening tour” of Columbia?

A. I found that faculty are sitting on amazing data sets. For example, Columbia has the largest repository of declassified government documents in the country. A history, law, or journalism professor who understands the value of this data can search for patterns, anomalies, relationships, and correlations across this data. Today, there are sensors everywhere. The Medical Center is coordinating an international effort to collect one billion health records so we can predict drug side effects at the population and patient levels. I also found faculty who are deepening our understanding of core algorithms in data science. For example, expectation maximization (EM), first formalized in 1977, is the algorithm used for estimation in statistical methods, but its theoretical properties are still not well understood. Two of our DSI faculty and a PhD student recently provided the first nontrivial global convergence analysis for EM.

Q. Cybersecurity and data privacy are two of your current research interests. Would you talk a bit about these areas?

A. These long-standing interests led me more recently to advocate for the responsible use of data. The field uses the acronym FATE for the fair, accountable, transparent, and ethical use of data. In industry, one is often in a rush to put technology into the hands of the consumer without thinking comprehensively about its societal implications. In academia, we have the ability to look at all angles of technology—from ethics to security—before it goes into the field.

Q. A corporate lab like Microsoft Research is able to take big risks. What’s your view on risk-taking and how it relates to innovation?

A. You can take a big risk when you take a long-term view. Academia owns that space. NSF funds academia to do basic, open, long-term research. Microsoft Research has been like Bell Labs in its heyday, supporting such academic-style research. At Microsoft, I would remind the CEO and CFO that you would not be making certain revenues today if you hadn’t funded the research 10 or 20 years ago.

Q. Your father taught electrical engineering at Columbia for many years. What was it like growing up in that environment?

A. We lived on 121st and Amsterdam, and, as a little girl, I played on the steps of Alma Mater. My father’s office was in Mudd, when it had no other buildings inside [CSB] or next to it [CEPSR]. At MIT, I started as an EE major, but was most excited by the concepts covered in one of my required core computer science courses. Thinking about switching majors from EE to computer science, I remember calling my father up and asking, “Is computer science just a fad or is it here to stay?” He encouraged me to pursue computer science and is still one of my favorite advisers.

Q. What about living in New York again are you most looking forward to?

A. Coming from Seattle, I look forward to the four seasons again. I can walk everywhere. And I love that it is a 24-hour city—you can get anything at any time, including a good bagel!

By Joanne Hvala