by Kim Martineau

Columbia University is the lead on a $1.25 million project funded by the National Science Foundation (NSF) to share data, tools, and ideas for tackling some of the big challenges facing the northeastern United States.

As lead agency for the Northeast Big Data Innovation Hub, one of four NSF-sponsored hubs, Columbia is bringing together experts in the public and private sectors to collaborate on data-driven solutions to problems in health care, energy, finance, urbanization, natural science, and education.

Massive datasets and novel computational techniques are changing how individuals and societies approach day-to-day tasks. Data analytics promise to deliver individually tailored treatment to patients, massively reduce energy use in buildings, and radically improve teaching methods in schools, among other advances.

The Northeast is home to some of the oldest and most diverse cities in the United States, and many of the nation’s top universities, hospitals, and banks. “It’s an ideal laboratory for testing the potential for data science to improve lives,” said Northeast Hub principal investigator (PI) Kathleen McKeown, director of the Data Science Institute and Henry and Gertrude Rothschild Professor of Computer Science. “The Northeast Hub is focusing on extracting insights from large amounts of data that can bring about tangible results.”

Working with 40 universities and with partners in industry, government, and the nonprofit sector, Columbia is identifying high-priority needs in the region. A series of workshops over the next three years will give partners a chance to brainstorm and collaborate on the projects likely to have the greatest impact.

The Northeast Hub is tackling a number of questions, including how to encourage data sharing to maximize the potential for discovery, how open data principles can be balanced against privacy and security concerns, and how cities can mine and share data to improve public services and adapt to climate change. Its six areas of focus are health, energy, cities and regions, finance, big data in education, and discovery science. It is also addressing four overarching themes, including data sharing as well as privacy and security.

The idea for a Big Data hub network came in 2012, after President Obama announced a $200 million National Big Data Research and Development Initiative to apply data analytics to education, environmental and biomedical research, and national security. NSF, one of six federal agencies involved, proposed an add-on initiative that would divide the country into “regional innovation hubs,” each harnessing experts in academia, industry, government, and the nonprofit sector, to address problems too big for any one to take on alone.

Planning sessions were held last fall in four regions. NSF announced leaders for the hubs—Columbia for the Northeast; Georgia Tech and North Carolina State University for the South; the University of Illinois at Urbana-Champaign for the Midwest; and the San Diego Supercomputer Center, University of California, San Diego, and the University of Washington for the West.

The Northeast Hub includes all six New England states—Maine, Vermont, New Hampshire, Massachusetts, Rhode Island, and Connecticut—as well as New York, New Jersey, and Pennsylvania. General Electric, Microsoft, and Ericsson are among 20 industry partners; New York City’s Office of Data Analytics, Brookhaven National Laboratory, and the Regional Plan Association are among 20 government and nonprofit partners.

The Hub’s executive committee is being led by McKeown, the PI; Howard Wactlar, a computer scientist at Carnegie Mellon; Carla Brodley, a computer scientist at Northeastern University; Vasant Honavar, a computer scientist at Penn State; and Andrew McCallum, a computer scientist at the University of Massachusetts, Amherst. A full list of partners is available on the Northeast Hub website.

Last December, the Hub held its first workshop on campus. Speakers included Michael Leiter, a counterterrorism expert and the chief strategy officer at Leidos, a top U.S. defense contractor; and Keith Marzullo, director of the federal Networking and Information Technology Research and Development Program.