New Columbia Symposium Charts the Frontiers in Computing Systems
The term “big data” doesn’t begin to describe the almost unimaginable volume of data generated around the world every day, from companies tracking consumer behaviors to satellites measuring environmental changes to scientists modeling the physics of Earth and space. So enormous are the constantly generated data sets that conventional paradigms of computing are rapidly becoming unworkable.
To highlight the promising advances and grand challenges of creating the information infrastructure of the future, Columbia’s new Frontiers in Computing Systems working group, part of the Data Science Institute, convened its inaugural symposium on March 24.
The full-day event brought together more than 150 leading scientists, researchers, and engineers to discuss massive and extreme-scale parallel computing systems and their application to solving diverse interdisciplinary problems across fields such as ocean science, materials science, genomics, neuroscience, and population-scale biomedical informatics. The participants included representatives from technology companies, banking and financial institutions, and government. They were welcomed by Columbia Executive Vice President for Research Michael Purdy and Dean of Engineering Mary C. Boyce.
“The volume of data in a wide range of cutting-edge research fields is approaching exascale levels (a billion gigabytes), and is overwhelming the ability of computing systems to process, store and analyze,” said Professor Steven Nowick, chair of the Frontiers in Computing Systems group. “A key challenge, to enable many 21st century societal advances, is to get this ‘fire hose’ under control.”
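To get a rough sense of the scale Nowick describes, the unit arithmetic below (standard SI definitions, not figures from the symposium; the 10 Gb/s link speed is a hypothetical example) shows why exascale data overwhelms conventional systems:

```python
# Unit arithmetic: an exabyte is 10**18 bytes, i.e. a billion gigabytes.
GIGABYTE = 10**9   # bytes (SI definition)
EXABYTE = 10**18   # bytes

print(EXABYTE // GIGABYTE)  # gigabytes per exabyte -> 1000000000

# Even moving that much data is daunting: streaming one exabyte over a
# hypothetical 10 Gb/s link (1.25 GB/s) would take roughly a quarter century.
seconds = EXABYTE / 1.25e9
years = seconds / (365 * 24 * 3600)
print(round(years, 1))  # -> 25.4
```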
The event’s keynote speaker, Ruchir Puri, shared an example of the challenges facing those who work with big data. Puri is an IBM Fellow and chief architect of IBM’s Watson, a cognitive computing system capable of reasoning, learning, and understanding context that has defeated the best human Jeopardy! players. Watson is so powerful that it has to be cloud-based, Puri explained: deploying that much computing power on customers’ premises would be prohibitively difficult, and the cloud also allows resources to scale dynamically with actual demand.
“Computing is coming out of an era of transaction processes and business automation to a more dimensional future of understanding the world in addition to automating it,” Puri said. “Traditional computing uses very precise deterministic models, but we are heading to cognitive and perceptual machine learning, with noisier stochastic models that are fundamentally approximate and statistical in nature.”
Columbia’s Frontiers in Computing group is exploring several big data challenges, including how immense amounts of data will be shared and processed in the future.
Peter Wang, CTO and co-founder of Continuum Analytics, suggested that the “data revolution” is becoming more of an “everything revolution” as individual central processing units (CPUs) approach the practical limits of Moore’s Law—the observation that the number of transistors that fit on an integrated circuit doubles approximately every two years—since switching all of those transistors at once would generate more heat than a chip can dissipate.
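The two-year doubling cadence compounds quickly, which is why even small deviations from it matter. A quick illustration (the one-billion-transistor baseline is hypothetical, chosen only for round numbers):

```python
# Illustrative only: project transistor counts under a strict
# two-year doubling, starting from a hypothetical 1-billion-transistor chip.
START = 1_000_000_000  # transistors (hypothetical baseline)

def transistors_after(years):
    """Count after a whole number of two-year doubling periods."""
    return START * 2 ** (years // 2)

for y in (0, 2, 10, 20):
    print(y, transistors_after(y))
# After 20 years of strict doubling the count has grown 1024x.
```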
“The old CPU model is fading,” Wang said. “Increasingly computers will not be individual nodes but containers connecting users to cloud computing.”
Building distributed-memory systems to supersede overburdened single-node systems is the focus of Peter Kogge, founder of Emu Technology and a professor at Notre Dame. Single-domain setups can be up to 1,000 times more efficient than those using shared memory, he noted, but they don’t scale well. The challenge, then, is how to better execute huge numbers of tasks in many places simultaneously, integrating worker threads that do the bulk of the computation with others responsible for coordinating the system.
“We can’t just get a better CPU; we have to look at the whole system architecture,” Kogge said.
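The division of labor Kogge describes—worker threads computing, a coordinator distributing tasks and collecting results—can be sketched in miniature. This is a minimal, hypothetical illustration of the pattern, not Emu’s actual architecture; the squaring step stands in for real computation:

```python
import threading
import queue

tasks = queue.Queue()    # work items flowing to workers
results = queue.Queue()  # computed values flowing back

def worker():
    """Pull tasks until a None sentinel arrives, doing the bulk computation."""
    while True:
        item = tasks.get()
        if item is None:          # coordinator's signal to stop
            break
        results.put(item * item)  # stand-in for real computation

def coordinate(n_workers, data):
    """Coordinator: start workers, distribute work, gather results."""
    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    data = list(data)
    for x in data:                # distribute the work items
        tasks.put(x)
    for _ in threads:             # one stop sentinel per worker
        tasks.put(None)
    for t in threads:
        t.join()
    return sorted(results.get() for _ in data)

print(coordinate(4, range(10)))  # -> [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```

Scaling this pattern to millions of threads across distributed memory—rather than four threads on one node—is precisely where the systems challenges Kogge raises begin.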
Speakers at the conference also discussed how cutting-edge computing is being applied to research in climate science, astrophysics, and molecular dynamics.
Gavin Schmidt, director of the NASA Goddard Institute for Space Studies, described the complex models of Earth’s climate flows that incorporate planetary physics, historical greenhouse gas levels, and various scenarios of what might unfold over the next century. On an even broader scale, Rutgers astrophysicist Rachel Somerville explained how a wealth of images across different wavelengths reveals insights about distant galaxies. These observations contribute to rich data sets that feed enormous high-resolution simulations of our universe back to the Big Bang, including stars, supernovae, black holes, and vast clouds of interstellar gas.
Closer to home, D.E. Shaw Research’s Head of Engineering Mark Moraes described how high-performance computation is being used to transform biochemistry research and the process of drug discovery by simulating how compounds interact with proteins and target molecules over millisecond timescales.
Kyle Mandli, assistant professor of applied mathematics and part of the working group, rounded out the afternoon by moderating a panel of expert practitioners and leaders who approach advanced computing from additional perspectives, including bioinformatics and materials science.
The Frontiers in Computing Systems group is a highly interdisciplinary research initiative, founded last year, that brings together more than 30 experts in computing systems and their applications.
“Our group is unusually cross-disciplinary—it pulls together researchers across the university, from engineering, physical and biological sciences, neuroscience, the medical school, and Lamont-Doherty Earth Observatory,” Nowick said. “Interesting and unexpected breakthroughs come when computer systems and diverse applications researchers come together and collaborate. I am excited about the future impact of this systems research on many big data domains.”
— By Jesse Adams