Luis Gravano


706 Schapiro CEPSR

Tel: (212) 939-7064
Fax: (212) 666-0140

Luis Gravano’s research has focused on extracting structured information (interpreted broadly) from natural-language text and social media documents. This extraction process enables richer, deeper access to the information embedded in the documents than would otherwise be possible over the unstructured content alone. Gravano’s research also identifies opportunities for exploiting the wealth of information that is hidden or implicit in social media streams. This information can be invaluable for many applications if it is extracted and aggregated in a reliable, robust manner.

Research Interests

Databases, information retrieval, web search, information extraction, social media mining.

Research Areas

Of particular interest to Gravano are scalable extraction and analysis techniques that can handle the vast volumes of information available on the web and on social media. Information extraction is both time-consuming (it often involves complex text analysis) and error-prone (it often produces noisy or incorrect output). Gravano has therefore studied cost-based optimization approaches for information extraction, in which both execution efficiency and output quality are modeled explicitly. He has also addressed the task of identifying relevant social media content for real-world events, to support applications such as event browsing and search.

In collaboration with the New York City Department of Health and Mental Hygiene, Gravano is also investigating how to process social media content to identify serious events (e.g., a foodborne illness outbreak originating in a restaurant) that the public health authorities should act upon. Because of the interdisciplinary nature of his research, Gravano has worked closely with natural language processing and machine learning researchers, as well as with his public health collaborators.

Gravano received his B.S. degree from the Escuela Superior Latinoamericana de Informática (ESLAI), Argentina, in 1991, his M.S. degree from Stanford in 1994, and his Ph.D. in computer science from Stanford in 1997. He is a recipient of an NSF CAREER award and has received multiple best paper awards, including at the ACM SIGMOD 2006 and IEEE ICDE 2005 conferences.

Research Experience

  • Senior research scientist, Google (on leave from Columbia University), 2001
  • Consulting researcher, Microsoft Research, 1999, 2000, 2002
  • Academic consultant, Google, 2000

Professional Experience

  • Professor of computer science, Columbia University, 2013-present
  • Associate professor of computer science, Columbia University, 2002-2013
  • Assistant professor of computer science, Columbia University, 1997-2002

Professional Affiliations

  • Association for Computing Machinery

Honors & Awards

  • Distinguished Faculty Teaching Award, Columbia Engineering Alumni Association, Columbia University, 2012
  • Distinguished Teacher Award, Computer Science Department, Columbia University, 2011
  • Best Paper Award, 2006 ACM SIGMOD International Conference on Management of Data (SIGMOD 2006), 2006
  • Best Paper Award, 21st IEEE International Conference on Data Engineering (ICDE 2005), 2005
  • CAREER Award, National Science Foundation (NSF), 1998
  • Most Original Paper Award, International Conference on Parallel Processing (ICPP '92), 1992

Research Funding

  • National Science Foundation, “III: Medium: Adaptive Information Extraction from Social Media for Actionable Inferences in Public Health,” IIS-15-63785, with Daniel Hsu (CoPI), 2016-2020

Selected Publications

  • “Sampling Strategies for Information Extraction over the Deep Web,” P. Barrio and L. Gravano, in Information Processing & Management, vol. 53, no. 2, pages 309–331, Mar. 2017.
  • “k-Shape: Efficient and Accurate Clustering of Time Series,” J. Paparrizos and L. Gravano, in Proc. of the 2015 ACM SIGMOD International Conference on Management of Data, 2015.
  • “When Speed Has a Price: Fast Information Extraction Using Approximate Algorithms,” G. Simões, H. Galhardas, and L. Gravano, in Proc. of the VLDB Endowment, vol. 6, no. 13, pages 1462–1473, 2013.
  • “Identifying Content for Planned Events Across Social Media Sites,” H. Becker, D. Iter, M. Naaman, and L. Gravano, in Proc. of the 2012 ACM International Conference on Web Search and Data Mining (WSDM 2012), pages 533–542, 2012.
  • “Answering General Time-Sensitive Queries,” W. Dakka, L. Gravano, and P. Ipeirotis, in IEEE Transactions on Knowledge and Data Engineering, vol. 24, no. 2, pages 220–235, Feb. 2012.
  • “Hip and Trendy: Characterizing Emerging Trends on Twitter,” M. Naaman, H. Becker, and L. Gravano, in Journal of the American Society for Information Science and Technology, vol. 62, no. 5, pages 902–918, May 2011.
  • “Classification-Aware Hidden-Web Text Database Selection,” P. Ipeirotis and L. Gravano, in ACM Transactions on Information Systems, vol. 26, no. 2, art. 6, Mar. 2008.
  • “Towards a Query Optimizer for Text-Centric Tasks,” P. Ipeirotis, E. Agichtein, P. Jain, and L. Gravano, in ACM Transactions on Database Systems, vol. 32, no. 4, art. 21, Nov. 2007.
  • “Modeling and Managing Changes in Text Databases,” P. Ipeirotis, A. Ntoulas, J. Cho, and L. Gravano, in ACM Transactions on Database Systems, vol. 32, no. 3, art. 14, Aug. 2007.
  • “Evaluating Top-k Queries over Web-Accessible Databases,” A. Marian, N. Bruno, and L. Gravano, in ACM Transactions on Database Systems, vol. 29, no. 2, pages 319–362, June 2004.