Michael Collins | A Man of Many Words
Vikram S. Pandit Professor of Computer Science
This profile is included in the publication Excellentia, which features current research of Columbia Engineering faculty members.
Photo by Eileen Barroso
With people becoming ever more connected around the globe, statistical natural language processing (NLP), a subfield of artificial intelligence, has become a critical area of research. As the amount of electronic data increases exponentially, so does the need for translating, analyzing, and managing the flood of words and text on the web. NLP deals with the interactions between computers and human languages, often using machine learning to approach problems in processing text or speech. One of the world’s leading NLP researchers has just come to Columbia Engineering from MIT: Michael J. Collins, recently named the Vikram S. Pandit Professor of Computer Science, whose work in machine learning and computational linguistics has been extraordinarily influential.
Collins’ research focuses on algorithms that process text in order to make sense of the vast amount of it available in electronic form on the web. The overarching thrust of his work has been the use of machine learning along with linguistic methods to handle difficult problems in language processing. His research falls into three main areas: parsing, machine learning methods, and applications.
Collins has built a parser, a program that recovers the grammatical structure of a sentence, whose unprecedented accuracy revolutionized the field of NLP: for the first time, a system was able to accurately handle enormous quantities of text in electronic form. His parser is now one of the most widely used software tools in the NLP field.
His development of new learning algorithms has enabled him to make significant advances in several language-processing applications, greatly impacting speech recognition, information extraction, and machine translation.
“A major focus of my work is on statistical models of complex linguistic structures,” said Collins. “The challenge is to combine sophisticated machine learning methods with these complex structures.”
Collins’ research also focuses on “efficient search” in statistical models of language, an important challenge in many NLP applications. For example, in parsing, you have to search through a vast set of “possible” structures for a given sentence in order to find the most probable structure. In translation, you need to search through a vast number of possible translations for the most plausible one; in speech recognition, you must search through a vast number of possible sentences for the one most likely to have been spoken.
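To make the parsing case concrete, here is a minimal sketch of one standard way to search for the most probable structure: the CKY dynamic-programming algorithm over a probabilistic context-free grammar. It is an illustration only and is not a description of Collins’ parser. The toy grammar, its probabilities, and the function name `cky_best_parse` are all assumptions made up for this example.

```python
import math
from collections import defaultdict

# Hypothetical toy grammar in Chomsky normal form; all probabilities are invented.
BINARY = {  # (parent, left child, right child) -> rule probability
    ("S", "NP", "VP"): 1.0,
    ("VP", "V", "NP"): 0.7,
    ("VP", "VP", "PP"): 0.3,
    ("NP", "NP", "PP"): 0.2,
    ("PP", "P", "NP"): 1.0,
}
LEXICAL = {  # (tag, word) -> rule probability
    ("NP", "she"): 0.3,
    ("NP", "stars"): 0.3,
    ("NP", "telescopes"): 0.2,
    ("V", "sees"): 1.0,
    ("P", "with"): 1.0,
}


def cky_best_parse(words, root="S"):
    """Return (probability, tree) for the most probable parse, or None."""
    n = len(words)
    # best[(i, j)][label] = (log probability, backpointer) for words[i:j]
    best = defaultdict(dict)

    # Width-1 spans: look each word up in the lexicon.
    for i, word in enumerate(words):
        for (tag, w), p in LEXICAL.items():
            if w == word:
                best[(i, i + 1)][tag] = (math.log(p), word)

    # Wider spans: try every split point and every binary rule,
    # keeping only the highest-scoring analysis per label.
    for width in range(2, n + 1):
        for i in range(0, n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                for (parent, left, right), p in BINARY.items():
                    if left in best[(i, k)] and right in best[(k, j)]:
                        score = (math.log(p)
                                 + best[(i, k)][left][0]
                                 + best[(k, j)][right][0])
                        cell = best[(i, j)]
                        if parent not in cell or score > cell[parent][0]:
                            cell[parent] = (score, (k, left, right))

    if root not in best[(0, n)]:
        return None  # the grammar cannot derive this sentence

    def build(i, j, label):
        # Follow backpointers to rebuild the tree as nested tuples.
        _, back = best[(i, j)][label]
        if isinstance(back, str):
            return (label, back)
        k, left, right = back
        return (label, build(i, k, left), build(k, j, right))

    return math.exp(best[(0, n)][root][0]), build(0, n, root)


if __name__ == "__main__":
    print(cky_best_parse("she sees stars with telescopes".split()))
```

The sentence in the usage line is ambiguous: “with telescopes” can attach to the seeing or to the stars. Instead of listing every candidate tree, the table keeps only the best-scoring analysis for each span and label, which is what keeps the search efficient as sentences grow longer.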
“I find linguistics fascinating,” said Collins. “I really enjoy developing mathematical models for languages. And the algorithms we’re developing to process text in intelligent ways have all kinds of intriguing applications.”
B.A., University of Cambridge (England), 1992; M.Phil., University of Cambridge, 1993; Ph.D., University of Pennsylvania, 1999