Yaniv Erlich | Dissecting the Complex Relationships of Genes, Health, and Privacy

Health and predisposition to disease strongly depend on the genetic material carried deep within the cells of every individual. Increasingly this genetic information, unique to every individual, is becoming public. For Yaniv Erlich, these are two sides of the complex relationship between genes and health, a relationship increasingly understood through quantitative analysis and computer algorithms.

Yaniv Erlich
Assistant Professor of Computer Science
—Photo by Jared Leeds

"Computational methods are necessary at every step in examining the genome," says Erlich, assistant professor of computer science. "The strings of nucleotides (A, T, G, and C) that form each person's unique genome are billions of letters long. It's not even possible to look at these long sequences without a computer. As more genetic data is collected, concepts from machine learning, statistics, and signal processing are needed to detect the subtle variations and statistical patterns that reveal traits and predisposition to disease."

Erlich is well-positioned for untangling the complex genetic underpinnings of human biology. An initial love of math and a friend's chance remark that biology entailed a lot of mathematics piqued his interest, and he went on to study both biology and genetics, two sciences increasingly awash in data and in need of algorithms for inferring information from that data. He approached both subjects from a computational perspective.

At MIT's Whitehead Institute for Biomedical Research, where he headed a research lab from 2010 through 2014, he had the chance to develop and then apply new computational tools to genetics. The results were impressive. One method sequenced tens of thousands of samples at a time. Another harnessed signal processing and statistical learning to extract genetic information from short tandem repeats, or STRs, fast-mutating fragments of DNA so small they had been mostly neglected by the research community. Both methods contributed new information about how genes operate at the molecular level.
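To make the idea of a short tandem repeat concrete, here is a toy Python sketch that scans a DNA string for a short unit repeated back to back. It is only an illustration and not the method developed in Erlich's lab: the function name `find_strs` and its parameters are hypothetical, and real STR analysis of sequencing data has to cope with read errors and alignment, which this sketch ignores.

```python
import re

def find_strs(sequence, min_unit=2, max_unit=6, min_repeats=3):
    """Return (start, unit, repeat_count) for each tandem repeat found.

    Toy illustration only: assumes an error-free string over A, C, G, T.
    """
    # Capture a short unit (shortest candidate first), then require at
    # least (min_repeats - 1) further back-to-back copies of it.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    results = []
    for match in pattern.finditer(sequence):
        unit = match.group(1)
        repeat_count = len(match.group(0)) // len(unit)
        results.append((match.start(), unit, repeat_count))
    return results

# Example: the unit GATA repeated five times between two flanking regions.
example = "AAGT" + "GATA" * 5 + "CCTG"
print(find_strs(example))
```

Because the number of copies of the unit changes quickly between generations, the repeat count at a given position is the quantity of interest in a sketch like this.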

But Erlich was also interested in how genes affect health and traits. Is the relationship linear, where mutations predictably sum to a trait, or is the relationship nonlinear, with mutations interacting unpredictably with one another? The answer required a population-scale analysis, one too large to be constructed using traditional data collection methods. Here, Erlich and his Whitehead colleagues came up with an innovative solution. Using genealogical data already uploaded to social media sites, they created a genealogical tree of 13 million individuals dating back to the 15th century. Such a deep genealogy reveals clusters of genetic variation, with some tied to longevity and some to rare disorders. As a side bonus, it also offers an intriguing view into human migration. Looking at how mutations ripple through populations, researchers can measure the frequency of a certain trait, and thus evaluate the genetic contribution. With a larger tree, even more will become possible.
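The difference between the linear and nonlinear pictures can be sketched in a few lines of Python. This is a simplified illustration, not the model used in the genealogy study: the function names, effect sizes, and interaction weights below are all made up for the example. In the linear case each mutation contributes a fixed amount; in the nonlinear case pairs of mutations add an extra term that depends on both being present.

```python
def additive_trait(genotype, effects):
    """Linear model: the trait is the sum of per-mutation effects.

    genotype[i] is the number of copies (0, 1, or 2) of mutation i.
    """
    return sum(copies * effect for copies, effect in zip(genotype, effects))

def interacting_trait(genotype, effects, interactions):
    """Nonlinear model: the additive part plus pairwise interaction terms.

    interactions maps a pair of mutation indices (i, j) to a weight that
    only matters when both mutations are present.
    """
    total = additive_trait(genotype, effects)
    for (i, j), weight in interactions.items():
        total += weight * genotype[i] * genotype[j]
    return total

# Hypothetical numbers, chosen only to show the two models diverging.
genotype = [1, 0, 2]
effects = [0.5, 1.0, 0.25]
interactions = {(0, 2): 1.5}

print(additive_trait(genotype, effects))
print(interacting_trait(genotype, effects, interactions))
```

With an empty `interactions` dictionary the two functions agree, which is one way to see the linear model as a special case of the nonlinear one.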

The promise is great but it all means nothing if there isn't genetic material to work with. And that requires trust.

People today are more wary about revealing personal data, and they are right to be so. Erlich himself is one of the first to flag the ethical complexities involving genetics and privacy. A paper he spearheaded, released in January 2013, caused a stir by showing how easy it is to take apparently anonymized genetic information donated by research participants and cross-reference it to online data sources. Using only Internet searches with no actual DNA, Erlich and his research team were able to correlate the donated DNA to a surname in 13% of the U.S. population, a result that surprised even Erlich.

"Our study highlighted current gaps in genetic privacy as we enter the brave new era of ubiquitous genetic information," explains Erlich. "However, we must remember that sharing genetic information is crucial to understanding the hereditary basis of devastating disorders. We were pleased to see that our work has helped facilitate discussions and procedures to better share genetic information in ways that respect participants' preferences."

Erlich hopes that ensuring better safeguarding of genetic information will encourage more people to contribute their DNA, speeding the day when personalized medicine becomes a reality for all. For those most at risk of rare genetic diseases, the bigger danger may be in not contributing genetic information.

BSc, Tel-Aviv University, 2006; PhD, Watson School of Biological Sciences, Cold Spring Harbor Laboratory, 2010
 
—by Linda Crane