
The brain surgeon began as he always does, making an incision in the scalp and gently spreading it apart to expose the skull. He then drilled a 3-inch circular opening through the bone, down to the thick, tough covering called the dura. He sliced through that, and there in the little porthole he’d made was the glistening, blood-flecked, pewter-colored brain, ready for him to approach the way spies do a foreign embassy: He bugged it.

Dr. Ashesh Mehta, a neurosurgeon at the Feinstein Institute for Medical Research on Long Island, was operating on his epilepsy patient to determine the source of seizures. But the patient agreed to something more: to be part of an audacious experiment whose ultimate goal is to translate thoughts into speech.


While he was in there, Mehta carefully placed a flat array of microelectrodes on the left side of the brain’s surface, over areas involved in both listening to and formulating speech. By eavesdropping on the electrical impulses that crackle through the gray matter when a person hears in the “mind’s ear” what words he intends to articulate (often so quickly it’s barely conscious), then transmitting those signals wirelessly to a computer that decodes them, the electrodes and the rest of the system hold the promise of being the first “brain-computer interface” to go beyond movement and sensation.

If all goes well, it will conquer the field’s Everest: developing a brain-computer interface that could enable people with a spinal cord injury, locked-in syndrome, ALS, or other paralyzing condition to talk again.

The technology needn’t give these patients the ability to deliver a Shakespeare soliloquy; even a few simple words would be a breakthrough. More and more experts therefore think a system that decodes whether a person is silently saying “yes” or “no,” or “hungry,” “pain,” or “water,” is now within reach, thanks to parallel advances in neuroscience, engineering, and machine learning.

“We think we’re getting enough of an understanding of the brain signals that encode silent speech that we could soon make something practical,” said Brian Pasley of the University of California, Berkeley. “Even something modest could be meaningful to patients. I’m convinced it’s possible.”


Further in the future, Facebook and others envision similar technology facilitating consumer products that translate thoughts into text messages and emails. No typing or Siri necessary.

The first brain-computer interfaces (BCIs) read electrical signals in the motor cortex corresponding to the intention to move, and use software to translate those signals into instructions to operate a computer cursor or robotic arm. In 2016, scientists at the University of Pittsburgh went a step further, adding sensors to a mind-controlled robotic arm so it produced sensations of touch.

For all the enthusiastic media coverage they get, brain-computer interfaces are neither routine nor even widely available more than a decade after the first prototypes. Many of the projects foundered after the initial excitement. Most such systems require clunky cables as well as large boxes packed with signal analyzers and other electronics, said Pitt’s Jennifer Collinger, who helped develop the tactile robotic arm. She and her colleagues recently got an $8 million grant from the National Institutes of Health to provide the device to additional patients in Pittsburgh and keep improving it.

In addition, today’s brain electrodes last only a few years, meaning people would need repeated brain surgery, and current BCI systems, while OK in a lab, aren’t reliable enough for real-world use, Collinger said.

Speech BCIs face even higher hurdles. Decoding the intention to articulate a word involves reading more brain signals than decoding movement does, and it hasn’t been clear precisely which areas of the brain are involved. The main challenge is that language is encoded in an extensive brain network, and current recording techniques can’t monitor the whole brain with high enough spatial and temporal resolution, said Stephanie Martin of the University of Geneva, who last year won an award for her progress toward a speech BCI.

The brain is also very noisy, and the electrical activity that encodes speech tends to get drowned out by other signals. “That makes it hard to extract the speech patterns with a high accuracy,” she said.

Current assistive technologies for people whose paralysis, ALS, or other condition leaves them unable to speak “are not very natural and intuitive,” said Martin, who is part of a European consortium on decoding speech from brain activity. Patients gaze at a screen displaying letters, scalp electrodes sense brain waves that encode eye movement and position, and the chosen letters spell words that a speech synthesizer says aloud. The late cosmologist Stephen Hawking, who had ALS, used a system like this. But scientists think they can do better by “directly exploiting neural correlates of speech,” Martin said.

Computational neuroscientist Frank Guenther of Boston University developed the first speech BCI way back in 2007. It used electrodes implanted in the brain of a man with locked-in syndrome to eavesdrop on the motor cortex’s plans to speak. They picked up signals corresponding to moving the tongue, lips, larynx, jaw, and cheeks in a way that would produce particular phonemes (though the study got only as far as vowels).

The project ended after Guenther’s collaborator, neurologist Phil Kennedy, ran afoul of federal health regulators and was barred from implanting electrodes in any more patients. It didn’t help that Kennedy, frustrated with the field’s slow progress, had his own brain implanted with electrodes, a power coil, and transceiver by a neurosurgeon in Belize in 2014, and initially seemed to have suffered brain damage.

Undeterred by such reputational setbacks, other neuroscientists are teaming up with electrical engineers to develop a system of implants, decoders, and speech synthesizers that would read a patient’s intended words, as encoded in brain signals, and turn them into audible speech. One aspect of speech BCIs that could one day make their use widespread, Guenther said: The hardware is much less expensive than robotic arms, which can cost hundreds of thousands of dollars.

Guenther’s 2007 system, he said, “was ancient by today’s standards. I don’t think the problems [that have held back speech BCI] are unsolvable.”

Neither does electrical engineer Nima Mesgarani of Columbia University, who is leading a project to reconstruct speech from the signals picked up by electrodes like those Mehta implanted.

The reason such a device has a prayer of working is that the human brain doesn’t make hard-and-fast distinctions between fantasy and reality. When the brain imagines something, the neuronal activity is extremely similar, in location and pattern, to when it does something. The mental image of a pumpkin pie produces activity in the visual cortex very much like that while seeing one; imagining taking a jump shot evokes neuronal activity like actually executing one.

So it is with “covert” or silent speech: Rehearsing what you’re going to say without moving the lips or tongue “creates the same patterns of brain activity as actually speaking,” Mesgarani said.

So does mentally listening to your silent speech. “Think of it as the mind’s ear,” said Berkeley’s Pasley. Say the word giraffe. Then say it silently. Inside your brain, the second syllable should sound louder than the first and probably rise in pitch. Those and other qualities make up the word’s spectrogram, Pasley explained.
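For readers who want to see what a spectrogram captures, here is a minimal sketch. The audio is synthetic and the numbers are made up, not data from any of these studies: a two-part tone stands in for a two-syllable word whose second half is louder and higher-pitched.

```python
# Illustrative only: a spectrogram shows how loudness at each frequency
# changes over time. Here a synthetic two-part tone (a stand-in for a
# two-syllable word) gets louder and higher-pitched in its second half.
import numpy as np
from scipy.signal import spectrogram

fs = 16000                                               # sampling rate, Hz
t = np.linspace(0, 1.0, fs, endpoint=False)
first = 0.3 * np.sin(2 * np.pi * 150 * t[: fs // 2])     # quieter, lower pitch
second = 0.8 * np.sin(2 * np.pi * 250 * t[fs // 2:])     # louder, higher pitch
word = np.concatenate([first, second])

freqs, times, power = spectrogram(word, fs=fs)
print(power.shape)   # (frequency bins, time frames): the "spectrogram" of the word
```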

Crucially, brain activity corresponding to the mind’s ear occurs in the auditory cortex, which also hears sounds from the outside world: The overlap, Pasley and his colleagues report in a paper in next month’s Cerebral Cortex, “is substantial.”

That allows eavesdropping devices to reconstruct silent speech, if far from perfectly. In a study Martin conducted with Pasley when she was at Berkeley, participants who had electrodes placed in their brains were asked to think about saying aloud a series of words such as cowboys, swimming, python, and telephone. Unfortunately, the accuracy of the software’s interpretation of the resulting brain signals for word pairs such as spoon and battlefield was only slightly better than flipping a coin. That was a big improvement, though, on an earlier system that scored under 40 percent in figuring out what vowel or consonant, not even a whole word, was encoded by brain activity during covert speech.

The Berkeley results were good enough for a proof of concept, but not much more. “The reconstructed speech [from that and similar studies] hasn’t been intelligible at all,” Mesgarani said. “We’re trying to overcome the intelligibility barrier.”

The best way to do that, he said, is with machine learning, or training software to interpret the brain activity corresponding to covert speech, learn from its mistakes, and get progressively better.

To test his ideas, Mesgarani teamed up with Mehta, who recruited five epilepsy patients for the study. During their surgeries, he placed a grid of electrodes (a flat array used for electrocorticography) on the surface of two regions of the auditory cortex: Heschl’s gyrus and the superior temporal gyrus. The latter contains Wernicke’s area, which figures out what words to use. Both gyri process features of speech, including volume, intonation, frequency, and, crucially, phonemes, the smallest units of sound (such as “sh”) that make up a spoken language.

The volunteers then listened to people saying digits (“one, two, three …”) and reading stories for 30 minutes. Acoustic processing software extracted the neural activity evoked by listening to that speech, essentially a sequence of complex electrical signals. A “deep neural network” that Mesgarani and his team developed then analyzed that activity, inferring the language sounds it corresponded to. Those inferences were translated back into electrical signals and sent to a vocoder, a synthesizer that produces sounds from features such as frequency and other auditory elements.
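The overall idea can be sketched roughly in code. Everything below is an assumption made for illustration: the data are random stand-ins, a small feed-forward network takes the place of the team’s deep neural network, and a Griffin-Lim reconstruction takes the place of their vocoder; only the 128-electrode grid size comes from the article.

```python
# Minimal sketch of the decode-and-resynthesize idea described above.
# Assumptions (not from the paper): random placeholder data, a simple
# feed-forward network standing in for the team's deep neural network,
# and Griffin-Lim reconstruction standing in for their vocoder.
import numpy as np
import torch
import torch.nn as nn
import librosa

N_ELECTRODES = 128      # size of the electrode array mentioned in the article
N_FREQ_BINS = 257       # spectrogram frequency bins (assumed n_fft = 512)
N_FRAMES = 1000         # number of time frames in this toy example

# Placeholder "recordings": per-frame neural features and the spectrogram
# of the speech the patient heard (in the real study these are time-aligned).
neural = torch.randn(N_FRAMES, N_ELECTRODES)
target_spec = torch.rand(N_FRAMES, N_FREQ_BINS)

# A small regression network mapping neural activity -> spectrogram frames.
net = nn.Sequential(
    nn.Linear(N_ELECTRODES, 256), nn.ReLU(),
    nn.Linear(256, N_FREQ_BINS), nn.Softplus(),  # magnitudes are non-negative
)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

for step in range(200):                 # "learn from its mistakes" on listening data
    opt.zero_grad()
    loss = nn.functional.mse_loss(net(neural), target_spec)
    loss.backward()
    opt.step()

# Decode: predict a spectrogram from neural activity alone, then turn it
# back into a waveform (Griffin-Lim here; the study used a vocoder).
with torch.no_grad():
    pred_spec = net(neural).numpy().T   # shape (freq, time) for librosa
audio = librosa.griffinlim(pred_spec, n_iter=32, hop_length=128, win_length=512)
```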

The whole process was like translating the operating manual for a Ferrari from Italian to English to Japanese and back to Italian: The final version often sounds nothing like the original. And that’s what previous research on brain-computer interfaces for speech had gotten: a string of mostly unintelligible sounds. “Before this, you couldn’t reconstruct the sounds of speech from electrical data very well,” Mesgarani said.

The test for his brain-computer interface was whether the tinny sounds coming from the vocoder bore any resemblance to the sounds of the stories and the digits the participants had heard. They did: Intelligibility reached 75 percent, compared to slightly more than half that for earlier speech BCIs, the scientists report in a paper posted on the bioRxiv preprint site; it has not been peer-reviewed but the authors have submitted it to a journal.

Averaging all of someone’s neural responses to a particular speech utterance (repeated many times) improved the accuracy of the reconstructed, synthesized speech, as did taking readings from more electrodes of the 128 in the array.
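A toy example, with entirely made-up signals, shows why averaging repeated trials helps: the part of the response locked to the speech stays put, while unrelated noise cancels out across repetitions.

```python
# Toy illustration (invented data) of why averaging repeated trials helps:
# the stimulus-locked response survives while independent noise shrinks
# roughly in proportion to 1/sqrt(number of repetitions).
import numpy as np

rng = np.random.default_rng(0)
true_response = np.sin(np.linspace(0, 4 * np.pi, 500))          # idealized evoked response
trials = true_response + rng.normal(scale=2.0, size=(50, 500))  # 50 noisy repetitions

single_trial_err = np.abs(trials[0] - true_response).mean()
averaged_err = np.abs(trials.mean(axis=0) - true_response).mean()
print(single_trial_err, averaged_err)   # the averaged estimate is much closer
```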

The next step is to test the deep neural network on brain signals evoked by imagining speaking, Mesgarani said. “Previous studies have shown it’s possible” to detect the signals that encode such unspoken speech, he said; the bottleneck has been in the acoustic processing and verbal synthesizer.

By improving the back end of a potential speech BCI, he said, “we have a good framework for producing accurate and intelligible reconstructed speech from brain activity,” something he calls “a step toward the next generation of human-computer interaction systems … for patients suffering from paralysis and locked-in syndromes.”

What begins as technology for people with disabilities could spread to everyone else — or maybe go in reverse. At a 2017 neurotech conference at the Massachusetts Institute of Technology, Facebook’s Mark Chevillet described the company’s “thought to typing” BCI research as being guided by one question: “What if you could type directly from your brain?”

The goal of the project, which he directs, is “to develop a silent speech interface that will let you produce text five times faster than typing, or 100 words per minute.” The company is studying whether high-quality neural signals detected noninvasively (not even the most ardent Facebook-ers would likely agree to brain surgery) can be accurately decoded into phonemes. If so, the next step is to feed the signals into a database that pairs phoneme sequences with words, then use language-specific probability data to predict which word the signals most likely mean (much like auto-fill in Gmail).
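In outline, that last step might look something like the sketch below. The miniature lexicon and word probabilities are invented for illustration and are not Facebook’s actual system; real decoders would also weigh the surrounding context, the way auto-fill does.

```python
# Hypothetical sketch of the phoneme-to-word step described above: look up
# which words match a decoded phoneme sequence, then pick the one the
# language statistics say is most probable. All values are made up.
DECODED_PHONEMES = ("R", "EH", "D")     # pretend output of the neural decoder

LEXICON = {                             # phoneme sequence -> candidate words
    ("R", "EH", "D"): ["red", "read"],
    ("B", "L", "UW"): ["blue", "blew"],
}
WORD_PROB = {"red": 0.004, "read": 0.009, "blue": 0.003, "blew": 0.0004}

candidates = LEXICON.get(DECODED_PHONEMES, [])
best = max(candidates, key=lambda w: WORD_PROB.get(w, 0.0)) if candidates else None
print(best)   # "read" wins here; a context model would refine the choice
```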

“This is not science fiction,” Chevillet told the conference.
