
Psychiatry When You Don't Speak the Language

An afternoon in a Chinese clinic makes it clear how important a patient's speech can be to making a diagnosis

This article was published in Scientific American’s former blog network and reflects the views of the author, not necessarily those of Scientific American


“It’s been an exciting morning—we got a new patient last night with acute mania,” the resident said with a wry smile. It was my first day in Changsha, China. I was at the Second Xiangya Hospital and frankly, I began to tense up wondering what “exciting” meant. Some words translate poorly.

We stood outside the Female Psychiatric Ward. As the resident dug through her white coat pocket, I studied the door—a piece of sheet metal on hinges—that barricaded the main stairwell from the patients. Whoever built it had an eye for function; not much else. It rattled with commotion from the other side.

In a moment, the hinges yawned open to a tempestuous sea of Pepto-Bismol-pink jackets with quilted flower prints, evidently standard-issue on the Ward.




The resident secured the door behind us while I stood in the middle of the hallway. I towered more than eight inches above the patients. My white coat, white skin, and wide-eyed expression startled them as much as they startled me.

As we exchanged puzzled looks, the sea calmed. The hall went silent.

“My god, they’re acting like you’re Jesus Christ!” came a shout in fluent English. Unsure whether I had hallucinated, I let my eyes dart around the hallway. Finally, the voice stepped from the background and I was staring at Jennifer:

“Look, I’m here onmyownfreewill my father emotionallyphysicallyverbally A-BUS-ES ME and has been druggingMEtogetmebetter his friend is a doctor and saysthisisgoingtohelp I’m not crazy OK?? I just want to talk to you for five minutes.”

I am not a particularly adept diagnostician, but I realized that Jennifer was our patient suffering from mania right around the time she got to “emotionallyphysicallyverbally.” Her speech was rapid; her words pressed together as if they couldn’t come out fast enough.

Jennifer’s rumpled, tortured expression projected the urgency of someone who lived in a nightmare, of someone whose brain was in crisis. Her hands curved and careened through the air as she paced the hallway shouting unpleasantries in Chinese at whoever blocked her path.

During the month I was in China, Jennifer was my one English-speaking patient, which seared that moment into my mind. That brief exchange revealed a wealth of information about what was going on in her brain, about what was tormenting her. If only I knew what to do with it!

My other patient experiences were equally drenched in data. I watched carefully as Dr. Hao Wei, Director of W.H.O.’s Collaborating Center for Drug Abuse and Health, evaluated and treated 44 patients in one three-hour clinic session. That’s super-human speed.

Here, my lack of Chinese excluded me from the conversation and forced me to focus on how the patient walked into the room, their posture and body language as they sat (or not) in the bright orange examination chair, how they looked (or not) at Dr. Wei, their facial expression, how much they spoke, for how long, and how their speech sounded.

Each of these data points is included in the “mental status exam,” a bedrock tool in psychiatric clinical assessment. Observations of the patient’s body language, speech, and expression are combined with the patient’s answers to questions like “How’s your mood?” or “What’s on your mind?” The mental status exam provides a clinical framework for thinking about the patient’s condition and assessing how best to help.

Like most frameworks, the mental status exam is only as useful as the clinical information it contains. One critique of the mental status exam is its inherent subjectivity—what seems like “fast speech” to one clinician may not seem as fast to another. What seems like appropriate body language in one culture might be totally bizarre in another context. And there’s the problem of describing in prose what you observe with your senses: not everyone’s a Hemingway.

During each of the forty-four patient visits, I sat in the corner of the humid examination room as a clinical peeping Tom, feverishly scribbling character sketches for conversations I couldn’t understand. As I reviewed each sketch later that night, I realized I had recorded a huge amount of diagnostic information—essentially a window into each person’s brain. I also realized that whatever information I captured in my sloppy prose would likely differ from another clinician’s and, more troubling, did little to directly help me understand another patient.

Considering my patient experiences as a whole, I wished I’d recorded behavioral measurements instead of character sketches. After all, trying to pin down brain function in prose is like describing liver function as hues of jaundice. There’s a reason we quantify liver function with laboratory values: quantifiable tests are more powerful than words.

By measuring a patient’s speech, we can say it has a speed of 200 words per minute, a volume of 70 decibels, and a pitch of 180 Hz, which is much more useful and reliable than simply describing it as “fast.” Having a measurement also allows us to directly compare one patient’s values to another patient’s, to a group of healthy people, or even to that same patient’s values as they (hopefully) change through treatment.
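To make the idea of a speech measurement concrete, here is a minimal Python sketch of one such value, words per minute, and of comparing it to a reference group. The function names, the three-second duration, and the "healthy" rates are all made up for illustration; they are assumptions, not clinical data or norms.

```python
from statistics import mean, stdev
from typing import Sequence


def words_per_minute(transcript: str, duration_seconds: float) -> float:
    """Speech rate: whitespace-separated words per elapsed minute."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    return len(transcript.split()) * 60.0 / duration_seconds


def z_score(value: float, reference: Sequence[float]) -> float:
    """How many sample standard deviations `value` sits from the reference mean."""
    return (value - mean(reference)) / stdev(reference)


# Hypothetical recording: 10 words spoken in 3 seconds.
sample = "I just want to talk to you for five minutes"
rate = words_per_minute(sample, duration_seconds=3.0)

# Invented comparison group, for illustration only.
healthy_rates = [140.0, 150.0, 160.0]

print(f"{rate:.0f} words per minute, z = {z_score(rate, healthy_rates):.1f}")
```

The same two functions could be applied to one patient's recordings across visits, which is the kind of over-time comparison a prose description cannot support.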

Further quantifying the frequency of words or phrases like “abuse” and “I’m not crazy” along with the length and structure of each phrase allows us to smelt this behavioral ore into usable steel.
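Counting words and phrases is the simplest of these measurements. The sketch below, in Python, counts how often a target phrase appears in a transcript and expresses it per 100 words. The transcript is an invented stand-in, not a real patient record, and the helper names are hypothetical.

```python
import re
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into words, keeping apostrophes."""
    return re.findall(r"[a-z']+", text.lower().replace("\u2019", "'"))


def count_phrase(tokens: List[str], phrase: str) -> int:
    """Count exact occurrences of `phrase` as a run of consecutive words."""
    target = tokenize(phrase)
    n = len(target)
    if n == 0:
        return 0
    return sum(tokens[i:i + n] == target for i in range(len(tokens) - n + 1))


def per_hundred_words(tokens: List[str], phrase: str) -> float:
    """Phrase frequency normalized by transcript length."""
    return 100.0 * count_phrase(tokens, phrase) / len(tokens) if tokens else 0.0


# Invented transcript, for illustration only.
transcript = "My father abuses me. I'm not crazy, OK? I'm not crazy."
tokens = tokenize(transcript)

print(count_phrase(tokens, "I'm not crazy"))
print(round(per_hundred_words(tokens, "I'm not crazy"), 1))
```

Matching here is exact, so "abuse" would not match "abuses"; a real analysis would likely need stemming or lemmatization, which this sketch leaves out.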

According to Daniel Fox, a graduate student studying linguistics at the University at Buffalo (SUNY), the Natural Language Toolkit is a popular tool that takes un-punctuated, transcribed speech and annotates it with additional information based on word choice and order. One of these annotations is known as parsing.

A speech parser takes “John kicks the ball” and identifies “John” as the subject noun, “ball” as the object noun, “kicks” as the verb, and “the” as an article. The parser then groups words into phrases based on how closely they relate to one another. For example, “the” and “ball” would be grouped into an object noun phrase: “the ball.”

By parsing language this way, the sentence “John kicks the ball” is tagged with additional layers of information that tell you how, on average, English-speakers group ideas. This additional information (a kind of metadata) also allows you to test whether an individual speaker uses longer noun phrases or expresses ideas in a more complex way than other speakers: the beginnings of a biomarker.
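The grouping step can be sketched in a few lines of Python. This is a hand-rolled toy, not the Natural Language Toolkit's API: the part-of-speech tags (Penn Treebank style, such as DT for an article and NN for a noun) are assigned by hand here, whereas a real pipeline would get them from a tagger. The grouping rule, an optional article, then any adjectives, then one or more nouns, is likewise a simplifying assumption.

```python
from typing import List, Tuple

NOUN_TAGS = {"NN", "NNS", "NNP"}


def noun_phrases(tagged: List[Tuple[str, str]]) -> List[List[str]]:
    """Group words into noun phrases: optional DT, any JJ, one or more nouns."""
    phrases, i = [], 0
    while i < len(tagged):
        j = i
        if tagged[j][1] == "DT":          # optional article
            j += 1
        while j < len(tagged) and tagged[j][1] == "JJ":  # adjectives
            j += 1
        k = j
        while k < len(tagged) and tagged[k][1] in NOUN_TAGS:  # nouns
            k += 1
        if k > j:                          # found at least one noun
            phrases.append([word for word, _ in tagged[i:k]])
            i = k
        else:                              # no noun phrase starts here
            i += 1
    return phrases


def mean_phrase_length(phrases: List[List[str]]) -> float:
    """Average number of words per noun phrase (0.0 if there are none)."""
    return sum(len(p) for p in phrases) / len(phrases) if phrases else 0.0


# Hand-tagged example sentence.
sentence = [("John", "NNP"), ("kicks", "VBZ"), ("the", "DT"), ("ball", "NN")]
groups = noun_phrases(sentence)
print(groups)                      # [['John'], ['the', 'ball']]
print(mean_phrase_length(groups))  # 1.5
```

Mean noun-phrase length is one candidate number to compare across speakers; a fuller parser would also recover the verb phrase and the sentence structure above it.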

“Shakespeare has certain patterns in the way he speaks,” Daniel Fox explained, “so you could identify Shakespeare both as an English speaker and as Shakespeare.” A parser would understand “a rose” as a noun phrase but the particular ordering of “a rose” within the larger sentence “a rose by any other name would smell as sweet” may be a phrasing that betrays Shakespeare by the word-groupings he prefers, by the way he likes to structure his ideas.

Natural language processing could also identify people who are at risk for specific types of mental illnesses.

In 2015, a group of scientists led by researchers at Columbia University reported they had used automated speech analysis to measure patterns that could predict the later onset of psychosis.

The group interviewed thirty-four adolescents who had developed early signs of schizophrenia. During a one-hour, open-ended interview, participants described changes they had experienced and the impact of these changes, what had been helpful or unhelpful for them, and their expectations for the future.

The paper, published in the journal npj Schizophrenia, reported that a combination of phrase length, use of determiners (https://en.wikipedia.org/wiki/Determiner), and how well one phrase segued into another successfully predicted the five adolescents who progressed to psychosis with an accuracy of 100%.
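How well one phrase segues into the next can also be turned into a number. The Python sketch below is not the study's method; it swaps in a much cruder stand-in, the cosine similarity of word counts between consecutive phrases, simply to show what a coherence score looks like. The example phrases are invented.

```python
from collections import Counter
from math import sqrt
from typing import List, Optional


def cosine(a: str, b: str) -> float:
    """Cosine similarity between the word-count vectors of two phrases."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = sqrt(sum(v * v for v in ca.values())) * sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def coherence_scores(phrases: List[str]) -> List[float]:
    """Similarity of each phrase to the one that follows it."""
    return [cosine(x, y) for x, y in zip(phrases, phrases[1:])]


def min_coherence(phrases: List[str]) -> Optional[float]:
    """Weakest link between neighboring phrases, or None if fewer than two."""
    scores = coherence_scores(phrases)
    return min(scores) if scores else None


# Invented phrases, for illustration only.
speech = [
    "my father took me to the doctor",
    "the doctor gave me pills",
    "pink jackets everywhere",
]
print([round(s, 2) for s in coherence_scores(speech)])
print(min_coherence(speech))
```

Word overlap misses phrases that are related in meaning but share no vocabulary, so a serious analysis would need a semantic representation of each phrase rather than raw counts.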

And this was based on open-ended verbiage that, until it was measured and parsed, probably seemed rather boring and useless. The act of measuring allowed researchers to organize and decode an underlying signal within natural speech.

This signal reflects a brain product: the way our language networks are organized and function to produce speech. Having this additional window into the brain intrigued me. It made me wonder how natural language processing would have parsed Jennifer’s cry for help, and whether this measure, combined with other behavioral measures such as movement, could have guided her treatment by telling us more about the underlying etiology of her illness.

As I thumbed through my clinical sketches I realized how comparatively useless they were—the true measure of a woman requires more sophisticated tools.