February 19, 2017

Language and Error-Correcting Codes

Contemplating communication with Claude Shannon

Claude Shannon's unicycle on display at the Musée des Arts et Métiers in Paris.

Evelyn Lamb

This article was published in Scientific American’s former blog network and reflects the views of the author, not necessarily those of Scientific American

Last week I went to the Musée des Arts et Métiers in Paris, a museum that features gears, engines, a replica of the original Foucault pendulum, and mechanical contraptions of all kinds. While I was there I checked out the temporary exhibit on Claude Shannon, “le magicien des codes,” as he is described there. Shannon was an American mathematician and computer scientist known as the father of information theory. Contemplating an exhibit about communication and codes felt personal to me. Living in Paris as a French language learner, I walk around every day laboriously deciphering messages that pose no difficulty to French ten-year-olds.

One of the portions of the exhibit is an interactive video display illustrating the general idea of an error-correcting code and why you would want to use one. (I wrote about the idea at more length in my July post about high-dimensional sphere packing.) It shows two people trying to communicate a message: “Bonjour!” As the message glides across the screen between them, video game lightning bolts come down, altering some of the letters. By the time the message reaches the second person, it reads something like “Bxtj!ur1.” The recipient is sad and confused. End scene.

Then the museum-goer has the opportunity to help the poor characters communicate by changing the number of times the message is transmitted. If you transmit two letters for every one letter of the message (that is, “BBoonnjjoouurr!!”), you can tell more easily whether the message has been altered in transit. It’s unlikely for two letters that have both been altered to end up the same again, so if letters 1 and 2, for instance, match, then it’s pretty likely they came through correctly. If you triple the message, “BBBooonnnjjjooouuurrr!!!”, you have a pretty good chance of being able to correct a mistake. If two of the three letters match, you can guess that they are right. If two out of three of the letters match for every letter of the word, your message made it through intact. That means as many as 1/3 of the transmitted letters could be corrupted, provided no more than one corruption falls in each group of three, and you’d still be able to communicate.
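As a rough illustration of the tripled-letter scheme, here is a minimal Python sketch of a repetition code with majority-vote decoding. The function names `encode` and `decode` and the garbled sample string are invented for this example; they are not taken from the exhibit.

```python
from collections import Counter


def encode(message: str, n: int = 3) -> str:
    """Repeat every character n times, e.g. 'Bo' -> 'BBBooo' for n = 3."""
    return "".join(ch * n for ch in message)


def decode(received: str, n: int = 3) -> str:
    """Split the received text into groups of n and keep the most common character in each."""
    groups = [received[i:i + n] for i in range(0, len(received), n)]
    return "".join(Counter(group).most_common(1)[0][0] for group in groups)


sent = encode("Bonjour!")               # 'BBBooonnnjjjooouuurrr!!!'
garbled = "BxBooonnnjj4ooouuurrr!1!"    # hypothetical: one letter hit in three different groups
print(decode(garbled))                  # prints Bonjour!
```

Each damaged group still has two matching letters, so the vote recovers the original. If two letters in the same group were altered, the vote could fail, which is the limit of this simple scheme.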

The exhibit allows you to see how increasing the number of repetitions makes it more likely for the correct message to get through, but it also illustrates the other side of the equation: the more times you repeat the message, the longer it takes to get through. When I increased the repetition to 5 times, the message was transmitted correctly, but it took an excruciatingly long time.
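The trade-off between reliability and transmission time can be put in rough numbers. The sketch below assumes a simplified model that is not part of the exhibit: each transmitted copy of a letter is corrupted independently with probability `p`, and a letter is lost whenever more than half of its `n` copies are corrupted. The value `p = 0.1` is an arbitrary choice for illustration.

```python
from math import comb


def symbol_failure_probability(p: float, n: int) -> float:
    """Probability that more than half of n copies are corrupted (n odd),
    assuming each copy is corrupted independently with probability p."""
    k_min = n // 2 + 1  # smallest number of corrupted copies that outvotes the rest
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))


# Compare failure chance per letter with the cost in letters sent.
for n in (1, 3, 5):
    failure = symbol_failure_probability(0.1, n)
    print(f"{n} copies per letter: failure chance {failure:.5f}, message is {n}x longer")
```

Under these assumptions the chance of losing a letter falls from 10% with no repetition to about 2.8% with three copies and under 1% with five, while the message grows to three and five times its original length.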

I must admit, when I played with the exhibit I got a little frustrated with the hapless stick figures who were trying desperately to talk with each other. At three repetitions, the message usually got through, but sometimes one letter ended up unusable, and they wouldn’t take a hint. I mean, if you got the message “Bonj4ur!” I think you’d manage to figure out that the person was trying to greet you, right? Search engines and autocorrect know that I really mean “embarrassed” when I write “embarassed.” Why couldn’t the characters do better? Those silly stick figures were stubbornly resisting the error-correction we have naturally built into language.
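The kind of correction a reader does with “Bonj4ur!” can be imitated, very crudely, by picking the closest word from a known vocabulary. The sketch below uses `difflib.get_close_matches` from Python’s standard library; the tiny vocabulary and the `correct` helper are hypothetical, and this is not a description of how search engines or autocorrect actually work.

```python
from difflib import get_close_matches

# Hypothetical vocabulary; a real system would use a far larger word list.
VOCABULARY = ["bonjour", "bonsoir", "merci", "embarrassed"]


def correct(word: str, vocabulary: list[str] = VOCABULARY) -> str:
    """Return the closest vocabulary word, or the input unchanged if nothing is close."""
    matches = get_close_matches(word.lower(), vocabulary, n=1, cutoff=0.6)
    return matches[0] if matches else word


print(correct("bonj4ur"))
print(correct("embarassed"))
```

Here the redundancy comes from the language itself: only a few letter sequences are real words, so a damaged word usually sits close to exactly one of them.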

The irony of my frustration with those stick figures was not lost on me. I am working on my French, and I can conduct some basic transactions in French—I almost broke into song when I managed to buy the right number of the right stamps at the post office—but I am often left staring blankly at someone while my brain tries to process what they’ve said. By the time I make sense of it, and well before I’ve managed to painstakingly craft a reply, they’ve switched to English or written me off entirely. When I’m not cursing my own ignorance, I spend a lot of time here thinking about the nature of communication in general.

One of the biggest obstacles to my communication in French is that at my current level, French is not error-correcting the way English is. If there is a PA announcement on a bus or train, I will understand it if it is “Prochain arrêt,” followed by the name of a stop, but anything else is likely to pass by me entirely. Crackly English broadcasts are generally not a problem, but I can’t error-correct French the way I can English, so with even a small loss of sound quality, the messages become incomprehensible.

In one-on-one speech with other humans, I still have trouble hearing and noticing the sound differences that convey niceties such as number to me. “Le garçon” and “les garçons” are hard for me to distinguish on the fly. Luckily the verb ending encodes that information as well. On words where there is a clear sound difference between third person singular and plural forms of the verb, I can use that to compensate for the weaker aspects of my listening. But not every verb has a sound change between those forms, and even when one does, I don’t always hear it correctly. I’m much more likely than a native French speaker to confuse words that sound similar, even if one of them doesn’t make sense at all in context. I miss a lot of words and context as it is: it’s not so easy for me to pick up on the fact that someone is talking about her hair (cheveux), not her horses (chevaux).

Most surprisingly, I sometimes have the opposite problem and a Francophone’s built-in error-correction sabotages my attempt to get my point across. One time I tried to say that my spouse was working (travailler) in a certain part of the city. The person I was speaking with explained that “travel” in French was actually “voyage.” Another time I tried to tell someone I had visited Caen, a small town in Normandy, and they corrected my pronunciation to Cannes, the famous film destination along the Riviera. City names in general are treacherous. I’ve had the same problem with Lille vs. Lyon, and a few years ago a German could not understand that I wanted to go to Köln until I called it by its English (and French) name, Cologne. Kind attempts to compensate for my bad pronunciation and halting sentence construction sometimes end up introducing as many errors as they correct.

There are linguists who study redundancy in different aspects of language and computer scientists who relate natural language’s error-correcting properties to machine learning and translation. But I experience that on an intuitive level; I haven’t undertaken a rigorous study of my own French error-correction deficiencies. I have verb conjugations to practice.