Over one hundred years ago, a baby girl fell ill with what may have been scarlet fever (or meningitis), leaving her deaf and blind. Because she suffered the illness at 19 months, she was also largely mute, although by the time she turned eight, Helen Keller had worked out a system of 60-odd signs to communicate basic needs to family members. But she had very little understanding of language, as governess Anne Sullivan discovered when she joined the Keller household in March 1887. Sullivan set out to rectify this, in addition to teaching the child basic manners and coping with her often-violent tantrums.
Sullivan's method involved spelling the words for various objects into Helen's hand: "d-o-l-l," for instance, when she handed the child a doll at their first meeting. But Helen didn't understand that the letters made a word, or that the word described a real-world object -- i.e., that everything in the world had a name.
It took several months, but one fateful day, after Helen had misbehaved at the dinner table and run outside to the yard, Sullivan followed her to the water pump. As the water ran over Helen's hand, Sullivan spelled out "w-a-t-e-r" in the palm of her other hand -- and something clicked in the young girl's mind. She made the connection, eagerly rushing about, pointing to various objects so Sullivan could spell out the corresponding words in her palm. It changed her life. Helen Keller went on to become fully literate, a published author, and a staunch advocate for the deaf and blind.
That turning point -- the essence of language -- was memorably depicted in The Miracle Worker, first produced as a Broadway play in 1959 and later adapted into a 1962 film. Patty Duke played Helen Keller and Anne Bancroft played Annie Sullivan; both won Oscars for their roles.
I thought of Keller's story while perusing a recent paper in Physical Review Letters by physicists in Germany and Scotland, describing new simulations of language acquisition that demonstrate how a learner who assumes a language has no synonyms can pick up its vocabulary much more quickly. Yes, there is a math and physics component to how we learn new words, and it's a lively field of study.
There are two basic mechanisms involved in picking up new words. First there is the heuristic approach, in which the child infers the meaning of a spoken word by relying on external cues, such as following the speaker's gaze -- or having the object the word describes pointed out, as Sullivan did with Keller. But even with those cues, there may be more than one possible meaning for a given word -- a kind of residual uncertainty.
That's where the second mechanism, cross-situational learning, comes into play. Usually this uncertainty can be reduced by comparing many different instances of a given word being used in different contexts. If there's one meaning that remains plausible over several such instances, that increases the likelihood of this being the correct meaning. In other words, you strengthen the associations between words and meanings when they co-occur repeatedly. That's why this is such a useful tool for neural network models of learning, for instance, although the authors of this particular paper (Richard Blythe, Rainer Reisenauer, and Kenny Smith) point out that it's also a useful error-correction process to reconstruct associations from noisy data.
Children learn around ten words a day, on average, amassing over 60,000 words in their personal lexicon by the time they reach 18, and they employ lots of different strategies to do so -- including cross-situational learning for those times when meaning is ambiguous.
Case in point: If a child hears the word "cup" and there is just a cup in front of that child, the meaning is clear, much like it was clear to Helen Keller that "water" correlated with the cool liquid gushing out of the pump and over her hand. But what if there is both a cup and a ball? How will the child know which is the correct meaning?
The child will probably remember hearing "cup" at a different time, when the cup was present with another object, like a doll. The only common object in those two instances was the cup, so "cup" must pertain to that object. That's cross-situational learning. Combine this with an assumption that there are no synonyms -- i.e., mutual exclusivity -- and even more uncertainty can be removed. The meaning becomes crystal clear.
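The cup-and-doll logic above is easy to make concrete. Here's a minimal sketch (my own illustration, not the paper's code): the learner keeps, for each word, the set of meanings that have appeared alongside it, and intersects that set on every new exposure.

```python
def cross_situational_learn(exposures):
    """exposures: list of (word, objects_present) pairs.
    Returns each word's surviving candidate meanings."""
    candidates = {}
    for word, objects in exposures:
        if word not in candidates:
            candidates[word] = set(objects)
        else:
            # keep only meanings seen in *every* exposure of this word
            candidates[word] &= set(objects)
    return candidates

exposures = [
    ("cup", {"cup", "ball"}),   # first exposure: still ambiguous
    ("cup", {"cup", "doll"}),   # second exposure: only the cup recurs
]
print(cross_situational_learn(exposures))  # {'cup': {'cup'}}
```

Two exposures suffice here because only the cup is present both times; with more confounders per scene, more exposures are needed before the intersection shrinks to a single object.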
"It's a boot-strapping technique, where you use information from previous learning [of words] to eliminate certain meetings," co-author Blythe (University of Edinburgh) told Physics Focus. He and his colleagues wanted to assess how effective the mutual exclusivity strategy is compared to other approaches to determine meaning, when one is dealing with hundreds of words.
To find out, they turned to nonequilibrium statistical physics, often used to model molecular interactions. Here, a word becomes analogous to a molecule, and the model shows how the probability distributions for given states (possible meanings of that word) evolve over time. The probability for "cup" being associated with the cup in the room starts off low, because there are many confounding variables, but over time, the system reaches equilibrium -- those confounders are gradually ruled out and ultimately the word is attached to a single meaning.
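That probabilistic picture can be sketched in a few lines. This is a hypothetical illustration of the idea, not the authors' actual model: each candidate meaning of "cup" carries some probability mass, meanings absent from an exposure are ruled out, and the remaining mass is renormalized until it all settles on one meaning -- the analogue of the system reaching equilibrium.

```python
def update(probs, context):
    """Rule out meanings absent from this exposure, then renormalize."""
    surviving = {m: p for m, p in probs.items() if m in context}
    total = sum(surviving.values())
    return {m: p / total for m, p in surviving.items()}

# Start with uniform uncertainty over three candidate meanings for "cup".
probs = {"cup": 1 / 3, "ball": 1 / 3, "doll": 1 / 3}
probs = update(probs, {"cup", "ball"})   # doll ruled out -> cup: 0.5, ball: 0.5
probs = update(probs, {"cup", "doll"})   # ball ruled out -> all mass on cup
print(probs)  # {'cup': 1.0}
```

The probability of the correct association starts low and climbs to one as confounders are eliminated, mirroring the relaxation-to-equilibrium description above.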
Blythe et al. essentially built a computer model, assuming a language with between 50 and 100 words, each linked with a typical frequency of use. Then they simulated a "learner," presented with a word and a series of "objects" -- again, simulated -- at least one of which represented the right meaning, although sometimes more than one could apply (i.e., the equivalent of synonyms). The "learner" would compare many such events -- cross-situational learning in action. The researchers were also able to track how long it took the "learner" to acquire a full lexicon of up to 60,000 words.
Then they compared two cases with different conditions. The first did not employ mutual exclusivity -- i.e., synonyms were not excluded -- so once a word was "learned," it was not automatically removed from the full list of possible confounders. The result: using just this strategy, we would spend an entire lifetime acquiring a basic 60,000-word lexicon. But when they ran the simulation in the second case, in which synonyms were not allowed, the time to learn new words dropped significantly. Clearly, mutual exclusivity is a very effective strategy when it comes to language acquisition, and statistical physics "can contribute much to the understanding of how children learn the meaning of words," the authors conclude.
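The flavor of that comparison can be reproduced with a toy simulation (my own sketch under simplified assumptions -- uniform word frequencies and a tiny lexicon, unlike the paper's model). With mutual exclusivity switched on, meanings already claimed by learned words are stripped from each new scene, so the remaining words disambiguate faster.

```python
import random

def simulate(n_words=50, context_size=3, mutual_exclusivity=False, seed=7):
    """Count exposures needed until every word is pinned to one meaning."""
    rng = random.Random(seed)
    words = list(range(n_words))
    candidates = {}   # word -> set of still-plausible meanings
    learned = set()   # words whose meaning has been narrowed to one object
    episodes = 0
    while len(learned) < n_words:
        episodes += 1
        target = rng.choice(words)
        # The scene: the correct object plus some random confounders.
        context = set(rng.sample(words, context_size)) | {target}
        if mutual_exclusivity:
            # No synonyms allowed: discard meanings already taken.
            context -= (learned - {target})
        if target not in candidates:
            candidates[target] = context
        else:
            candidates[target] &= context
        if len(candidates[target]) == 1:
            learned.add(target)
    return episodes

base = simulate(mutual_exclusivity=False)
me = simulate(mutual_exclusivity=True)
print(f"without mutual exclusivity: {base} exposures; with: {me}")
```

For the same random sequence of scenes, the mutual-exclusivity learner can never take longer, since its candidate sets are always subsets of the plain learner's -- a miniature version of the speed-up the paper reports.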
It's kind of ingenious, really, although it's important to bear in mind that this is a sharply focused, carefully designed simulation of just one aspect of a hugely complicated cognitive process that involves a lot more than nonequilibrium statistical physics. But Indiana University cognitive scientist Linda Smith did tell Physics Focus that mutual exclusivity is pretty key: "Competition is how the brain works -- in all domains, at all levels." So there.
Bloom, P. (2000) How Children Learn the Meanings of Words. Cambridge, MA: MIT Press.
Blythe, Richard; Reisenauer, Rainer; Smith, Kenny. (2013) "Stochastic dynamics of lexicon learning in an uncertain and nonuniform world," Physical Review Letters 110: 258701.
Markman, E.M. and Wachtel, G.F. (1988) "Children's use of mutual exclusivity to constrain the meaning of words," Cognitive Psychology 20: 121.
Medina, T.N. et al. (2011) "How words can and cannot be learned by observation," Proceedings of the National Academy of Sciences 108: 9014.
Meier, Richard (1991) "Language Acquisition by Deaf Children," American Scientist 79(1): 60-70.