November 25, 2016

Chomsky's Theory of Language Learning Dead? Not So Fast...

A recent claim that the MIT linguist's "theory of language learning" has been refuted is wrong on many levels

This article was published in Scientific American’s former blog network and reflects the views of the author, not necessarily those of Scientific American

To read the headline accompanying Ibbotson and Tomasello’s (I&T) recent article in Scientific American, “Evidence Rebuts Chomsky’s Theory of Language Learning,” one might expect to find evidence that runs counter to a theory of language learning offered by Noam Chomsky. This expectation would be frustrated, however. Part of the reason for this is that Chomsky has never offered a theory of language acquisition. Rather, Chomsky’s program has been to characterize the initial conditions that define the space of possible human languages and thus make language acquisition possible. The other part of the reason is that the little evidence I&T bring to bear is entirely orthogonal to any proposals about language acquisition from Generative Linguistics. The authors’ fundamental misunderstanding of the goals of Chomskyan linguistics explains the emptiness of their claims.

In the 1950s and 60s, Chomsky developed a framework for studying the human language faculty. This framework included three parts: (1) the construction of formally explicit models of linguistic knowledge, (2) the search for general principles that could limn the space of possible grammars, and (3) the methodological assumption that grammatical knowledge and grammatical usage should be treated as distinct. The program stated as its ultimate explanatory goal the aim of deploying general grammatical principles as one critical component of a theory of language acquisition. The grammatical principles would, in combination with the experience of learners and other faculties of mind, lead to the growth of grammatical knowledge. This partitioning of contributors to language acquisition is analogous to explaining an organism’s growth as a complex interaction between its genetic structure, the external environment, and other internal properties. Just as no biologist would think that a theory of genetic structure is equivalent to a theory of biological development, no linguist (and certainly not Chomsky) would think that the theory of grammatical structure is equivalent to a theory of language acquisition.

The idea that a theory of possible human languages could contribute to an explanation of language acquisition was motivated in part by arguments from the poverty of the stimulus, the observation that what we come to know about our languages is far richer than what can be directly observed in the sentences typical children are exposed to. To get a sense of how such arguments run, consider the following:

(1) a. Val is a good volleyball player and Al is too

b. Val is a better volleyball player than Al is

In both of these examples, there is an unpronounced predicate in the second clause, interpreted as equivalent to the predicate in the first clause (i.e., a good volleyball player). Let’s call (1a) coordinate ellipsis and (1b) comparative ellipsis. And for clarity, let’s write in the missing predicates like this, where the bracketed material indicates interpretation without pronunciation:

(2) a. Val is a good volleyball player and Al is [a good volleyball player] too

b. Val is a better volleyball player than Al is [a good volleyball player]

These kinds of elliptical constructions are common in speech to children. We can also observe that the relation between the unpronounced predicate and its antecedent can be stretched across multiple clauses in both cases, though child learners do not hear many multi-clause sentences of this type.

(3) a. Val is a good volleyball player and I think that Al is [a good volleyball player] too

b. Val is a better volleyball player than I think that Al is [a good volleyball player]

c. Val is a good volleyball player and I heard you say that Al is [a good volleyball player] too

d. Val is a better volleyball player than I heard you say that Al is [a good volleyball player]

However, when the embedded clause is a relative clause, then the two kinds of ellipsis differ. The coordinate ellipsis still produces a possible sentence of English, but the comparative ellipsis does not (the * indicates that the sentence is not English):

(4) a. Valentine is a good volleyball player and I heard a rumor that Alexander is [a good volleyball player] too

b. * Valentine is a better volleyball player than I heard a rumor that Alexander is [a good volleyball player]

The fact to be explained here is why the child learner, when building representations for coordinate and comparative ellipsis, doesn’t treat the silent predicate in the same way in the two cases. Both can be interpreted as identical to the main clause predicate in all of the sentences in (1) and (3). However, if the elided predicate is inside a relative clause, then it can be interpreted as identical to the main clause predicate in coordinate ellipsis but not in comparative ellipsis. It is an analogy that could be drawn but apparently isn’t. Learners of English are not exposed to sentences like (4a) or (4b), but somehow we all come to recognize that (4a) is a possible sentence of English and (4b) is not. How?

The Chomskyan answer is a partial answer. It says that comparative constructions like those in (1b) have a structural feature in common with questions. To see why this has the effect of making (4b) impossible, let us consider how questions are structured. Constituent questions, like (5), relate a phrase at the beginning of a sentence to a verb later in the sentence:

(5) What did Ellen take?

The verb take is transitive: it requires a direct object, making (6a) a possible sentence and (6b) not:

(6) a. Ellen took a picture

b. * Ellen took

In (5), the direct object is the word “what,” which occurs at the front of the sentence, but fills the same role as the phrase a picture in (6a).

This dependency can also be stretched across multiple clauses:

(7) a. What do you think that Ellen took? 

b. What did you hear Tonia say that Ellen took?

But, if the verb is inside a relative clause, then this dependency cannot be formed:

(8) * What did you hear a rumor that Ellen took?

Sentences like (7b) and (8) both fall outside the experience of typical child language learners, but we all come to recognize that (7a) and (7b) are possible sentences and that (8) is not.

If we build on these observations in English by examining many more dependencies and many more languages, we find that there are (at least) two kinds of dependencies in human languages. One kind can be formed into relative clauses and one kind cannot. Given this classification, we might propose that the classes of dependencies are built into the learner’s language faculty. This proposal changes the nature of the language learning problem. The learner’s job is not to discover every property of the language being acquired. Rather, the learner’s job (in this domain) is to discover which kinds of dependencies that are exhibited in the language fall into which class. Having classified the dependencies, the learner will know that certain of them (like coordinate ellipsis) are possible into relative clauses and others (like comparative ellipsis and constituent questions) are not. The learner does not have to figure out for each dependency whether it is possible into a relative clause. Instead, she has to figure out which class a dependency falls into. From that classification, the interaction with relativization follows from the principles of Universal Grammar that defined the classes to begin with.

I&T claim that this is all wrong and that the evidence has shown it to be wrong. But the evidence that they raise has to do with the simplest, most easily observed characteristics of language, like whether a verb requires a direct object. It is simple to build a learning theory for such facts because they are so abundant in the learner’s experience. The Chomskyan view allows for notions of observation, analogy making, and distributional analysis to contribute to an explanation of how they are acquired, just as the usage-based theory espoused by I&T does.

But when it comes to highly abstract and cross-linguistically stable properties like the classification of dependencies, the usage-based theorists have been conspicuously silent. This silence is predictable from the shape of the theory. As anyone who has ever studied learning and generalization from a precise, formal perspective knows, theories of distributional analysis, analogy-making and generalization begin with a statement of the class of observable features and a statement of the class of projectible predicates that define the dimensions along which generalizations can be made. Since the usage-based theory offers no specification of the possible dimensions of generalization, it can by design offer no explanation of how learners generalize, and more importantly, no explanation of the generalizations that are consistent with the data but that learners fail to make.

Finally, the usage-based theorists tell us that the methodological principle that grammatical knowledge and language usage are partially independent is an incoherent idea that offers little to an explanation of language acquisition. Thus, they argue that it should be rejected. But they offer no explanation of how this incoherence arises, nor of how the explanation of linguistic behavior can succeed without such a distinction. To take a trivial case, I know how to spell the word “language.” However, sometimes when I am typing quickly, it comes out as “langauge,” with the letters ‘a’ and ‘u’ transposed. This fact about my typing can be explained by two factors: (1) my correct representation of the spelling of the word and (2) my motor planning and action systems. Typing the sequence “g-u-a” requires alternating between my left and right hands, and under pressure to type as fast as possible it is hard to get that alternating pattern exactly right, so sometimes the ‘u’ from my right hand fails to land between the ‘g’ and the ‘a’ from my left. Does this mean that I don’t know the correct spelling, or that I have represented the spelling of “language” as 80% one way and 20% the other? Even I&T would not think so. But then why is my linguistic capacity any different?

Why shouldn’t the process of speaking, which involves the integration of my knowledge of how sentences are constructed and how words are pronounced with conceptual knowledge, systems of memory, predictive processes, etc., be explained similarly? Indeed, recognizing this distinction makes it possible to diagnose particular facts as belonging to my knowledge of grammar vs. the processing systems that make use of that knowledge.

To be slightly more specific, consider the phenomenon of agreement attraction:

(9) The key to the cabinets is/#are on the table

The phenomenon is that people occasionally produce “are” and not “is” in sentences like these (around 8% of the time in experimental production tasks, according to Kay Bock) and they even fail to notice the oddness of “are” in speeded acceptability judgment tasks. Why does this happen? Well, psycholinguists have argued that this has something to do with the way parts of sentences are stored and reaccessed in working memory during sentence comprehension. That is, using an independently understood model of working memory and applying it to sentence comprehension, these authors explained the kinds of agreement errors that English speakers do and do not notice. So, performance masks competence in some cases. And it does so in a way that saves us from having to complicate our grammatical theory of subject-verb agreement by allowing us to apportion explanation between the grammatical theory and the processing theory. Are such explanations somehow less scientific than ones that fail to apportion explanation across domains? Obviously not.

I&T also claim that this distinction between knowledge and use (competence and performance in Chomsky’s terms) is pernicious and undercuts the falsifiability of proposals about language acquisition. But why? Consider the following. All language users understand sentences incrementally: they build their understanding as they hear sentences, rather than waiting for them to end. This sometimes leads to trouble. For example, consider the following:

(10) Put the frog on the napkin in the box

When we hear such sentences, our initial interpretation of the phrase “on the napkin” is as the location where the speaker wants the frog to be put. As the sentence continues, we revise our interpretation so that “on the napkin” is instead understood as a modifier of the phrase “the frog,” equivalent to “the frog that is on the napkin.” This revision process can be seen in eye-tracking studies of sentence understanding. Children show difficulty revising this initial interpretation, an effect that can be seen both in their eye-movements and in their actions, where they sometimes put the frog on the napkin. One proposal is that this difficulty in revision is explained by children having less developed executive control mechanisms, the mechanisms that allow for the inhibition of prepotent or partially executed responses. Indeed, patients with damage to the brain areas that control this kind of executive function show the same pattern.

This difficulty can be generalized in a way that explains children’s errors in language acquisition. For example, Akira Omaki examined English- and Japanese-learning 4-year-olds’ interpretations of sentences like:

Where did Lizzie tell someone that she was going to catch butterflies?

These sentences have a global ambiguity in that the word “where” could be associated with the main verb (tell) or the embedded verb (catch). Now, if children are incremental parsers and if they have difficulty revising their initial parsing decisions, then we predict that English children should show a very strong bias for the main verb interpretation, since that interpretation would be the first one an incremental parser would access. And Japanese children should show a very strong bias for the embedded interpretation, since the order of verbs would be reversed in that language. Indeed, that is precisely what Omaki found, suggesting that independently understood properties of the performance systems could explain children’s behavior. Thus, understanding performance systems allows us to explain a pattern in language acquisition across languages that receives no general explanation otherwise.

In sum, I&T announce the demise of Chomskyan linguistics, but fail to engage seriously with any of the core ideas of the field. I&T, along with the research they cite, avoid addressing any of the central claims of Chomskyan linguistics. They do not show that there is a logical problem with the framework laid out by Chomsky in the 1960s; nor do they show that there are any fundamental empirical problems with this framework. The failure to distinguish the key ideas of the framework from specific proposals within it, along with the naive understanding of how language structure, language acquisition and language use relate to each other, leads to an article that is deeply confused about the positions that it claims to have overthrown and the force of the “evidence” that it purports to have presented.