Why ChatGPT doesn't matter to syntacticians (or: what it would take to make a syntactician care about Large Language Models)

Once again, the death knell has been sounded for generative linguistics[1]. I think that is premature: there is very little engagement with what researchers in the field actually try to achieve before pronouncing it dead (or passé). Throughout the history of science, people didn’t just dismiss a theory, even when aspects of it were known to be false. The theories were supplanted by competing explanations that increased empirical coverage while also providing deeper insight into already-documented phenomena.[2] Sometimes this requires many revisions in the original theory, sometimes few. Sometimes it leads to radical revisions at the heart of the theory. To me, it doesn’t seem like people are after the same phenomena in the generative-grammar-LLM dispute. In all of the excitement, people seem to have missed the sort of basic facts that generative linguistics tries to explain, so I would like to walk through some data that is of interest to syntacticians, and why.[3] I suspect a lot of this will be familiar to other linguists working in this tradition, but I hope it'll be of some help to philosophers and cognitive scientists (especially because a good deal of the discussion can be polemical in a very unhelpful way).

The main point I want to make is that syntacticians don’t just say which strings are found in a language and which are not (a string being a sequence of letters/words). Syntacticians try to describe non-trivial constraints on the pairings of strings with interpretations, or ‘constrained ambiguity’, as it is sometimes called. This refers to the way one string like The journalist interviewed the band on the stage can be paired with two interpretations: the band can be on the stage, or the interview can take place on the stage. Until other approaches have something to say about this fact about language, illustrated more carefully below, they are not competitors to generative grammar, and hence cannot supersede it as a theory.

Syntacticians are interested in what a language-user knows about a sentence’s abstract structure, and how it is linked to some aspect of the sentence’s pronunciation and some aspect of the sentence’s meaning.[4] In acquiring a language, then, we assume children get a signal-meaning-pairing-machine (SMPM) in their heads, not merely a string-generating machine. (I am using signal to mean any sort of sound or gesture, to be neutral between signed and spoken languages.) This seems pretty indubitable to me: people who know a language don't just know what an acceptable signal is, they also know various things about its meaning. Minimally, I know that cat doesn't mean dog, and, so, I have a cat doesn't mean the same thing as I have a dog.

So, if the theory of syntax is a description of the SMPM, you have to be able to state what signals are, and what meanings are. We have a pretty good handle on the first one, thanks to the phonologists and phoneticians, and at least some grasp of the second, thanks to the logicians and the semanticists. Idealizing a little, the SMPM takes phonological representations and gives you logical forms (and vice versa). Yes, this makes big assumptions about what meanings are and what signals are, but those assumptions are quite independent of generative syntax.

So, in describing SMPMs, we are not just talking about the generability of strings anymore. It is important to distinguish between ‘weak’ and ‘strong' generation. This is a distinction that Chomsky made in 1965. Two grammars are equivalent in their weak generative capacity iff they generate all and only the same strings. They are equivalent in strong generative capacity iff they generate the same 'structural descriptions'. For our purposes, let's just say two grammars are equivalent in their strong generative capacity iff they describe the same SMPM. That's more work than just getting the right strings. And syntacticians want to design mathematically-defined SMPMs that pair a string with the same interpretations that a human would.
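
The weak/strong distinction can be caricatured in a few lines of code. This is my own toy construction, not anything from the literature: each "grammar" is just a set of (string, structural description) pairs, and the bracketings stand in for full structural descriptions.

```python
# Two toy "grammars", each a set of (string, structural description) pairs.
# They are weakly equivalent -- they generate exactly the same strings --
# but not strongly equivalent: grammar A pairs its one string with two
# bracketings (two meanings), grammar B pairs it with only one.

def strings(grammar):
    """The string set a grammar generates: its weak generative capacity."""
    return {s for (s, _structure) in grammar}

GRAMMAR_A = {
    ("old men and women", "[[old men] and women]"),  # only the men are old
    ("old men and women", "[old [men and women]]"),  # everyone is old
}

GRAMMAR_B = {
    ("old men and women", "[[old men] and women]"),  # a single bracketing
}

assert strings(GRAMMAR_A) == strings(GRAMMAR_B)  # same strings: weakly equivalent
assert GRAMMAR_A != GRAMMAR_B                    # different structures: not strongly equivalent
```

A model judged only on the left-hand members of these pairs cannot distinguish the two grammars; a model of the SMPM must.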

LLMs, on the other hand, just assign probabilities to strings. This actually might stand a chance of being weakly equivalent to a generative grammar (e.g., it assigns p>0 to all grammatical sentences, and p=0 to all others). Such a model could well assign the string in (1) a high probability, mirroring the fact that a native speaker of English would accept it. But even then, this isn't enough to supplant a theory of syntax, because all this machine does is give you the strings of the language. To compete with a theory of syntax, you need to describe an SMPM: you must include what a speaker knows about a string's interpretation(s). So, what does a typical, English-speaking adult know about the string in (1)?

(1)    How did Robert ask whether the band played the song?

For one thing, it is a question. Its user also assumes there was an event of asking about a band’s playing of a song, and Robert did that asking. But from a syntactician’s point of view, a number of interesting facts pop out. In English, you can often pronounce question-words in at least two places and get a similar meaning. 

(2)    Why did you tell me this?

(3)    You told me this why?

In both, why is associated with the verb tell/told (the difference between the two isn't important here). All of this should be something the SMPM tells us. In (1), things are more interesting, as there are two verbs that the question-word, how, could in principle be associated with: ask and play. There is no a priori reason why both associations shouldn't be available, and we could easily invent an artificial language that worked like that. But looking at natural language empirically, we don't get total freedom of interpretation. English speakers know that how gets associated only with the first verb, ask, and not with the second verb, play. A speaker of English knows what a good answer is and what a bad answer is in the context of (4).

(4)    A concert is being televised. Robert is watching it. The band is playing a song very quietly. He can’t hear the band but is unsure whether the volume on the TV is low or whether the band is just playing softly. So, he loudly asks another watcher, “Is the band playing quietly?”.

Having heard this story, if you are asked the question in (1): How did Robert ask whether the band was playing the song?, you know that the answer ‘quietly’ is false. Given the context, the answer to (1) is ‘loudly’, not ‘quietly’. So, a syntactician concludes that how can be associated with ask, but not play. Contrast this with (5).

(5)    Why did Robert think that the band played the song?

In the following context, there are two good answers to (5).

(6)    Robert is seeing the band play the song. He remembers that they play this song at concerts because it is popular with their fans.

Both the answers because he is seeing it happen and because it is popular with their fans are good answers to the question in (5). That is, (5) is ambiguous in a way that (1) is not: the question-word why can be associated with think as well as with play. A syntactic analysis, then, requires us to come up with some explanation for why (5) is ambiguous and (1) is not, at least for a lot of speakers.[5] Notice that we cannot come up with an explanation just in terms of the word order of the parts of speech, as those are the same in both: question-word-aux-subject-verb-complementizer-subject-verb-object. Something more abstract is called for. So, this all has to be included in our design of the SMPM: (1) is a string that gets exactly one meaning, (5) is a string that gets exactly two.
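
The point can be made concrete in a minimal sketch (the part-of-speech tags and meaning labels are my own shorthand, not a serious analysis): the two questions share the same surface skeleton, yet the SMPM must pair one with a single interpretation and the other with two. Nothing stated over the surface string alone can draw that distinction.

```python
# Same part-of-speech skeleton, different numbers of string-meaning pairings.
POS = {
    "How did Robert ask whether the band played the song?":
        "WH AUX NAME V COMP DET N V DET N",
    "Why did Robert think that the band played the song?":
        "WH AUX NAME V COMP DET N V DET N",
}

MEANINGS = {
    "How did Robert ask whether the band played the song?":
        {"how construed with ask"},                               # 1-way ambiguous
    "Why did Robert think that the band played the song?":
        {"why construed with think", "why construed with play"},  # 2-way ambiguous
}

q1, q5 = POS  # the two strings, (1) and (5)
assert POS[q1] == POS[q5]                      # identical surface skeletons
assert len(MEANINGS[q1]) != len(MEANINGS[q5])  # yet different ambiguity counts
```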

The thing that makes (1) interesting is that it is not ambiguous in a way that it could have been. It lacks a meaning that it could plausibly have had: the meaning where ‘quietly’ is a good answer. Grammaticality is not string-grammaticality, but whether this signal can get this meaning, each independently given. Trying to find interesting and unexpected constraints on that pairing is the bread and butter of syntacticians. The relevant constraints here don't seem to be 'high' or 'low' probability of a given interpretation being assigned in a context either.

LLMs do not even seem to allow for stating a fact about n-way ambiguity. Under some theoretical elaboration of ‘literal’ meaning[6], it is an important fact that (1) has one literal meaning – i.e., is 1-way ambiguous – and (5) has two literal meanings – i.e., is 2-way ambiguous. Now, if LLMs can begin to state facts like this – effectively, facts about the logical form of the sentence: what stands in a meaning-relation to what – then they would be competing models to generative grammars. What such an LLM would look like is, I think, an open question.

Without being able to state the structurally relevant facts about meanings – things that logicians have called attention to, like scope – LLMs are not really competing with theories in syntax. This extends to more familiar things, like the interpretation of pronouns. It is an important part of syntax to explain why (7) and (8) are in no context synonymous.

(7)    John asked Bill to take a photograph of him.

(8)    John asked Bill to take a photograph of himself.

The SMPM shouldn't pair the strings in (7) and (8) with the same meaning. Syntacticians say that what's relevant is something like a clause boundary: reflexives like himself need to be interpreted as coreferential with something inside that boundary, and pronouns like him with something outside it. So, the SMPM will be designed such that a reflexive can only be paired with interpretations where it is coreferential with something in its clause, and a pronoun can't be interpreted that way. Incidentally, in my testing, ChatGPT 3.5 did not give these the interpretations that an English speaker would. If rules are dependent on notions like 'clause boundary', it is going to be hard for an LLM to learn them, unless it implements something that is beginning to look like a generative grammar. I think it is particularly striking that the above contrast is extremely simple, and ChatGPT still failed, despite being so good at some very fancy tasks. This is exactly the sort of thing that differentiates string-generators from SMPMs.
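
The clause-boundary rule can be caricatured as follows. This is entirely my own toy, not a real binding theory: it ignores, among much else, that him can also refer to someone not mentioned in the sentence at all.

```python
# Toy sketch of the clause-boundary rule for (7)/(8):
# "John asked Bill to take a photograph of him/himself."

EMBEDDED = {"Bill"}  # participant linked to the embedded clause ("... to take a photograph of X")
MATRIX = {"John"}    # participant only in the higher clause

def antecedents(anaphor):
    """Sentence-internal antecedents the SMPM licenses for the anaphor."""
    if anaphor == "himself":
        return EMBEDDED  # a reflexive needs a clause-mate antecedent
    if anaphor == "him":
        return MATRIX    # a pronoun must look outside its clause
    raise ValueError(anaphor)

assert antecedents("himself") == {"Bill"}
assert antecedents("him") == {"John"}
# The two sets are disjoint, so (7) and (8) can never be synonymous.
assert antecedents("himself").isdisjoint(antecedents("him"))
```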

The data I’ve drawn attention to has been particularly clean – I’ve tried to keep the changes as minimal as possible. With the string in (1), we tested a minimal pair: the same string is compatible with one meaning but not another. In (7) and (8), we test the difference between a pronoun and a reflexive, changing nothing else in the string. These tests alone lead to interesting conclusions. Of course, introducing more data will make things messier, and it is part of our job as syntacticians to explain that too, e.g., why can the sentences in (9) and (10) be synonyms? 

(9)    John pulled the blanket up over himself.

(10)  John pulled the blanket up over him.

Whatever explanation we give to the contrast between (7) and (8) should be able to account for that. This is all taken to be part of what a child acquires, in an English-speaking environment. As I said earlier, to describe these contrasts, we have to assume there is something like an SMPM with the relevant constraints. But Large Language Models don't describe an SMPM. They only generate strings. Syntacticians are interested in more: they are interested in constrained ambiguities, like how (7) and (8) can't be synonyms – that's the game. If you want to put an end to generative linguistics, you need competing explanations for these kinds of facts. Having a machine that generates English-looking strings is not enough. It must generate good string-to-meaning pairings. If you aren’t playing this game, you’re not competing with generative syntax.

P.S. I haven't even said anything about cross-linguistic work, which has been burgeoning since the 1980s in generative syntax. Even if someone manages to come up with an LLM-SMPM for English, it must also be extensible to other languages.


[1] I refrain from saying ‘Chomskyan’ because I think 'generative' is more transparent about the content of the theory, and encompasses more approaches than Chomsky's own work in the Government-and-Binding/Minimalist tradition (e.g., Lexical-Functional Grammar, Head-driven Phrase-Structure Grammar, Tree-Adjoining Grammar).

[2] Arguably, the Minimalist Program did not do this, and leaves many generalizations discovered between 1980 and 1995 not only unexplained, but unstateable.

[3] Incidentally, I am saying nothing new here. A lot of this has been said already in other work, but this might be seen as a very short summary of that.

[4] There are a few things I will be taking as given: we are investigating some aspect of the human mind/brain that is involved in the learning of a language. Insofar as there are competing theories, they are competing theories about a human being’s mental content. As far as I know, this isn’t controversial (anymore).

[5] Perhaps there is speaker-variation on this, but my intuitions are robust and so are those of the people I have bothered about it. I agree with many critics of generative grammar that this may not be enough to finish a theory with. But it is enough to start. Ideally, we could come up with an explanation for speakers who differ on this point too, but we start with explaining what we think is the rule, and only then the exceptions.

[6] There is a lot of work in the philosophy of language on trying to make sense of this. I actually find the word ‘literal’ to be unhelpful in most theoretical contexts, but in this relatively high-level piece, I am allowing myself some latitude.