Why are language models less surprised than humans? Testing the Parse Multiplicity Mismatch Hypothesis
William Timkey, Brian Dillon, Tal Linzen

TL;DR
This study investigates why language model surprisal underpredicts human processing difficulty in garden path sentences, testing whether the number of interpretations a model considers simultaneously explains the gap.
Contribution
The paper systematically varies the number of simultaneous parses a language model considers and assesses how this affects its predictions of human reading times.
Findings
Reducing the number of active parses increases predicted garden path effects.
The increase in predicted effects is insufficient to match human data.
Differences in the number of parses do not fully explain the surprisal mismatch.
Abstract
Surprisal theory posits that the processing difficulty of a word is determined by its predictability in context, offering a potential link between human sentence processing and next-word predictions from language models. While language model (LM) surprisals successfully predict reading times in naturalistic text, they systematically underpredict the magnitude of difficulty observed in controlled studies of syntactic ambiguity, particularly in garden path sentences. This mismatch might arise from differences in computational constraints between humans and LMs. Here we test one such hypothesis: that LMs may be able to consider a greater number of distinct sentence interpretations simultaneously than humans can. Using Recurrent Neural Network Grammars (RNNGs) with word-synchronous beam search, we systematically vary the number of simultaneous parses used to…
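To make the hypothesis concrete, the following is a minimal sketch of how restricting the number of parses can raise the surprisal of a disambiguating word. It is a toy illustration, not the authors' implementation: the function name `beam_surprisal`, the two-parse setup, and all probabilities are invented for this example, and the pruning step simply keeps the k most probable parses before the word, which simplifies away the details of word-synchronous beam search.

```python
import math


def beam_surprisal(parses, k):
    """Surprisal (in bits) of the next word when only the top-k parses are kept.

    parses: list of (prefix_prob, next_word_prob) pairs, one per parse.
            prefix_prob is the parse's probability before the word;
            next_word_prob is the probability the parse assigns to the word.
    k:      number of parses retained (hypothetical beam width).
    """
    # Keep the k parses that were most probable before the word arrived.
    beam = sorted(parses, key=lambda p: p[0], reverse=True)[:k]
    # Probability mass of the retained parses before and after the word.
    before = sum(prefix for prefix, _ in beam)
    after = sum(prefix * word for prefix, word in beam)
    # Surprisal is the negative log of the word's conditional probability.
    return -math.log2(after / before)


# Toy numbers (assumed, not from the paper): a preferred parse that makes the
# disambiguating word very unlikely, and a dispreferred parse that expects it.
toy_parses = [(0.9, 0.001), (0.1, 0.5)]

wide = beam_surprisal(toy_parses, k=2)    # both parses available
narrow = beam_surprisal(toy_parses, k=1)  # only the preferred parse kept

print(f"k=2: {wide:.2f} bits, k=1: {narrow:.2f} bits")
```

With these assumed numbers, the narrower beam yields the larger surprisal, because the parse that anticipated the disambiguating word has been pruned. This is the direction of the effect reported in the first finding; the toy values say nothing about its size.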
