TL;DR
This study compares incremental language processing in humans and neural models using reaction time data across various syntactic phenomena, revealing models' limitations in matching human sensitivity to syntactic violations.
Contribution
It introduces a novel online reaction time paradigm, the Interpolated Maze task, and provides a large-scale comparison showing that models under-predict differences in processing difficulty.
Findings
Models match humans in the direction of processing difficulty but systematically under-predict its magnitude.
Models under-predict the longer reaction times that humans show at syntactic violations.
Humans and models both show increased difficulty in ungrammatical regions, with roughly equal accuracy scores.
Abstract
We present a targeted, scaled-up comparison of incremental processing in humans and neural language models by collecting by-word reaction time data for sixteen different syntactic test suites across a range of structural phenomena. Human reaction time data comes from a novel online experimental paradigm called the Interpolated Maze task. We compare human reaction times to by-word probabilities for four contemporary language models, with different architectures and trained on a range of data set sizes. We find that across many phenomena, both humans and language models show increased processing difficulty in ungrammatical sentence regions, with human and model 'accuracy' scores (à la Marvin and Linzen (2018)) about equal. However, although language model outputs match humans in direction, we show that models systematically under-predict the difference in magnitude of incremental processing…
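The comparison described in the abstract can be illustrated with a minimal sketch. It assumes that a model's by-word probability is converted to surprisal (negative log probability) and that an 'accuracy' score is the fraction of items in which the ungrammatical critical region is harder than its grammatical counterpart. The function names and all numbers below are hypothetical and made up for illustration; they are not the paper's code or data.

```python
import math


def surprisal(prob):
    """Surprisal in bits of a word, given its in-context probability."""
    return -math.log2(prob)


def accuracy(pairs):
    """Fraction of (grammatical, ungrammatical) pairs in which the
    ungrammatical condition has the larger difficulty score."""
    hits = sum(1 for gram, ungram in pairs if ungram > gram)
    return hits / len(pairs)


# Hypothetical critical-word probabilities from a language model:
# (grammatical variant, ungrammatical variant) for four items.
model_probs = [(0.20, 0.05), (0.10, 0.02), (0.30, 0.40), (0.25, 0.05)]
model_pairs = [(surprisal(g), surprisal(u)) for g, u in model_probs]

# Hypothetical human reaction times in milliseconds for the same items.
human_rts = [(650.0, 900.0), (700.0, 1100.0), (720.0, 690.0), (680.0, 950.0)]

model_acc = accuracy(model_pairs)
human_acc = accuracy(human_rts)
print(f"model accuracy: {model_acc:.2f}, human accuracy: {human_acc:.2f}")
```

A score of this kind captures only the direction of the effect. Comparing magnitudes, as the abstract goes on to do, additionally requires putting surprisal and reaction time on a common scale, which this sketch does not attempt.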
