Revenge of the Fallen? Recurrent Models Match Transformers at Predicting Human Language Comprehension Metrics
James A. Michaelov, Catherine Arnett, Benjamin K. Bergen

TL;DR
Recent recurrent models like RWKV and Mamba now match or surpass transformers in predicting human language comprehension, challenging the notion that transformers are uniquely suited for this task.
Contribution
This paper demonstrates that modern recurrent models can perform on par with or better than comparably sized transformers at modeling online human language comprehension, reopening the question of which architectural features make a language model a better or worse model of human comprehension.
Findings
Contemporary recurrent models (RWKV and Mamba) match comparably sized transformers at predicting human language comprehension metrics.
In some cases, recurrent models outperform transformers of comparable scale.
Transformers are not uniquely effective for modeling human language comprehension.
Abstract
Transformers have generally supplanted recurrent neural networks as the dominant architecture both for natural language processing tasks and for modeling the effect of predictability on online human language comprehension. However, two recently developed recurrent model architectures, RWKV and Mamba, appear to perform natural language tasks comparably to or better than transformers of equivalent scale. In this paper, we show that contemporary recurrent models are now also able to match, and in some cases exceed, the performance of comparably sized transformers at modeling online human language comprehension. This suggests that transformer language models are not uniquely suited to this task, and opens up new directions for debates about the extent to which architectural features of language models make them better or worse models of human language comprehension.
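The abstract refers to modeling the effect of predictability on comprehension. As a minimal sketch, assume predictability is quantified as surprisal (the negative log probability of a word given its context), a common choice in this literature that the text above does not itself state. The function name and the probabilities below are hypothetical and chosen only for illustration; they are not results from the paper.

```python
import math


def surprisal_bits(prob: float) -> float:
    """Return the surprisal, in bits, of a word with in-context probability `prob`."""
    if not 0.0 < prob <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -math.log2(prob)


# Made-up next-word probabilities that two language models (say, one recurrent
# and one transformer) might assign to the same three-word stimulus.
model_a_probs = [0.5, 0.25, 0.125]
model_b_probs = [0.5, 0.5, 0.03125]

# Less probable words receive higher surprisal.
surprisals_a = [surprisal_bits(p) for p in model_a_probs]
surprisals_b = [surprisal_bits(p) for p in model_b_probs]

print(surprisals_a)
print(surprisals_b)
```

In studies of this kind, per-word values like these are typically entered as predictors of a human measure (for example, reading time), and models are compared by how well their values fit the human data. That comparison step is not shown here.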
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques
