Revenge of the Fallen? Recurrent Models Match Transformers at Predicting Human Language Comprehension Metrics

James A. Michaelov; Catherine Arnett; Benjamin K. Bergen

arXiv:2404.19178 · cs.CL · August 27, 2024 · 2 citations

TL;DR

Recent recurrent models such as RWKV and Mamba now match or surpass comparably sized transformers at predicting human language comprehension metrics, challenging the notion that transformers are uniquely suited to this task.

Contribution

This paper demonstrates that modern recurrent models can perform on par with or better than transformers in modeling human language comprehension, highlighting the importance of architectural diversity.

Findings

01. Recurrent models match comparably sized transformers at predicting human language comprehension metrics.

02. At certain scales, recurrent models outperform comparably sized transformers.

03. Transformers are not uniquely suited to modeling human language comprehension.

Abstract

Transformers have generally supplanted recurrent neural networks as the dominant architecture both for natural language processing tasks and for modeling the effect of predictability on online human language comprehension. However, two recently developed recurrent model architectures, RWKV and Mamba, appear to perform natural language tasks comparably to or better than transformers of equivalent scale. In this paper, we show that contemporary recurrent models are now also able to match, and in some cases exceed, the performance of comparably sized transformers at modeling online human language comprehension. This suggests that transformer language models are not uniquely suited to this task, and opens up new directions for debates about the extent to which architectural features of language models make them better or worse models of human language comprehension.

Code & Models

Repositories

jmichaelov/recurrent-vs-transformer-modeling (Official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques