How Well Can a Long Sequence Model Model Long Sequences? Comparing Architectural Inductive Biases on Long-Context Abilities
Jerry Huang

TL;DR
This paper evaluates the ability of various long-sequence neural network architectures, including state-space and recurrent models, to handle extended contexts, revealing practical limitations despite theoretical promises.
Contribution
It provides an empirical comparison of different architectural inductive biases for long-sequence modeling, highlighting their inconsistent extrapolation capabilities and practical challenges.
Findings
Recurrent models face issues similar to those of long-context LLMs.
Theoretical advantages of certain models do not always translate to practical performance.
Different inductive biases show inconsistent long-term extrapolation abilities.
Abstract
Long sequences occur in abundance within real-world scenarios, hence properly modelling them opens numerous downstream use-cases. Deep neural networks, however, have often struggled with these for a variety of reasons. Recent advances, both in system engineering and in model design, have enabled the scaling up of models that are purported to support extended context length. In particular, the state-space and linear recurrent neural network families of models hypothetically can extend to infinite sequence length. However, is this too good to be true? We conduct an evaluation to show that while such claims may be sound theoretically, there remain large practical gaps that are empirically observed. In particular, recurrent models still suffer in the same settings as long-context LLMs with attention. We further show that different inductive biases have inconsistent extrapolation…
Taxonomy
Topics: Topic Modeling · Online Learning and Analytics · Intelligent Tutoring Systems and Adaptive Learning
