Just read twice: closing the recall gap for recurrent language models

Simran Arora, Aman Timalsina, Aaryan Singhal, Benjamin Spector, Sabri Eyuboglu, Xinyi Zhao, Ashish Rao, Atri Rudra, and Christopher Ré

arXiv:2407.05483 · cs.CL · July 9, 2024 · 1 citation

TL;DR

This paper addresses the recall limitations of recurrent language models. It analyzes how the order of information in the context affects recall and proposes two remedies, repeated prompts and non-causal prefix-linear attention, to improve long-context understanding and efficiency.

Contribution

It formalizes how data order affects the recall ability of recurrent LMs, linking the hardness of recall to the set disjointness problem from communication complexity, and introduces prompting techniques that reduce order sensitivity and improve performance.

Findings

01 Repeated prompts improve ICL performance by 11 points across models and tasks.

02 Non-causal prefix-linear attention achieves near-transformer quality with higher throughput.

03 Memory efficiency and long-context recall are significantly enhanced by proposed methods.
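
As a rough illustration of the repeated-prompt idea in the first finding, the following minimal sketch builds a prompt in which the context and question appear more than once, so a model reading left to right sees the context again after it already knows the question. The function name `repeat_prompt`, its parameters, and the newline-based layout are assumptions made for this example; they are not taken from the paper or its repository.

```python
def repeat_prompt(context: str, question: str, repeats: int = 2) -> str:
    """Return a prompt in which the (context, question) block appears `repeats` times.

    Hypothetical helper for illustration only; the exact template used by
    the authors may differ.
    """
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    # One block is the context followed by the question on a new line.
    block = f"{context.strip()}\n{question.strip()}"
    # Blocks are separated by a blank line so the repetition is easy to see.
    return "\n\n".join([block] * repeats)


if __name__ == "__main__":
    prompt = repeat_prompt(
        "Alice was born in 1990. Bob was born in 1985.",
        "When was Bob born?",
    )
    print(prompt)
```

The trade-off in this sketch is that the prompt grows linearly with `repeats`, so the extra pass over the context is paid for in input tokens.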

Abstract

Recurrent large language models that compete with Transformers in language modeling perplexity are emerging at a rapid rate (e.g., Mamba, RWKV). Excitingly, these architectures use a constant amount of memory during inference. However, due to the limited memory, recurrent LMs cannot recall and use all the information in long contexts, leading to brittle in-context learning (ICL) quality. A key challenge for efficient LMs is selecting what information to store versus discard. In this work, we observe that the order in which information is shown to the LM impacts the selection difficulty. To formalize this, we show that the hardness of information recall reduces to the hardness of a problem called set disjointness (SD), a quintessential problem in communication complexity that requires a streaming algorithm (e.g., recurrent model) to decide whether inputted sets are disjoint. We empirically and…
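
To make the set disjointness setting concrete, here is a small sketch of a naive one-pass streaming check. It stores every element of whichever set arrives first and then scans the second, so the memory it needs depends on the order in which the sets are presented. The function name `streaming_disjoint` and the returned memory count are assumptions for illustration; this is a simple baseline, not the paper's formal construction or its bounds.

```python
from typing import Hashable, Iterable, Tuple


def streaming_disjoint(
    first: Iterable[Hashable], second: Iterable[Hashable]
) -> Tuple[bool, int]:
    """Decide in one pass whether two streamed sets are disjoint.

    Returns (is_disjoint, stored), where `stored` is the number of
    elements kept in memory. Illustrative baseline only.
    """
    memory = set()
    # Phase 1: the first set must be remembered in full, because we do not
    # yet know which of its elements the second set will ask about.
    for item in first:
        memory.add(item)
    stored = len(memory)
    # Phase 2: the second set is checked element by element against memory
    # and needs no additional storage.
    for item in second:
        if item in memory:
            return False, stored
    return True, stored


if __name__ == "__main__":
    small = ["q"]
    large = [f"k{i}" for i in range(1000)]
    print(streaming_disjoint(small, large))  # stores 1 element
    print(streaming_disjoint(large, small))  # stores 1000 elements
```

In this sketch, presenting the smaller set first keeps the stored state small, which mirrors the abstract's observation that the order in which information is shown affects how hard it is to decide what to store.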

Code & Models

Repositories

HazyResearch/prefix-linear-attention (PyTorch, official)

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling

Methods: Sparse Evolutionary Training · Linear Layer · Multi-Head Attention · Attention Is All You Need · Softmax · Byte Pair Encoding · Layer Normalization · Label Smoothing · Absolute Position Encodings · Position-Wise Feed-Forward Layer