Probing the Limits of Compressive Memory: A Study of Infini-Attention in Small-Scale Pretraining

Ruizhe Huang; Kexuan Zhang; Yihao Fang; Baifeng Yu

arXiv:2512.23862 · cs.LG · January 1, 2026

Open Access

TL;DR

This paper studies Infini-attention, a memory-augmented attention mechanism, in small language models. It shows that Infini-attention can improve long-context understanding and retrieval, although accuracy degrades as memory is compressed repeatedly over long sequences.

Contribution

It provides an empirical evaluation of Infini-attention in 300M-parameter models, highlighting its benefits and limitations for long-context processing in small language models.

Findings

01. Infini-attention improves long-context retrieval accuracy.

02. Retrieval accuracy drops with repeated memory compressions over long sequences.

03. Infini-attention achieves up to 31% higher accuracy than the baseline at a context length of 16,384 tokens.
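The second finding, that retrieval accuracy falls as memory is compressed repeatedly, is easier to picture with a toy model. Assume the linear memory update from the original Infini-attention paper, where each segment adds sigma(K)^T V to a fixed-size matrix and sigma = ELU + 1. Under that assumption, every new write lands on the same d_k x d_v matrix, so later entries blur earlier ones. The sketch below is a hypothetical illustration, not the paper's experiment. It uses random keys and values with arbitrary sizes, writes `num_pairs` associations into one memory, and measures how well the first value can be read back. The names `recall_error` and `elu_plus_one` are illustrative only.

```python
import math
import random


def elu_plus_one(x):
    # sigma(x) = ELU(x) + 1: x + 1 for x > 0, exp(x) otherwise (always positive)
    return x + 1.0 if x > 0 else math.exp(x)


def recall_error(num_pairs, d_k=16, d_v=16, seed=0):
    """Toy illustration (hypothetical): superimpose num_pairs random (key, value)
    pairs in one linear associative memory, then read back the FIRST value.
    Returns the root-mean-square error of that read-back."""
    rng = random.Random(seed)
    M = [[0.0] * d_v for _ in range(d_k)]  # fixed-size memory matrix
    z = [0.0] * d_k                        # normalization vector
    first_key, first_value = None, None
    for n in range(num_pairs):
        k = [rng.gauss(0.0, 1.0) for _ in range(d_k)]
        v = [rng.gauss(0.0, 1.0) for _ in range(d_v)]
        if n == 0:
            first_key, first_value = k, v
        sk = [elu_plus_one(x) for x in k]
        # Linear "compression" step: M += sigma(k)^T v, z += sigma(k)
        for i in range(d_k):
            z[i] += sk[i]
            for j in range(d_v):
                M[i][j] += sk[i] * v[j]
    # Retrieve with the first key as the query: sigma(q) M / (sigma(q) . z)
    q = [elu_plus_one(x) for x in first_key]
    den = sum(q[i] * z[i] for i in range(d_k))
    read = [sum(q[i] * M[i][j] for i in range(d_k)) / den for j in range(d_v)]
    return math.sqrt(sum((r - t) ** 2 for r, t in zip(read, first_value)) / d_v)


if __name__ == "__main__":
    for n in (1, 4, 16, 64):
        print(n, round(recall_error(n), 3))
```

With a single stored pair the read-back is exact up to floating-point rounding. As more pairs share the memory, the error typically grows, because retrieval returns a positively weighted mix of all stored values. The exact numbers depend on the seed and sizes and say nothing quantitative about the paper's 300M-parameter models.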

Abstract

This study investigates small-scale pretraining for Small Language Models (SLMs) as a way to use limited data and compute efficiently, improve accessibility in low-resource settings, and reduce costs. To enhance long-context extrapolation in compact models, we focus on Infini-attention, which builds a compressed memory from past segments while preserving local attention. We conduct an empirical study using 300M-parameter LLaMA models pretrained with Infini-attention. The model trains stably and outperforms the baseline in long-context retrieval. We identify the balance factor as a key factor in model performance, and we find that retrieval accuracy drops with repeated memory compressions over long sequences. Even so, Infini-attention effectively compensates for the SLM's limited parameters. In particular, despite performance degradation at a…
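The abstract describes Infini-attention as combining a compressed memory of past segments with local attention, weighted by a balance factor. The sketch below shows one single-head segment step in plain Python, built on several assumptions of our own. It follows the formulation of the original Infini-attention paper: an ELU+1 feature map, a linear memory update, and a sigmoid gate on a scalar `beta` that mixes memory retrieval with causal dot-product attention. Reading `beta` as this study's balance factor is our interpretation. The helper names `infini_attention_segment`, `run_segments`, `elu_plus_one`, and `matvec_row`, and the `eps` stabilizer, are hypothetical and do not come from the paper's code.

```python
import math


def elu_plus_one(x):
    # sigma(x) = ELU(x) + 1, a positive feature map (assumed, as in original Infini-attention)
    return x + 1.0 if x > 0 else math.exp(x)


def matvec_row(row, mat):
    # Row vector (length d_k) times matrix (d_k x d_v) -> vector of length d_v
    return [sum(row[i] * mat[i][j] for i in range(len(row))) for j in range(len(mat[0]))]


def infini_attention_segment(Q, K, V, M, z, beta, eps=1e-6):
    """One segment of single-head Infini-attention (hypothetical sketch).
    Q, K: n vectors of size d_k; V: n vectors of size d_v.
    M: d_k x d_v compressive memory; z: length-d_k normalizer; beta: gate scalar.
    Returns (output, new_M, new_z)."""
    n, d_k, d_v = len(Q), len(Q[0]), len(V[0])
    g = 1.0 / (1.0 + math.exp(-beta))  # sigmoid(beta): weight on memory retrieval
    sq = [[elu_plus_one(x) for x in q] for q in Q]
    sk = [[elu_plus_one(x) for x in k] for k in K]
    out = []
    for t in range(n):
        # 1) Read from memory written by PREVIOUS segments: sigma(q) M / (sigma(q) . z)
        num = matvec_row(sq[t], M)
        den = sum(sq[t][i] * z[i] for i in range(d_k)) + eps
        a_mem = [x / den for x in num]
        # 2) Causal softmax attention within the current segment
        scores = [sum(Q[t][i] * K[s][i] for i in range(d_k)) / math.sqrt(d_k)
                  for s in range(t + 1)]
        m = max(scores)
        w = [math.exp(x - m) for x in scores]
        total = sum(w)
        a_dot = [sum(w[s] * V[s][j] for s in range(t + 1)) / total for j in range(d_v)]
        # 3) Gate the two sources with the balance factor
        out.append([g * a_mem[j] + (1.0 - g) * a_dot[j] for j in range(d_v)])
    # 4) Compress this segment into memory: M += sigma(K)^T V, z += sum_t sigma(K_t)
    new_M = [[M[i][j] + sum(sk[s][i] * V[s][j] for s in range(n)) for j in range(d_v)]
             for i in range(d_k)]
    new_z = [z[i] + sum(sk[s][i] for s in range(n)) for i in range(d_k)]
    return out, new_M, new_z


def run_segments(Q, K, V, segment_len, beta):
    """Stream a sequence segment by segment, carrying (M, z) across segments."""
    d_k, d_v = len(Q[0]), len(V[0])
    M = [[0.0] * d_v for _ in range(d_k)]
    z = [0.0] * d_k
    outputs = []
    for start in range(0, len(Q), segment_len):
        end = start + segment_len
        out, M, z = infini_attention_segment(Q[start:end], K[start:end], V[start:end], M, z, beta)
        outputs.extend(out)
    return outputs, M, z
```

In this design `M` and `z` stay d_k x d_v and d_k in size however many segments have been processed. That fixed size bounds the memory cost, and it also forces every earlier segment to share the same capacity. Pushing `beta` strongly negative makes the output rely almost entirely on local attention, and pushing it strongly positive makes it rely almost entirely on the memory.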

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Natural Language Processing Techniques