Sentence-Anchored Gist Compression for Long-Context LLMs

Dmitrii Tarasov; Elizaveta Goncharova; Kuznetsov Andrey

arXiv:2511.08128 · cs.CL · November 12, 2025

TL;DR

This paper fine-tunes large language models to compress their context into learned compression tokens. This shrinks the context by up to 8x with minimal performance loss and makes long-sequence processing more efficient.

Contribution

It presents a fine-tuning approach that teaches LLMs to compress their context into learned tokens. The approach matches alternative compression methods in accuracy while reaching higher compression ratios.
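The listing does not spell out how the compression tokens are wired into the model, so what follows is only a sketch under an explicit assumption. It reads "sentence-anchored gist" as appending a few learned gist tokens after every sentence and masking attention so that later tokens see earlier sentences only through those gists. The names `build_layout` and `gist_attention_mask` and the exact masking rule are hypothetical illustrations, not the authors' implementation.

```python
from typing import List, Tuple


def build_layout(sentence_lengths: List[int], gists_per_sentence: int) -> List[Tuple[int, bool]]:
    """(sentence index, is_gist) for each position: a sentence's tokens, then its gists."""
    layout: List[Tuple[int, bool]] = []
    for sent, length in enumerate(sentence_lengths):
        layout += [(sent, False)] * length
        layout += [(sent, True)] * gists_per_sentence
    return layout


def gist_attention_mask(sentence_lengths: List[int], gists_per_sentence: int) -> List[List[bool]]:
    """mask[q][k] is True when query position q may attend to key position k (assumed rule)."""
    layout = build_layout(sentence_lengths, gists_per_sentence)
    n = len(layout)
    mask = [[False] * n for _ in range(n)]
    for q, (q_sent, _) in enumerate(layout):
        for k in range(q + 1):  # causal: only keys at or before the query
            k_sent, k_is_gist = layout[k]
            same_sentence = k_sent == q_sent
            earlier_gist = k_sent < q_sent and k_is_gist
            # Earlier sentences are visible only through their gist tokens.
            mask[q][k] = same_sentence or earlier_gist
    return mask


if __name__ == "__main__":
    # Two 2-token sentences with one gist each: positions are t t g | t t g.
    for row in gist_attention_mask([2, 2], 1):
        print("".join("x" if allowed else "." for allowed in row))
```

Under this assumed scheme, once a sentence is finished, only its gist tokens' keys and values are ever attended to again. The raw sentence tokens can then be evicted from the cache, so a sentence of L tokens with g gists is compressed by roughly L/g.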

Findings

01 Compression factors of 2x to 8x without significant performance degradation

02 Results on par with alternative compression methods in experiments on a 3B LLaMA model

03 Higher compression ratios than alternative methods, with minimal accuracy impact

Abstract

This work investigates context compression for Large Language Models (LLMs) using learned compression tokens to reduce the memory and computational demands of processing long sequences. We demonstrate that pre-trained LLMs can be fine-tuned to compress their context by factors of 2x to 8x without significant performance degradation, as evaluated on both short-context and long-context benchmarks. Furthermore, in experiments on a 3-billion-parameter LLaMA model, our method achieves results on par with alternative compression techniques while attaining higher compression ratios.
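The following back-of-the-envelope arithmetic makes the memory claim concrete for the key-value cache. It assumes that a compression factor r means a context of n tokens is held as about ceil(n/r) cached positions. The model dimensions (28 layers, 8 key-value heads, head dimension 128, 16-bit values) are assumed values for a 3B-class LLaMA-style model, not figures from the paper, and the helper names are hypothetical.

```python
import math

# Assumed dimensions for a 3B-class LLaMA-style model (hypothetical, not from the paper).
NUM_LAYERS = 28
NUM_KV_HEADS = 8
HEAD_DIM = 128
BYTES_PER_VALUE = 2  # bf16 / fp16


def kv_cache_bytes(num_tokens: int) -> int:
    """Bytes needed to cache keys and values for `num_tokens` positions."""
    per_token = 2 * NUM_LAYERS * NUM_KV_HEADS * HEAD_DIM * BYTES_PER_VALUE  # 2 = keys + values
    return per_token * num_tokens


def compressed_length(num_tokens: int, ratio: int) -> int:
    """Cached positions left if every `ratio` tokens collapse into one (assumed)."""
    return math.ceil(num_tokens / ratio)


if __name__ == "__main__":
    context = 32_768
    for ratio in (1, 2, 4, 8):
        kept = compressed_length(context, ratio)
        mib = kv_cache_bytes(kept) / 2**20
        print(f"{ratio}x: {kept:>6} cached positions, {mib:7.1f} MiB")
```

Under these assumptions, the cache for a 32,768-token context drops from 3,584 MiB at 1x to 448 MiB at 8x. Because the cache grows linearly with the number of cached positions, the savings track the compression factor exactly in this simplified model. In practice, any recent tokens kept uncompressed would add to the compressed figure.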

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis