No Mean Feat: Simple, Strong Baselines for Context Compression
Yair Feldman, Yoav Artzi

TL;DR
This paper introduces BenchPress, a standardized evaluation suite for context compression in Transformers, together with simple baselines that strongly outperform the widely used causal compression-token approach. The results highlight the benefits of bidirectional attention and pooling.
Contribution
It provides a reproducible benchmarking framework and demonstrates that simple methods like mean pooling and bidirectional attention are highly effective for context compression.
Findings
Bidirectional attention improves compressed representations.
Simple pooling methods are highly expressive for context compression.
BenchPress supports benchmarking across model scales, datasets, compression ratios, and context lengths from short (1K tokens) to mid-range (8K tokens).
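The difference between causal and bidirectional attention mentioned in the findings comes down to which positions each token may attend to. The sketch below only illustrates that distinction with boolean masks; the function name `attention_mask` is hypothetical and is not taken from the paper or from BenchPress.

```python
import numpy as np


def attention_mask(seq_len: int, bidirectional: bool) -> np.ndarray:
    """Return a (seq_len, seq_len) boolean mask.

    mask[i, j] is True when position i may attend to position j.
    Hypothetical helper for illustration only.
    """
    full = np.ones((seq_len, seq_len), dtype=bool)
    # Causal: keep the lower triangle, so each position sees itself
    # and earlier positions only. Bidirectional: keep everything.
    return full if bidirectional else np.tril(full)


causal = attention_mask(3, bidirectional=False)
both_ways = attention_mask(3, bidirectional=True)
```

Under the causal mask, the first token's representation cannot depend on anything that follows it, whereas under the bidirectional mask every position can draw on the whole context before it is compressed.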
Abstract
Context compression reduces Transformer inference costs by replacing lengthy inputs with shorter pre-computed representations. It carries significant benefits for retrieval-augmented generation (RAG) and has attracted growing research attention. However, progress remains difficult to measure due to inconsistent evaluations and baselines. We design a standard, easy-to-reproduce evaluation suite for context compression, BenchPress, along with simple, high-performance baselines for English reading comprehension. BenchPress supports benchmarking across model scales, datasets, compression ratios, and short (1K tokens) to mid-range (8K tokens) contexts. While the suite is applicable to any compression paradigm, our baselines target soft context compression. We establish two simple baselines that strongly outperform the widely used causal compression-token approach: mean pooling and a…
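Mean pooling, one of the two baselines named in the abstract, can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the input is a matrix of per-token hidden states, that pooling is done over non-overlapping windows, and that the window size equals the compression ratio. The function name `mean_pool_compress` and the handling of a short final window are assumptions made for this example.

```python
import numpy as np


def mean_pool_compress(hidden_states: np.ndarray, ratio: int) -> np.ndarray:
    """Average non-overlapping windows of `ratio` token vectors.

    hidden_states: array of shape (seq_len, hidden_dim).
    Returns an array of shape (ceil(seq_len / ratio), hidden_dim).
    Assumption: a final window shorter than `ratio` is averaged over
    the tokens it actually contains.
    """
    if ratio < 1:
        raise ValueError("ratio must be a positive integer")
    seq_len, hidden_dim = hidden_states.shape
    n_windows = -(-seq_len // ratio)  # ceiling division
    pad = n_windows * ratio - seq_len

    # Zero-pad to a multiple of `ratio`, and track which rows are real.
    padded = np.pad(hidden_states, ((0, pad), (0, 0)))
    valid = np.pad(np.ones(seq_len), (0, pad))

    sums = padded.reshape(n_windows, ratio, hidden_dim).sum(axis=1)
    counts = valid.reshape(n_windows, ratio).sum(axis=1, keepdims=True)
    return sums / counts


# Five token vectors compressed at ratio 2 become three vectors.
states = np.arange(10, dtype=float).reshape(5, 2)
compressed = mean_pool_compress(states, ratio=2)
```

With a ratio of 2, a context of 5 token vectors is replaced by 3 pooled vectors, so the downstream model attends to a shorter sequence.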
