Haiku to Opus in Just 10 bits: LLMs Unlock Massive Compression Gains

Roy Rinberg; Annabelle Michael Carrell; Simon Henniger; Nicholas Carlini; Keri Warr

arXiv:2604.02343 · cs.LG · April 6, 2026

TL;DR

This paper studies compression of LLM-generated text under lossless, lossy, and interactive protocols. It characterizes a compression-compute frontier on which spending more compute buys more compression, and it achieves large size reductions while preserving much of the stronger model's capability.

Contribution

It introduces Question-Asking (QA), a novel interactive compression protocol in which a small model asks a stronger model yes/no questions, so each answer transfers exactly one bit. The authors report that it outperforms previous methods.
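The one-bit-per-answer idea can be illustrated with a toy sketch. Purely as an assumption for illustration, the small model is taken to hold a ranked list of candidate answers, and each question asks the stronger model whether its preferred answer lies in the first half of the remaining list (a bisection strategy; the paper's questions are not necessarily of this form). The names `question_asking`, `replay`, and the `oracle` callback are hypothetical. The sketch shows that the yes/no transcript alone, together with a candidate list both sides can regenerate, is enough to reconstruct the chosen answer.

```python
from typing import Callable, List, Sequence, Tuple


def question_asking(
    candidates: Sequence[str],
    oracle: Callable[[List[str]], bool],
    budget: int,
) -> Tuple[str, List[bool]]:
    """Narrow a ranked candidate list with at most `budget` yes/no questions.

    Hypothetical bisection strategy: each question asks the stronger model
    (`oracle`) whether its preferred answer is in the first half of the
    remaining candidates. Each reply is exactly one bit.
    """
    remaining = list(candidates)
    transcript: List[bool] = []
    while len(remaining) > 1 and len(transcript) < budget:
        mid = len(remaining) // 2
        answer = oracle(remaining[:mid])  # one bit from the stronger model
        transcript.append(answer)
        remaining = remaining[:mid] if answer else remaining[mid:]
    # If the budget runs out, fall back to the small model's top remaining pick.
    return remaining[0], transcript


def replay(candidates: Sequence[str], transcript: List[bool]) -> str:
    """Decoder side: rebuild the answer from the shared list and the bits alone."""
    remaining = list(candidates)
    for bit in transcript:
        mid = len(remaining) // 2
        remaining = remaining[:mid] if bit else remaining[mid:]
    return remaining[0]


# Toy usage: 1,024 candidates, a truthful oracle, and a 10-question budget.
candidates = [f"answer-{i}" for i in range(1024)]
target = "answer-737"
chosen, bits = question_asking(candidates, lambda half: target in half, budget=10)
print(chosen, len(bits), replay(candidates, bits))
```

In this toy setup, 10 answers are enough to single out any one of 2^10 = 1,024 candidates. With a smaller budget the small model keeps its best guess among whatever candidates remain.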

Findings

01

Domain-adapted LoRA adapters double the efficiency of LLM-based lossless arithmetic coding compared with the base model alone.

02

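Lossy compression that prompts the model for a succinct rewrite before arithmetic coding reaches a compression ratio of about 0.03, a 2x improvement over compressing the original response.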

03

With 10 yes/no questions (10 bits), the Question-Asking protocol recovers up to 72% of the capability gap between the small and the stronger model.
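To put these numbers on one scale, the sketch below converts between compression ratios and bit counts. It assumes that compression ratio means compressed size divided by original size (lower is better) and uses a hypothetical 4,000-byte response; neither the exact definition nor the response size is taken from the paper.

```python
def compression_ratio(compressed_bits: float, original_bytes: int) -> float:
    """Compressed size over original size, both in bits (lower is better)."""
    return compressed_bits / (8 * original_bytes)


def bits_at_ratio(ratio: float, original_bytes: int) -> float:
    """Bits remaining after compressing `original_bytes` at the given ratio."""
    return ratio * 8 * original_bytes


RESPONSE_BYTES = 4_000  # hypothetical length of one model response

print(bits_at_ratio(0.03, RESPONSE_BYTES))    # lossy rewrite + arithmetic coding at ~0.03
print(compression_ratio(10, RESPONSE_BYTES))  # sending only 10 yes/no answers
print(2 ** 10)                                # distinct transcripts 10 answers can encode
```

For this hypothetical response, a 0.03-ratio encoding needs about 960 bits, 96 times more than the 10 bits of a Question-Asking transcript, and the gap widens as responses get longer because the transcript length does not grow with the response.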

Abstract

We study the compression of LLM-generated text across lossless and lossy regimes, characterizing a compression-compute frontier where more compression is possible at the cost of more compute. For lossless compression, domain-adapted LoRA adapters can improve LLM-based arithmetic coding by 2x over compression with the base LLM alone. For lossy compression, prompting a model for a succinct rewrite then applying arithmetic coding can achieve compression ratios of approximately 0.03, a 2x improvement over compressing the original response. We further introduce Question-Asking compression (QA), an interactive lossy protocol inspired by the game 'Twenty Questions'. A small model iteratively refines its response by asking yes/no questions to a stronger model, transferring exactly one bit per answer. On 8 benchmarks spanning math, science, and code, 10 binary questions recover 23% to 72% of…
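LLM-based arithmetic coding can be sketched with exact rational arithmetic. Each token narrows an interval inside [0, 1) by the probability the model assigns to it, so the code length is close to the sum of -log2 p over the tokens (the shortest fitting binary string is at most about 2 bits longer). A model that predicts the text better, such as one specialized with a domain-adapted LoRA adapter, leaves a wider final interval and therefore needs fewer bits. `toy_model` is a hypothetical stand-in for an LLM's next-token distribution, and `encode`/`decode` are illustrative names, not the paper's code; encoder and decoder must query the identical model.

```python
import math
from fractions import Fraction
from typing import Callable, Dict, List, Sequence, Tuple

VOCAB = ["a", "b", "<eos>"]
Model = Callable[[Tuple[str, ...]], Dict[str, Fraction]]


def toy_model(prefix: Tuple[str, ...]) -> Dict[str, Fraction]:
    """Hypothetical stand-in for an LLM: favors repeating the previous symbol."""
    if prefix:
        probs = {s: Fraction(1, 8) for s in VOCAB}
        probs[prefix[-1]] = Fraction(3, 4)
        return probs
    return {"a": Fraction(1, 2), "b": Fraction(3, 8), "<eos>": Fraction(1, 8)}


def encode(symbols: Sequence[str], model: Model) -> str:
    """Arithmetic-code `symbols` (which must end with <eos>) into a bit string."""
    low, width = Fraction(0), Fraction(1)
    for i, sym in enumerate(symbols):
        probs = model(tuple(symbols[:i]))
        cum = sum((probs[s] for s in VOCAB[: VOCAB.index(sym)]), Fraction(0))
        low += width * cum   # move to the start of sym's sub-interval
        width *= probs[sym]  # shrink by the model's probability of sym
    # Emit the shortest k-bit string whose dyadic interval fits in [low, low + width).
    k = 1
    while True:
        n = math.ceil(low * 2**k)
        if Fraction(n + 1, 2**k) <= low + width:
            return format(n, f"0{k}b")
        k += 1


def decode(bits: str, model: Model) -> List[str]:
    """Invert `encode`: follow the sub-interval containing the code point until <eos>."""
    x = Fraction(int(bits, 2), 2 ** len(bits))
    low, width = Fraction(0), Fraction(1)
    out: List[str] = []
    while not out or out[-1] != "<eos>":
        probs = model(tuple(out))
        cum = Fraction(0)
        for s in VOCAB:
            if x < low + width * (cum + probs[s]):
                break
            cum += probs[s]
        out.append(s)
        low += width * cum
        width *= probs[s]
    return out


msg = ["a", "a", "a", "b", "<eos>"]
bits = encode(msg, toy_model)
ideal = -sum(math.log2(toy_model(tuple(msg[:i]))[s]) for i, s in enumerate(msg))
print(bits, len(bits), round(ideal, 2))  # 00111110 8 7.83
```

Exact fractions keep the sketch short and correct; practical coders use fixed-precision integer arithmetic with renormalization instead, since the fractions here grow with message length.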
