Shorthand for Thought: Compressing LLM Reasoning via Entropy-Guided Supertokens
Zhenyu Zhao, Sander Land, Daniel M. Bikel, Waseem Alshikh

TL;DR
This paper introduces a model-agnostic method for compressing LLM reasoning traces with entropy-guided supertokens. It reduces trace length by 8.1% on average with no statistically significant accuracy loss, and the supertokens make traces more interpretable.
Contribution
It proposes an entropy-guided supertoken compression pipeline for reasoning traces that improves efficiency and interpretability across three model families and five mathematical reasoning benchmarks.
Findings
Reasoning traces can be compressed by 8.1% on average with no statistically significant accuracy loss.
Supertokens serve as interpretable annotations of reasoning strategies.
Differences in trace transitions reveal systematic patterns between correct and incorrect reasoning.
Abstract
Reasoning in Large Language Models incurs significant inference-time compute, yet the token-level information structure of reasoning traces remains underexplored. We observe that reasoning tokens split into two functional types: low-entropy structural tokens (recurring phrases that scaffold the reasoning process) and higher-entropy organic tokens (problem-specific content that drives toward a solution). This asymmetry motivates a simple, model-agnostic compression pipeline: apply cross-word BPE merges on a model's own reasoning traces to derive supertokens that capture frequent structural patterns, then teach the model to adopt them via supervised fine-tuning. Across three model families and five mathematical reasoning benchmarks, our approach shortens reasoning traces by 8.1% on average with no statistically significant accuracy loss on any model-benchmark…
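The following is a rough, non-authoritative sketch of the cross-word merge step described in the abstract. It makes several simplifying assumptions. It starts from whitespace-separated words, not from the model's own tokens. It ranks merges by raw pair frequency, with no entropy guidance. It omits the supervised fine-tuning stage entirely. The function names `learn_cross_word_merges`, `merge_pair`, `apply_merges`, and `length_reduction` are hypothetical, as is the rule of stopping once no pair occurs at least twice.

```python
from collections import Counter


def merge_pair(seq, left, right):
    """Replace every left-to-right occurrence of (left, right) with one merged unit."""
    out = []
    i = 0
    while i < len(seq):
        if i + 1 < len(seq) and seq[i] == left and seq[i + 1] == right:
            # The merged unit keeps the space, so it can span a word boundary.
            out.append(left + " " + right)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def learn_cross_word_merges(traces, num_merges):
    """Learn up to num_merges BPE-style merges whose units may cross word boundaries.

    Assumption: base units are whitespace-split words and merges are chosen
    by raw pair frequency (no entropy weighting).
    """
    seqs = [trace.split() for trace in traces]
    merges = []
    for _ in range(num_merges):
        # Count adjacent unit pairs across all traces.
        pair_counts = Counter()
        for seq in seqs:
            pair_counts.update(zip(seq, seq[1:]))
        if not pair_counts:
            break
        (left, right), count = pair_counts.most_common(1)[0]
        # Hypothetical stopping rule: a pair seen only once is not a recurring pattern.
        if count < 2:
            break
        merges.append((left, right))
        seqs = [merge_pair(seq, left, right) for seq in seqs]
    return merges


def apply_merges(trace, merges):
    """Segment a trace into units (supertokens) by replaying merges in learned order."""
    seq = trace.split()
    for left, right in merges:
        seq = merge_pair(seq, left, right)
    return seq


def length_reduction(traces, merges):
    """Fraction by which the total unit count shrinks after applying the merges."""
    before = sum(len(t.split()) for t in traces)
    after = sum(len(apply_merges(t, merges)) for t in traces)
    return 1 - after / before


if __name__ == "__main__":
    traces = [
        "let me check this step",
        "let me check that step",
        "so let me check again",
    ]
    merges = learn_cross_word_merges(traces, num_merges=10)
    print(merges)
    print(apply_merges("let me check this step", merges))
    print(length_reduction(traces, merges))
```

Standard BPE keeps merges inside a word. Here merges may cross word boundaries, so a recurring scaffold phrase like "let me check" collapses into a single unit, while problem-specific words stay separate.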
