Shorthand for Thought: Compressing LLM Reasoning via Entropy-Guided Supertokens

Zhenyu Zhao; Sander Land; Daniel M. Bikel; Waseem Alshikh

arXiv:2604.26355·cs.CL·May 6, 2026

TL;DR

This paper introduces a model-agnostic method for compressing LLM reasoning traces with entropy-guided supertokens. It shortens traces by 8.1% on average with no statistically significant accuracy loss, and the supertokens also make the reasoning easier to interpret.

Contribution

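It proposes an entropy-motivated compression pipeline: cross-word BPE merges over a model's own reasoning traces yield supertokens for frequent structural patterns, and supervised fine-tuning teaches the model to use them. The approach is evaluated across three model families and five mathematical reasoning benchmarks.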

Findings

01

Reasoning traces can be shortened by 8.1% on average with no statistically significant accuracy loss.

02

Supertokens serve as interpretable annotations of reasoning strategies.

03

Differences in trace transitions reveal systematic patterns between correct and incorrect reasoning.

Abstract

Reasoning in Large Language Models incurs significant inference-time compute, yet the token-level information structure of reasoning traces remains underexplored. We observe that reasoning tokens split into two functional types: low-entropy structural tokens (recurring phrases that scaffold the reasoning process) and higher-entropy organic tokens (problem-specific content that drives toward a solution). This asymmetry motivates a simple, model-agnostic compression pipeline: apply cross-word BPE merges on a model's own reasoning traces to derive supertokens that capture frequent structural patterns, then teach the model to adopt them via supervised fine-tuning. Across three model families and five mathematical reasoning benchmarks, our approach shortens reasoning traces by 8.1% on average with no statistically significant accuracy loss on any model–benchmark…
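To make the "cross-word BPE merges" step concrete, here is a minimal sketch of how frequent multi-word patterns could be merged into supertokens. It is an illustration under stated assumptions, not the authors' implementation: whitespace-split words stand in for the model's real tokenizer output, the function names (`count_pairs`, `merge_pair`, `learn_supertokens`) and the `min_count` threshold are hypothetical, and the entropy-based selection and the fine-tuning stage described in the abstract are not modeled.

```python
from collections import Counter


def count_pairs(sequences):
    """Count adjacent token pairs across all sequences."""
    pairs = Counter()
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            pairs[(a, b)] += 1
    return pairs


def merge_pair(seq, pair, merged):
    """Replace each non-overlapping occurrence of `pair` in `seq` with `merged`."""
    out = []
    i = 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(merged)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def learn_supertokens(traces, num_merges, min_count=2):
    """Greedy BPE-style merging that is allowed to cross word boundaries.

    Assumption: whitespace-separated words stand in for model tokens.
    Returns the learned supertokens (in merge order) and the re-segmented traces.
    """
    sequences = [trace.split() for trace in traces]
    supertokens = []
    for _ in range(num_merges):
        pairs = count_pairs(sequences)
        if not pairs:
            break
        # Most frequent pair; ties are broken lexicographically for determinism.
        pair, count = max(pairs.items(), key=lambda kv: (kv[1], kv[0]))
        if count < min_count:
            break
        merged = pair[0] + " " + pair[1]
        supertokens.append(merged)
        sequences = [merge_pair(seq, pair, merged) for seq in sequences]
    return supertokens, sequences


if __name__ == "__main__":
    toy_traces = [
        "let me check the sum",
        "let me check the product",
        "let me try again",
    ]
    supertokens, compressed = learn_supertokens(toy_traces, num_merges=5)
    before = sum(len(t.split()) for t in toy_traces)
    after = sum(len(seq) for seq in compressed)
    print(supertokens)
    print(f"tokens: {before} -> {after}")
```

On these toy traces the recurring scaffold phrase is merged step by step into a single unit while the problem-specific words stay separate, which mirrors the structural versus organic split the abstract describes. The compression ratio on such a toy input says nothing about the 8.1% figure reported in the paper.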
