TokenSqueeze: Performance-Preserving Compression for Reasoning LLMs

Yuxiang Zhang; Zhengxu Yu; Weihang Pan; Zhongming Jin; Qiang Fu; Deng Cai; Binbin Lin; Jieping Ye

arXiv:2511.13223 · cs.LG · November 18, 2025

TL;DR

TokenSqueeze compresses the reasoning paths of large language models, substantially reducing token usage while maintaining accuracy, which makes reasoning tasks cheaper to run.

Contribution

TokenSqueeze introduces an adaptive approach built on self-generated data, together with linguistic refinement, to preserve reasoning performance while compressing token usage.

Findings

01. Achieved a 50% token reduction on the MATH500 benchmark.

02. Maintained accuracy despite the significant token reduction.

03. Demonstrated effectiveness across diverse reasoning tasks.
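
To make the two headline quantities concrete, the following minimal sketch shows one way token reduction and accuracy could be computed when comparing a baseline model with a compressed one. The function names and the toy numbers are illustrative assumptions, not the paper's evaluation code or its reported data.

```python
from statistics import mean


def token_reduction(baseline_lengths, compressed_lengths):
    """Fractional drop in average response length (0.5 means 50% fewer tokens)."""
    base = mean(baseline_lengths)
    return (base - mean(compressed_lengths)) / base


def accuracy(predictions, references):
    """Share of predictions that exactly match their reference answer."""
    return sum(p == r for p, r in zip(predictions, references)) / len(references)


# Hypothetical per-problem token counts, made up for illustration only.
baseline_tokens = [4000, 3000, 5000]
compressed_tokens = [2000, 1500, 2500]

# Hypothetical final answers from the compressed model and the answer key.
compressed_answers = ["4", "7", "9"]
reference_answers = ["4", "7", "9"]

reduction = token_reduction(baseline_tokens, compressed_tokens)
compressed_accuracy = accuracy(compressed_answers, reference_answers)
print(f"token reduction: {reduction:.0%}, accuracy: {compressed_accuracy:.0%}")
```

A method preserves performance in the sense used above when the reduction is large and the compressed model's accuracy stays close to the baseline's on the same problems.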

Abstract

Emerging reasoning LLMs such as OpenAI-o1 and DeepSeek-R1 have achieved strong performance on complex reasoning tasks by generating long chain-of-thought (CoT) traces. However, these long CoTs result in increased token usage, leading to higher inference latency and memory consumption. As a result, balancing accuracy and reasoning efficiency has become essential for deploying reasoning LLMs in practical applications. Existing long-to-short (Long2Short) methods aim to reduce inference length but often sacrifice accuracy, revealing a need for an approach that maintains performance while lowering token costs. To address this efficiency-accuracy tradeoff, we propose TokenSqueeze, a novel Long2Short method that condenses reasoning paths while preserving performance and relying exclusively on self-generated data. First, to prevent performance degradation caused by excessive compression of…

Taxonomy

Topics: Topic Modeling · Semantic Web and Ontologies · Natural Language Processing Techniques