Towards Efficient Large Language Reasoning Models via Extreme-Ratio Chain-of-Thought Compression

Yuntian Tang; Bohan Jia; Wenxuan Huang; Lianyue Zhang; Jiao Xie; Wenxi Li; Wei Li; Jie Hu; Xinghao Chen; Rongrong Ji; Shaohui Lin

arXiv:2602.08324·cs.LG·May 18, 2026

TL;DR

This paper introduces Extra-CoT, a framework for compressing Chain-of-Thought reasoning in large language models. It substantially reduces the number of reasoning tokens while maintaining or improving accuracy, particularly on mathematical reasoning tasks.

Contribution

The paper proposes a compression method that combines a dedicated semantically-preserved compressor, mixed-ratio supervised fine-tuning (SFT), and hierarchical policy optimization to improve reasoning efficiency without sacrificing fidelity.

Findings

01. Achieves over 73% token reduction with improved accuracy on MATH-500.

02. Outperforms state-of-the-art methods on mathematical reasoning benchmarks.

03. Demonstrates effective high-fidelity compression for large language models.
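
A token-reduction figure such as the one in the first finding is usually read as the relative drop in generated tokens. The sketch below assumes that definition (this page does not state the exact formula), and the token counts are hypothetical values chosen only to illustrate the arithmetic.

```python
def token_reduction(original_tokens: int, compressed_tokens: int) -> float:
    """Fraction of tokens removed relative to the uncompressed chain of thought.

    Assumed definition: 1 - compressed / original.
    """
    if original_tokens <= 0:
        raise ValueError("original_tokens must be positive")
    return 1.0 - compressed_tokens / original_tokens


# Hypothetical counts, not taken from the paper.
reduction = token_reduction(original_tokens=1000, compressed_tokens=270)
print(f"Token reduction: {reduction:.0%}")
```

Under this reading, a reduction above 73% means the compressed reasoning trace uses fewer than 27% of the tokens of the original trace.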

Abstract

Chain-of-Thought (CoT) reasoning enhances the reasoning capabilities of Large Language Models (LLMs), yet it incurs substantial computational overhead at inference. Existing CoT compression methods often suffer a critical loss of logical fidelity at high compression ratios, resulting in significant performance degradation. To achieve high-fidelity, fast reasoning, we propose a novel EXTreme-RAtio Chain-of-Thought Compression framework, termed Extra-CoT, which aggressively reduces the token budget while preserving answer accuracy. To generate reliable, high-fidelity supervision, we first train a dedicated semantically-preserved compressor on mathematical CoT data with fine-grained annotations. An LLM is then fine-tuned on these compressed pairs via mixed-ratio supervised fine-tuning (SFT), teaching it to follow a spectrum of compression budgets and providing a stable…

Code & Models

Repositories

Mwie1024/Extra-CoT (GitHub)

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Natural Language Processing Techniques