TL;DR
This paper introduces Extra-CoT, a framework for compressing Chain-of-Thought reasoning in large language models. It substantially reduces the number of reasoning tokens (over 73% on MATH-500) while maintaining or improving accuracy, especially on mathematical reasoning tasks.
Contribution
The paper proposes a new compression method with a dedicated compressor, mixed-ratio fine-tuning, and hierarchical policy optimization to enhance reasoning efficiency without sacrificing fidelity.
Findings
Achieves over 73% token reduction with improved accuracy on MATH-500.
Outperforms state-of-the-art methods in mathematical reasoning benchmarks.
Demonstrates effective high-fidelity compression for large language models.
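To make the token-reduction figure concrete, here is a minimal sketch of the arithmetic. The function name and the token counts are hypothetical and chosen only for illustration; they are not taken from the paper. A reduction above 73% means that a 1000-token reasoning trace would keep fewer than 270 tokens.

```python
def token_reduction(original_tokens: int, compressed_tokens: int) -> float:
    """Return the fraction of tokens removed (0.0 = none, 1.0 = all)."""
    if original_tokens <= 0:
        raise ValueError("original_tokens must be positive")
    return 1.0 - compressed_tokens / original_tokens


# Hypothetical counts, used only to illustrate the calculation.
original = 1000
compressed = 260

reduction = token_reduction(original, compressed)
print(f"{reduction:.0%} of tokens removed")  # prints "74% of tokens removed"
```

The same ratio can be read the other way around: the compressed trace is `compressed / original` of the original length, here 26%.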
Abstract
Chain-of-Thought (CoT) reasoning successfully enhances the reasoning capabilities of Large Language Models (LLMs), yet it incurs substantial computational overhead for inference. Existing CoT compression methods often suffer from a critical loss of logical fidelity at high compression ratios, resulting in significant performance degradation. To achieve high-fidelity, fast reasoning, we propose a novel EXTreme-RAtio Chain-of-Thought Compression framework, termed Extra-CoT, which aggressively reduces the token budget while preserving answer accuracy. To generate reliable, high-fidelity supervision, we first train a dedicated semantically-preserved compressor on mathematical CoT data with fine-grained annotations. An LLM is then fine-tuned on these compressed pairs via a mixed-ratio supervised fine-tuning (SFT), teaching it to follow a spectrum of compression budgets and providing a stable…
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Natural Language Processing Techniques
