ConMax: Confidence-Maximizing Compression for Efficient Chain-of-Thought Reasoning
Minda Hu, Zexuan Qiu, Zenan Xu, Kun Li, Bo Zhou, Irwin King

TL;DR
ConMax is a reinforcement learning framework that compresses reasoning traces in large models, reducing computational costs while maintaining high accuracy and reasoning quality.
Contribution
ConMax introduces a novel reward-based compression method that preserves reasoning integrity and improves efficiency in large reasoning models.
Findings
Reduces inference length by 43% compared to baselines.
Maintains 99.3% of original accuracy after compression.
Demonstrates effectiveness across five reasoning datasets.
Abstract
Recent breakthroughs in Large Reasoning Models (LRMs) have demonstrated that extensive Chain-of-Thought (CoT) generation is critical for enabling intricate cognitive behaviors, such as self-verification and backtracking, to solve complex tasks. However, this capability often leads to "overthinking", where models generate redundant reasoning paths that inflate computational costs without improving accuracy. While Supervised Fine-Tuning (SFT) on reasoning traces is a standard paradigm for the "cold start" phase, applying existing compression techniques to these traces often compromises logical coherence or incurs prohibitive sampling costs. In this paper, we introduce ConMax (Confidence-Maximizing Compression), a novel reinforcement learning framework designed to automatically compress reasoning traces while preserving essential reasoning patterns. ConMax formulates compression as a…
Taxonomy
Topics: Machine Learning in Healthcare · AI-based Problem Solving and Planning · Multimodal Machine Learning Applications
