Constraint-Rectified Training for Efficient Chain-of-Thought
Qinhang Wu, Sen Lin, Ming Zhang, Yingbin Liang, Ness B. Shroff

TL;DR
This paper introduces CRT, a training framework that enhances the efficiency of Chain-of-Thought reasoning in large language models by reducing redundancy and maintaining accuracy, resulting in more cost-effective and interpretable reasoning processes.
Contribution
CRT provides a stable, interpretable post-training method that effectively balances reasoning length and accuracy, improving efficiency without sacrificing answer quality.
Findings
CRT reduces token usage while maintaining accuracy.
It shortens reasoning traces and reduces internal redundancy.
It enables fine-grained control over explanation length.
Abstract
Chain-of-Thought (CoT) has significantly enhanced the reasoning capabilities of Large Language Models (LLMs), especially when combined with reinforcement learning (RL) based post-training methods. While longer reasoning traces can improve answer quality and unlock abilities such as self-correction, they also incur high inference costs and often introduce redundant steps, known as overthinking. Recent research seeks to develop efficient reasoning strategies that balance reasoning length and accuracy, either through length-aware reward design or prompt-based calibration. However, these heuristic-based approaches may suffer from severe accuracy drops and be highly sensitive to hyperparameters. To address these problems, we introduce CRT (Constraint-Rectified Training), a principled post-training framework based on reference-guarded constrained optimization, yielding a more stable and…
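To make the abstract's concern about heuristic length-aware rewards concrete, here is a minimal sketch of one generic form such a reward could take: a correctness score minus a linear per-token penalty. This is an illustrative assumption, not CRT and not the formulation of any specific prior method. The function name `length_penalized_reward`, the parameter `penalty_per_token`, and the token counts are all hypothetical.

```python
def length_penalized_reward(is_correct: bool, num_tokens: int,
                            penalty_per_token: float) -> float:
    """Hypothetical heuristic: correctness reward minus a linear length penalty."""
    correctness = 1.0 if is_correct else 0.0
    return correctness - penalty_per_token * num_tokens


# Two illustrative (made-up) responses: a long correct trace and a short wrong one.
long_correct = {"is_correct": True, "num_tokens": 2000}
short_wrong = {"is_correct": False, "num_tokens": 100}

for penalty in (1e-4, 1e-3):
    r_long = length_penalized_reward(penalty_per_token=penalty, **long_correct)
    r_short = length_penalized_reward(penalty_per_token=penalty, **short_wrong)
    preferred = "long correct" if r_long > r_short else "short wrong"
    print(f"penalty={penalty}: preferred response is {preferred}")
```

In this toy setting, the smaller penalty still ranks the long correct trace higher, while a tenfold larger penalty ranks the short wrong answer higher. That is one way a fixed length penalty can trade away accuracy and depend heavily on a single hyperparameter, which is the kind of failure the abstract attributes to heuristic approaches.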
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI)
