TL;DR
ReBalance is a training-free framework that enhances large reasoning models by dynamically balancing overthinking and underthinking through confidence-based guidance, leading to more efficient and accurate reasoning.
Contribution
ReBalance introduces a confidence-based, training-free method that guides reasoning trajectories, reducing redundancy and improving accuracy across diverse tasks and models.
Findings
Reduces output redundancy in large reasoning models.
Improves accuracy on math, QA, and coding benchmarks.
Works across models from 0.5B to 32B parameters.
Abstract
Large Reasoning Models (LRMs) have shown remarkable reasoning capabilities, yet they often suffer from overthinking, expending redundant computational steps on simple problems, or underthinking, failing to explore sufficient reasoning paths despite inherent capabilities. These issues lead to inefficiencies and potential inaccuracies, limiting practical deployment in resource-constrained settings. Existing methods to mitigate overthinking, such as suppressing reflective keywords or adjusting reasoning length, may inadvertently induce underthinking, compromising accuracy. Therefore, we propose ReBalance, a training-free framework that achieves efficient reasoning with balanced thinking. ReBalance leverages confidence as a continuous indicator of reasoning dynamics, identifying overthinking through high confidence variance and underthinking via consistent overconfidence. By aggregating…
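The two confidence signals named in the abstract (high confidence variance for overthinking, consistent overconfidence for underthinking) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `classify_reasoning`, the thresholds `var_threshold` and `high_conf`, and the assumption that each reasoning step already has a scalar confidence in [0, 1] are all hypothetical choices made for illustration.

```python
from statistics import pvariance


def classify_reasoning(step_confidences, var_threshold=0.02, high_conf=0.9):
    """Label a reasoning trace from per-step confidence scores.

    Hypothetical illustration only: the thresholds and the rule order
    are assumptions, not values or logic taken from the paper.
    """
    # Too few steps to measure any variation in confidence.
    if len(step_confidences) < 2:
        return "balanced"

    # Confidence that swings widely between steps is treated here as
    # a sign of overthinking (high confidence variance).
    if pvariance(step_confidences) > var_threshold:
        return "overthinking"

    # Confidence that stays high at every step is treated here as
    # a sign of underthinking (consistent overconfidence).
    if min(step_confidences) >= high_conf:
        return "underthinking"

    return "balanced"


if __name__ == "__main__":
    print(classify_reasoning([0.95, 0.40, 0.90, 0.35]))
    print(classify_reasoning([0.95, 0.97, 0.96]))
    print(classify_reasoning([0.60, 0.65, 0.70]))
```

In this sketch the label is only a diagnostic. How ReBalance turns such signals into guidance for the reasoning trajectory is described in the part of the abstract that is cut off above, so it is not modeled here.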
