TL;DR
SABER is a reinforcement learning framework that lets large language models switch between reasoning modes with different token budgets, so users can trade accuracy against inference cost across reasoning tasks.
Contribution
The paper introduces SABER, a novel training method that allows LLMs to adapt their reasoning depth and speed through controllable modes, optimizing performance under different resource constraints.
Findings
SABER-FastThink reduces reasoning length by 65.4% on MATH.
SABER achieves a 3.6% accuracy improvement over the base model on MATH.
Models trained with SABER generalize well across domains and scales.
Abstract
Large language models (LLMs) empowered by chain-of-thought reasoning have achieved impressive accuracy on complex tasks but suffer from excessive inference costs and latency when applied uniformly to all problems. We propose SABER (Switchable and Balanced Training for Efficient LLM Reasoning), a reinforcement learning framework that endows LLMs with user-controllable, token-budgeted reasoning. SABER first profiles each training example's base-model thinking token usage and assigns it to one of the predefined budget tiers. During fine-tuning, the model is guided by system prompts and length-aware rewards to respect its assigned budget. In parallel, we incorporate no-think examples to ensure the model remains reliable even when explicit reasoning is turned off. SABER further supports four discrete inference modes: NoThink, FastThink, CoreThink, and DeepThink, enabling flexible trade-offs…
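The profiling and reward steps described in the abstract can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration: the abstract does not give the tier thresholds, the reward formula, or any function names, so `BUDGET_TIERS`, `assign_tier`, `length_aware_reward`, and the `penalty` weight are hypothetical and should not be read as the authors' implementation.

```python
from bisect import bisect_left

# Hypothetical budget caps in thinking tokens; the abstract does not state the real values.
BUDGET_TIERS = [512, 2048, 8192]


def assign_tier(base_tokens: int, tiers=BUDGET_TIERS) -> int:
    """Return the smallest budget cap that covers the base model's token usage.

    Usage above the largest cap is assigned to the largest tier.
    """
    idx = bisect_left(tiers, base_tokens)
    return tiers[min(idx, len(tiers) - 1)]


def length_aware_reward(correct: bool, used_tokens: int, budget: int,
                        penalty: float = 0.5) -> float:
    """Assumed reward shape: 1.0 for a correct answer, minus a penalty that grows
    with the relative overshoot of the budget and saturates at `penalty`."""
    base = 1.0 if correct else 0.0
    overshoot = max(0, used_tokens - budget) / budget
    return base - penalty * min(1.0, overshoot)


if __name__ == "__main__":
    budget = assign_tier(300)  # profiled base-model usage of 300 thinking tokens
    print(budget, length_aware_reward(True, 768, budget))
```

In this sketch a response that stays within its budget keeps the full correctness reward, while one that overshoots is penalized in proportion to how far it exceeds the cap. Other reward shapes (hard cutoffs, asymmetric penalties) would fit the abstract's description equally well.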