Thinking Fast and Right: Balancing Accuracy and Reasoning Length with Adaptive Rewards

Jinyan Su; Claire Cardie

arXiv:2505.18298 · cs.CL · May 27, 2025

TL;DR

This paper introduces an adaptive reward-shaping technique for large language models that dynamically adjusts the trade-off between reasoning accuracy and response length, so that models produce concise, correct responses at lower inference cost.

Contribution

It presents a novel adaptive reward method that dynamically balances accuracy and length, improving reasoning efficiency without sacrificing correctness.

Findings

01. Significantly reduces reasoning length across multiple datasets.

02. Maintains high accuracy despite shorter reasoning traces.

03. Accelerates early-stage length reduction without over-compression.

Abstract

Large language models (LLMs) have demonstrated strong reasoning abilities in mathematical tasks, often enhanced through reinforcement learning (RL). However, RL-trained models frequently produce unnecessarily long reasoning traces -- even for simple queries -- leading to increased inference costs and latency. While recent approaches attempt to control verbosity by adding length penalties to the reward function, these methods rely on fixed penalty terms that are hard to tune and cannot adapt as the model's reasoning capability evolves, limiting their effectiveness. In this work, we propose an adaptive reward-shaping method that enables LLMs to "think fast and right" -- producing concise outputs without sacrificing correctness. Our method dynamically adjusts the reward trade-off between accuracy and response length based on model performance: when accuracy is high, the length penalty…
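To make the idea of an accuracy-dependent length penalty concrete, here is a minimal sketch. It is not the authors' implementation: the abstract is cut off before it states how the penalty changes, so the update rule below (raise the penalty weight when batch accuracy is above a reference value, lower it otherwise) is an assumption, as are the class name `AdaptiveLengthPenalty`, the parameters `step`, `acc_ref`, and `lam_max`, and the linear form of the penalty.

```python
from dataclasses import dataclass


@dataclass
class AdaptiveLengthPenalty:
    """Hypothetical adaptive length-penalty weight (illustration only)."""

    lam: float = 0.0      # current penalty weight (assumed starting value)
    step: float = 0.01    # assumed step size for each update
    acc_ref: float = 0.7  # assumed reference accuracy
    lam_max: float = 1.0  # assumed upper bound on the penalty weight

    def update(self, batch_accuracy: float) -> float:
        """Adjust the penalty weight from the latest batch accuracy.

        Assumption: accuracy above the reference raises the weight,
        accuracy below it lowers the weight. The result is clamped
        to the interval [0, lam_max].
        """
        self.lam += self.step * (batch_accuracy - self.acc_ref)
        self.lam = min(max(self.lam, 0.0), self.lam_max)
        return self.lam

    def reward(self, correct: bool, num_tokens: int, max_tokens: int) -> float:
        """Correctness reward minus a length penalty scaled by the weight."""
        length_frac = min(num_tokens / max_tokens, 1.0)
        return float(correct) - self.lam * length_frac
```

In a training loop one would call `update` once per batch with the measured accuracy and then score each sampled response with `reward`. A fixed-penalty baseline, as described in the abstract, corresponds to never calling `update`, which leaves `lam` constant.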

Code & Models

Repositories

jinyansu1/a-dlp (PyTorch, official)

Taxonomy

Topics: Intelligent Tutoring Systems and Adaptive Learning