Boosting Accuracy and Efficiency of Budget Forcing in LLMs via Reinforcement Learning for Mathematical Reasoning

Ravindra Aribowo Tarunokusumo; Rafael Fernandes Cunha

arXiv:2510.21398·cs.AI·October 27, 2025

TL;DR

This paper introduces a reinforcement learning framework to enhance the accuracy and token efficiency of budget forcing in large language models for mathematical reasoning, especially on smaller models.

Contribution

It presents a novel RL-based approach that improves reasoning performance and reduces token usage, overcoming limitations of supervised fine-tuning on small models.

Findings

01 Over 40% reduction in token usage compared to the SFT model

02 Improved accuracy on the GSM8K dataset with limited training samples

03 RL recovers performance losses caused by long-context training

Abstract

Test-time scaling methods have rapidly gained popularity for their computational efficiency and their parameter-independent approach to improving the reasoning performance of large language models. One such method is budget forcing, a decoding intervention strategy that allocates extra compute budget for thinking and elicits the model's inherent self-correcting behavior. However, it relies on supervised fine-tuning (SFT) on long-context reasoning traces, which degrades performance on smaller models because of verbose responses. For this reason, we offer a framework that integrates reinforcement learning (RL) to improve token efficiency and boost the performance of a 1.5B model on mathematical reasoning. We demonstrate this using only 1.5K training samples and find that our SFT+RL model performs better on the GSM8K dataset across varying compute budgets. Our main findings…
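To make the idea of budget forcing concrete, the following is a minimal sketch of a decoding loop that keeps the thinking phase within a token budget. It is not the paper's implementation: the `</think>` delimiter, the `Wait` continuation cue, the `budget_force` function, and the `toy_step` stand-in for a language model are all assumptions chosen for illustration.

```python
from typing import Callable, List

END_THINK = "</think>"  # assumed end-of-thinking delimiter (hypothetical)
WAIT = "Wait"           # assumed cue that nudges the model to keep reasoning


def budget_force(
    step: Callable[[List[str]], str],
    prompt: List[str],
    min_tokens: int,
    max_tokens: int,
) -> List[str]:
    """Generate thinking tokens while enforcing a minimum and maximum budget."""
    thinking: List[str] = []
    while len(thinking) < max_tokens:
        token = step(prompt + thinking)
        if token == END_THINK:
            if len(thinking) >= min_tokens:
                break  # the model stopped on its own and the minimum is met
            # Too early: suppress the end marker and ask the model to continue.
            token = WAIT
        thinking.append(token)
    # Close the thinking phase, whether the model stopped or the budget ran out.
    return thinking + [END_THINK]


def toy_step(context: List[str]) -> str:
    """Stand-in for a model: tries to stop whenever the context length is a multiple of 3."""
    return END_THINK if len(context) % 3 == 0 else "t"


if __name__ == "__main__":
    print(budget_force(toy_step, ["q"], min_tokens=4, max_tokens=10))
```

With a real model, `step` would wrap a single next-token call; raising `min_tokens` spends more compute on thinking, while lowering `max_tokens` caps it.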
