Shorter but not Worse: Frugal Reasoning via Easy Samples as Length Regularizers in Math RLVR

Abdelaziz Bounhar; Hadi Abdine; Evan Dufraisse; Ahmad Chamma; Amr Mohamed; Dani Bouch; Michalis Vazirgiannis; Guokan Shang

arXiv:2511.01937·cs.LG·January 12, 2026

Open Access · 2 Models · 1 Dataset

TL;DR

This paper shows that keeping moderately easy problems in the RLVR training mix encourages language models to produce shorter, more concise reasoning chains without sacrificing accuracy, reducing verbosity without an explicit length penalty.

Contribution

The paper demonstrates that retaining and modestly up-weighting moderately easy problems in RLVR acts as an implicit length regularizer, leading to shorter solutions without explicit length penalties.

Findings

01. Models achieve baseline accuracy with nearly half the output length.

02. Shorter solutions do not compromise reasoning quality.

03. The approach is effective on large language models like Qwen3-4B-Thinking-2507.

Abstract

Large language models (LLMs) trained for step-by-step reasoning often become excessively verbose, raising inference cost. Standard Reinforcement Learning with Verifiable Rewards (RLVR) pipelines filter out "easy" problems for training efficiency, leaving the model to train primarily on harder problems that require longer reasoning chains. This skews the output length distribution upward, resulting in a model that conflates "thinking longer" with "thinking better". In this work, we show that retaining and modestly up-weighting moderately easy problems acts as an implicit length regularizer. Exposing the model to solvable short-chain tasks constrains its output distribution and prevents runaway verbosity. The result is emergent brevity for free: the model learns to solve harder problems without inflating the output length, despite the absence of any…
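
The data-mixing idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Problem` class, the `pass_rate` difficulty proxy, the thresholds `easy_lo` and `easy_hi`, and the factor `easy_boost` are all hypothetical names and values chosen for illustration. The sketch only shows what "retaining and modestly up-weighting moderately easy problems" could look like when sampling prompts for an RLVR batch, instead of filtering those problems out.

```python
import random
from dataclasses import dataclass


@dataclass
class Problem:
    prompt: str
    # Assumed difficulty proxy: fraction of sampled solutions that a
    # reference model gets right (higher means easier).
    pass_rate: float


def sampling_weight(p, easy_lo=0.5, easy_hi=0.9, easy_boost=2.0):
    """Hypothetical weighting rule; all thresholds are illustrative."""
    if p.pass_rate > easy_hi:
        return 0.0  # assumption: trivially easy problems are still dropped
    if p.pass_rate >= easy_lo:
        return easy_boost  # moderately easy: retained and up-weighted
    return 1.0  # harder problems keep unit weight


def build_batch(pool, batch_size, seed=0):
    """Sample prompts for one training batch, proportional to the weights."""
    rng = random.Random(seed)
    weights = [sampling_weight(p) for p in pool]
    return rng.choices(pool, weights=weights, k=batch_size)


pool = [
    Problem("hard problem", pass_rate=0.1),
    Problem("moderately easy problem", pass_rate=0.7),
    Problem("trivial problem", pass_rate=0.95),
]
batch = build_batch(pool, batch_size=1000)
```

With these illustrative settings, the moderately easy problem is drawn about twice as often as the hard one, so short, solvable chains stay present in every batch. A standard filtering pipeline, as described in the abstract, would instead assign such problems weight zero.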

Code & Models

Models

Datasets

MBZUAI-Paris/Frugal-Thinking-RL-Data
dataset · 11 dl

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Explainable Artificial Intelligence (XAI)