Learning from Less: Measuring the Effectiveness of RLVR in Low Data and Compute Regimes

Justin Bauer; Thomas Walshe; Derek Pham; Harit Vishwakarma; Armin Parchami; Frederic Sala; Paroma Varma

arXiv:2604.18381 · cs.AI · April 21, 2026

TL;DR

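This paper empirically studies how small language models (SLMs) perform after Reinforcement Learning with Verifiable Rewards (RLVR) when training data and compute are limited. It finds that training on a mix of task complexities improves sample efficiency by up to 5x, and that models trained on lower-complexity tasks can generalize to higher-complexity ones.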

Contribution

The paper presents a comprehensive empirical analysis of open-source SLMs after RLVR in low-data regimes, using three procedurally generated datasets (number counting, graph reasoning, and spatial reasoning), and highlights the benefits of mixed-complexity training.

Findings

01. Procedural datasets enable controllable evaluation and dataset development.

02. Models trained on lower-complexity tasks can generalize to higher-complexity tasks.

03. Training on mixed-complexity datasets yields up to 5x sample efficiency in low-data regimes.
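
To make the idea of a procedural dataset with a complexity knob concrete, here is a minimal Python sketch. It is not the paper's dataset or code: the task (counting how often a digit appears in a list), the function names `make_counting_problem`, `make_mixed_dataset`, and `verifiable_reward`, and the choice of list length as the complexity measure are all assumptions made for illustration. It shows how problems with known ground truth can be generated at a chosen complexity, how several complexities can be mixed into one training set, and how a binary verifiable reward can be computed by checking a model's output against that ground truth.

```python
import random


def make_counting_problem(complexity, rng):
    """Hypothetical task: count occurrences of a target digit in a list.

    `complexity` is assumed here to be the list length, so a larger value
    gives a longer (harder) problem. The ground-truth answer is computed
    directly, which is what makes the reward verifiable.
    """
    digits = [rng.randint(0, 9) for _ in range(complexity)]
    target = rng.randint(0, 9)
    question = f"How many times does {target} appear in {digits}?"
    return {
        "question": question,
        "answer": digits.count(target),
        "complexity": complexity,
    }


def make_mixed_dataset(n, complexities, seed=0):
    """Build n problems, drawing each problem's complexity uniformly at
    random from `complexities` (an illustrative mixing scheme)."""
    rng = random.Random(seed)
    return [make_counting_problem(rng.choice(complexities), rng) for _ in range(n)]


def verifiable_reward(model_output, answer):
    """Binary reward: 1.0 if the last integer in the output equals the
    ground-truth answer, else 0.0."""
    tokens = model_output.replace(".", " ").split()
    numbers = [t for t in tokens if t.isdecimal()]
    return 1.0 if numbers and int(numbers[-1]) == answer else 0.0


# Example: a small training set that mixes three complexity levels.
dataset = make_mixed_dataset(n=8, complexities=[5, 10, 20], seed=0)
```

Because every problem carries its own `complexity` field, the same generator could also produce held-out evaluation sets at complexities not seen in training, which is one way to probe the kind of low-to-high complexity generalization the findings describe.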

Abstract

Fine-tuning Large Language Models (LLMs) typically relies on large quantities of high-quality annotated data, or questions with well-defined ground truth answers in the case of Reinforcement Learning with Verifiable Rewards (RLVR). While previous work has explored the benefits to model reasoning capabilities by scaling both data and compute used for RLVR, these results lack applicability in many real-world settings where annotated data and accessible compute may be scarce. In this work, we present a comprehensive empirical study of open-source Small Language Model (SLM) performance after RLVR in low data regimes. Across three novel datasets covering number counting problems, graph reasoning, and spatial reasoning, we characterize how model performance scales with dataset size, diversity, and complexity. We demonstrate that (1) procedural datasets allow for fine-grained evaluation and…
