Predictive Scaling Laws for Efficient GRPO Training of Large Reasoning Models

Datta Nimmaturi; Vaishnavi Bhargava; Rajat Ghosh; Johnu George; Debojyoti Dutta

arXiv:2507.18014 · cs.LG · March 23, 2026

Open Access

TL;DR

This paper introduces a predictive framework for optimizing the training of large reasoning models with GRPO, enabling efficient resource use by modeling training dynamics and identifying optimal stopping points.

Contribution

It presents an empirical scaling law for GRPO training that predicts reward trajectories and guides early stopping to reduce computational costs.

Findings

01. Training beyond a certain epoch yields minimal gains

02. The scaling law accurately predicts training progress

03. Early stopping can save significant computational resources

Abstract

Fine-tuning large language models (LLMs) for reasoning tasks using reinforcement learning methods like Group Relative Policy Optimization (GRPO) is computationally expensive. To address this, we propose a predictive framework that models training dynamics and helps optimize resource usage. Through experiments on Llama and Qwen models (3B to 8B), we derive an empirical scaling law based on model size, initial performance, and training progress. This law predicts reward trajectories and identifies three consistent training phases: slow start, rapid improvement, and plateau. We find that training beyond a certain number of epochs offers little gain, suggesting that earlier stopping can significantly reduce compute without sacrificing performance. Our approach generalizes across model types, providing a practical guide for efficient GRPO-based fine-tuning.
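
As a rough illustration of the three phases described in the abstract, the sketch below models reward as a logistic function of normalized training progress and finds the earliest point past which further training adds less than a chosen fraction of the total gain. The logistic form, the function names `sigmoid_reward` and `stopping_point`, and all parameter values are assumptions made here for illustration; this page does not state the paper's fitted functional form or how model size enters it.

```python
import math


def sigmoid_reward(progress, r0, gain, steepness, midpoint):
    """Hypothetical reward curve (an assumption, not the paper's fitted law).

    progress  -- normalized training progress in [0, 1]
    r0        -- lower asymptote, roughly the reward before training
    gain      -- total reward improvement available from training
    steepness -- how sharp the rapid-improvement phase is
    midpoint  -- progress value at the centre of the rapid phase
    """
    return r0 + gain / (1.0 + math.exp(-steepness * (progress - midpoint)))


def stopping_point(r0, gain, steepness, midpoint, tolerance=0.01, steps=1000):
    """Earliest progress whose reward is within tolerance * gain of the
    reward reached at the end of training (progress = 1.0)."""
    final = sigmoid_reward(1.0, r0, gain, steepness, midpoint)
    for i in range(steps + 1):
        progress = i / steps
        remaining = final - sigmoid_reward(progress, r0, gain, steepness, midpoint)
        if remaining <= tolerance * gain:
            return progress
    return 1.0


if __name__ == "__main__":
    # Illustrative parameter values only; they are not taken from the paper.
    params = dict(r0=0.2, gain=0.5, steepness=12.0, midpoint=0.4)
    stop = stopping_point(**params)
    print(f"Suggested stopping point: {stop:.3f} of the planned training run")
```

In practice the curve parameters would be estimated from the early part of a real reward log, and the tolerance expresses how much final reward one is willing to trade for saved compute.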

Taxonomy

Topics: Machine Learning and Data Classification · Neural Networks and Applications · AI-based Problem Solving and Planning