JustRL: Scaling a 1.5B LLM with a Simple RL Recipe

Bingxiang He; Zekai Qu; Zeyuan Liu; Yinghao Chen; Yuxin Zuo; Cheng Qian; Kaiyan Zhang; Weize Chen; Chaojun Xiao; Ganqu Cui; Ning Ding; Zhiyuan Liu

arXiv:2512.16649·cs.CL·December 19, 2025



TL;DR

JustRL demonstrates that a minimal, single-stage RL training approach with fixed hyperparameters can achieve state-of-the-art results on two 1.5B reasoning models, challenging the need for complex training pipelines.

Contribution

The paper introduces a simple single-stage RL training recipe with fixed hyperparameters that reaches state-of-the-art performance on two 1.5B reasoning models without multi-stage pipelines or per-model hyperparameter tuning.

Findings

01

Achieves 54.9% and 64.3% average accuracy across nine mathematical benchmarks on two 1.5B reasoning models.

02

Uses half the compute of more complex methods.

03

The same hyperparameters transfer across both models without tuning.

Abstract

Recent advances in reinforcement learning for large language models have converged on increasing complexity: multi-stage training pipelines, dynamic hyperparameter schedules, and curriculum learning strategies. This raises a fundamental question: Is this complexity necessary? We present JustRL, a minimal approach using single-stage training with fixed hyperparameters that achieves state-of-the-art performance on two 1.5B reasoning models (54.9% and 64.3% average accuracy across nine mathematical benchmarks) while using 2× less compute than sophisticated approaches. The same hyperparameters transfer across both models without tuning, and training exhibits smooth, monotonic improvement over 4,000+ steps without the collapses or plateaus that typically motivate interventions. Critically, ablations reveal that adding "standard tricks" like explicit length…


Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Machine Learning and Data Classification