Agent RL Scaling Law: Agent RL with Spontaneous Code Execution for Mathematical Problem Solving
Xinji Mai, Haotian Xu, Zhong-Zhi Li, Xing W, Weinong Wang, Jian Hu, Yingying Zhang, Wenqiang Zhang

TL;DR
This paper investigates how reinforcement learning from outcome-based rewards enables language models to autonomously generate and execute code for mathematical reasoning, and reports that code execution frequency, response length, and task accuracy scale predictably with training steps.
Contribution
It demonstrates that key metrics like code execution frequency and task accuracy scale predictably with training steps in tool-integrated RL, providing foundational insights into autonomous reasoning.
Findings
Increased training steps lead to higher code execution frequency.
Task accuracy improves with more training, correlating with code use.
The framework outperforms non-tool baselines on math benchmarks.
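To make the tracked quantities concrete, here is a minimal sketch of how the three metrics could be computed for one training checkpoint. The `Rollout` record, its field names, and the reading of "code execution frequency" as the mean number of executions per response are assumptions for illustration; the summary above does not define them.

```python
from dataclasses import dataclass


@dataclass
class Rollout:
    """One sampled response (hypothetical record, not the paper's schema)."""
    num_code_executions: int  # how many code snippets the model ran
    response_length: int      # length of the response, e.g. in tokens
    is_correct: bool          # whether the final answer matched the reference


def summarize_checkpoint(rollouts):
    """Average the per-response metrics over all rollouts of one checkpoint."""
    n = len(rollouts)
    if n == 0:
        raise ValueError("need at least one rollout")
    return {
        # Assumed definition: mean number of code executions per response.
        "code_exec_frequency": sum(r.num_code_executions for r in rollouts) / n,
        "mean_response_length": sum(r.response_length for r in rollouts) / n,
        "accuracy": sum(1 for r in rollouts if r.is_correct) / n,
    }
```

Calling `summarize_checkpoint` at each saved training step and plotting the results against the step count would give the kind of curves the findings describe.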
Abstract
Large Language Models (LLMs) often struggle with mathematical reasoning tasks requiring precise, verifiable computation. While Reinforcement Learning (RL) from outcome-based rewards enhances text-based reasoning, understanding how agents autonomously learn to leverage external tools like code execution remains crucial. We investigate RL from outcome-based rewards for Tool-Integrated Reasoning (ZeroTIR), training base LLMs to spontaneously generate and execute Python code for mathematical problems without supervised tool-use examples. Our central contribution is to demonstrate that, as RL training progresses, key metrics scale predictably. Specifically, we observe strong positive correlations: increased training steps lead to increases in the spontaneous code execution frequency, the average response length, and, critically, the final task accuracy. This suggests a quantifiable…
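The two ingredients named in the abstract, executing model-written Python and rewarding only the final outcome, can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the `<python>` delimiters, the function names, and the exact-match reward are all hypothetical, and the bare `exec` call is not sandboxed.

```python
import contextlib
import io
import re

# Hypothetical delimiters: the abstract does not say how code is marked up.
CODE_PATTERN = re.compile(r"<python>(.*?)</python>", re.DOTALL)


def run_code_blocks(response: str) -> list[str]:
    """Execute each tagged snippet in a response and collect its stdout."""
    outputs = []
    for snippet in CODE_PATTERN.findall(response):
        buffer = io.StringIO()
        try:
            with contextlib.redirect_stdout(buffer):
                exec(snippet, {})  # NOT sandboxed: for illustration only
            outputs.append(buffer.getvalue().strip())
        except Exception as exc:
            # Report the failure as text, as a tool result would be.
            outputs.append(f"error: {exc!r}")
    return outputs


def outcome_reward(final_answer: str, reference: str) -> float:
    """Assumed outcome-based reward: 1.0 only if the final answer matches."""
    return 1.0 if final_answer.strip() == reference.strip() else 0.0
```

In a training loop of this kind, the captured output would typically be appended to the context so the model can continue reasoning from it, and the reward would depend only on the final answer rather than on whether code was used.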
Taxonomy
Topics: Evolutionary Algorithms and Applications · Computability, Logic, AI Algorithms · Metaheuristic Optimization Algorithms Research
Methods: Balanced Selection
