Agent RL Scaling Law: Agent RL with Spontaneous Code Execution for Mathematical Problem Solving

Xinji Mai; Haotian Xu; Zhong-Zhi Li; Xing W; Weinong Wang; Jian Hu; Yingying Zhang; Wenqiang Zhang

arXiv:2505.07773·cs.AI·August 21, 2025

TL;DR

This paper investigates how reinforcement learning enables language models to autonomously generate and execute code for mathematical reasoning, and shows that code execution frequency, response length, and task accuracy scale predictably with training steps.

Contribution

The paper demonstrates that key metrics such as code execution frequency and task accuracy scale predictably with training steps in tool-integrated RL, which gives a quantitative view of how autonomous tool use emerges during training.

Findings

01 Increased training steps lead to higher code execution frequency.

02 Task accuracy improves with more training, correlating with code use.

03 The framework outperforms non-tool baselines on math benchmarks.

Abstract

Large Language Models (LLMs) often struggle with mathematical reasoning tasks requiring precise, verifiable computation. While Reinforcement Learning (RL) from outcome-based rewards enhances text-based reasoning, understanding how agents autonomously learn to leverage external tools like code execution remains crucial. We investigate RL from outcome-based rewards for Tool-Integrated Reasoning, ZeroTIR, training base LLMs to spontaneously generate and execute Python code for mathematical problems without supervised tool-use examples. Our central contribution is to demonstrate that, as RL training progresses, key metrics scale predictably. Specifically, we observe strong positive correlations: increased training steps lead to increases in the spontaneous code execution frequency, the average response length, and, critically, the final task accuracy. This suggests a quantifiable…
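
To make the quantities named in the abstract concrete, here is a minimal sketch of how a tool-integrated response might be scored. It is an illustration under stated assumptions, not the authors' implementation: this page does not describe how ZeroTIR parses code, executes it, or computes rewards, so the fence-based extraction, the exact-match reward, and every function name below (`extract_code_blocks`, `run_python`, `outcome_reward`, `code_execution_frequency`) are hypothetical.

```python
import re
import subprocess
import sys

# Markdown code fence, built programmatically so this example can itself be fenced.
FENCE = "`" * 3
# Assumption: the model marks code with fenced python blocks.
CODE_RE = re.compile(FENCE + r"python\n(.*?)" + FENCE, re.DOTALL)


def extract_code_blocks(response: str) -> list[str]:
    """Return the source of every fenced Python block in a model response."""
    return CODE_RE.findall(response)


def run_python(code: str, timeout: float = 5.0) -> str:
    """Run a snippet in a separate interpreter; return stdout, or stderr on failure.

    A subprocess is NOT a sandbox. Real training would need proper isolation.
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return "TimeoutError"
    return proc.stdout.strip() if proc.returncode == 0 else proc.stderr.strip()


def outcome_reward(final_answer: str, reference: str) -> float:
    """Outcome-based reward (assumed form): 1.0 on exact match, else 0.0."""
    return 1.0 if final_answer.strip() == reference.strip() else 0.0


def code_execution_frequency(responses: list[str]) -> float:
    """Fraction of responses that contain at least one Python block."""
    if not responses:
        return 0.0
    with_code = sum(1 for r in responses if extract_code_blocks(r))
    return with_code / len(responses)


if __name__ == "__main__":
    response = "Let me compute.\n" + FENCE + "python\nprint(2**10)\n" + FENCE + "\n"
    output = run_python(extract_code_blocks(response)[0])
    print(output, outcome_reward(output, "1024"))
```

Tracking `code_execution_frequency`, mean response length, and mean `outcome_reward` per training step would yield the three curves whose growth the abstract describes; the reward here only looks at the final answer, which is what "outcome-based" is taken to mean in this sketch.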

Taxonomy

Topics: Evolutionary Algorithms and Applications · Computability, Logic, AI Algorithms · Metaheuristic Optimization Algorithms Research

Methods: Balanced Selection