L0: Reinforcement Learning to Become General Agents

Junjie Zhang; Jingyi Xi; Zhuoyang Song; Junyu Lu; Yuhua Ke; Ting Sun; Yukun Yang; Jiaxing Zhang; Songxin Zhang; Zejian Xie

arXiv:2506.23667 · cs.CL · July 1, 2025

TL;DR

L0 introduces a scalable reinforcement learning pipeline for training general-purpose agents, significantly improving problem-solving accuracy in complex tasks using a novel agent scaffold and open-source tools.

Contribution

The paper presents L0, an end-to-end training system that pairs a new agent scaffold, NB-Agent, with Reinforcement Learning with Verifiable Rewards (RLVR), enabling efficient development of robust general agents.

Findings

01. Boosted SimpleQA accuracy from 30% to 80%.

02. Improved HotpotQA accuracy from 22% to 41%.

03. Demonstrated effective reinforcement learning with verifiable rewards.
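
To make "verifiable rewards" concrete, here is a minimal sketch of a rule-based reward for question answering. It assumes a normalized exact-match check against a reference answer; the function names and the normalization rules are hypothetical and are not taken from the L0 codebase, whose actual reward may differ.

```python
import re
import string


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def verifiable_reward(predicted: str, reference: str) -> float:
    """Assumed rule: 1.0 on a normalized exact match, 0.0 otherwise."""
    return 1.0 if normalize_answer(predicted) == normalize_answer(reference) else 0.0


# Hypothetical usage with made-up answers.
reward_match = verifiable_reward("The Eiffel Tower.", "eiffel tower")
reward_miss = verifiable_reward("Louvre", "eiffel tower")
```

The point of such a rule is that it can be checked automatically, so no learned reward model is needed to score a rollout.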

Abstract

Training large language models (LLMs) to act as autonomous agents for multi-turn, long-horizon tasks still poses significant challenges in scalability and training efficiency. To address this, we introduce L-Zero (L0), a scalable, end-to-end training pipeline for general-purpose agents. Featuring a low-cost, extensible, and sandboxed concurrent agent worker pool, L0 lowers the barrier for applying reinforcement learning in complex environments. We also introduce NB-Agent, the agent scaffold within L0, which operates in a "code-as-action" fashion via a Read-Eval-Print Loop (REPL). We evaluate L0 on factuality question-answering benchmarks. Our experiments demonstrate that a base model can develop robust problem-solving skills using solely Reinforcement Learning with Verifiable Rewards (RLVR). On the Qwen2.5-7B-Instruct model, our method boosts accuracy on SimpleQA from 30% to 80% and on…
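
The "code-as-action" idea in the abstract can be illustrated with a small sketch: each action is a snippet of code, the REPL executes it, and the printed output becomes the next observation. The `ReplSession` class below is a hypothetical stand-in, not the NB-Agent API, and it runs code with plain `exec` in the current process, so it has none of the sandboxing that L0 describes.

```python
import contextlib
import io
import traceback


class ReplSession:
    """Hypothetical code-as-action REPL; illustrative only, not sandboxed."""

    def __init__(self):
        # Variables persist across turns, as they would in a notebook kernel.
        self.namespace = {}

    def step(self, code: str) -> str:
        """Run one code action and return its printed output as the observation."""
        buffer = io.StringIO()
        try:
            with contextlib.redirect_stdout(buffer):
                exec(code, self.namespace)
        except Exception:
            # Errors come back as text so the agent can react on the next turn.
            buffer.write(traceback.format_exc(limit=1))
        return buffer.getvalue()


# Hypothetical usage: the code strings stand in for model-generated actions.
session = ReplSession()
first_observation = session.step("x = 6 * 7\nprint(x)")
second_observation = session.step("print(x + 1)")
error_observation = session.step("print(undefined_name)")
```

Because the namespace persists, the second action can reuse `x` from the first, which is what lets an agent build up state over a multi-turn task.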

Code & Models

Repositories

cmriat/l0
PyTorch · Official

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Natural Language Processing Techniques

Methods: Balanced Selection