ECHO-2: A Large-Scale Distributed Rollout Framework for Cost-Efficient Reinforcement Learning

Jie Xiao; Meng Chen; Qingnan Ren; Jingwei Song; Jiaqi Huang; Yangshen Deng; Chris Tong; Wanyi Chen; Suli Wang; Ziqian Bi; Shuo Lu; Yiqun Duan; Xu Wang; Rymon Yu; Ween Yang; Lynn Ai; Eric Yang; Bill Shi

arXiv:2602.02192 · cs.LG · April 1, 2026

TL;DR

ECHO-2 is a scalable distributed reinforcement learning framework that enhances cost efficiency in large language model post-training by overlapping rollout, dissemination, and training processes.

Contribution

ECHO-2 introduces an overlap-based capacity model and peer-assisted broadcast techniques for running distributed RL under dissemination latency and cost constraints.

Findings

01. ECHO-2 achieves significant cost savings in large-scale RL training.

02. It maintains RL reward performance comparable to strong baselines.

03. Experiments demonstrate effectiveness on 4B and 8B models under real bandwidth conditions.

Abstract

Reinforcement learning (RL) is a critical stage in post-training large language models (LLMs), involving repeated interaction between rollout generation, reward evaluation, and centralized learning. Distributing rollout execution offers opportunities to leverage more cost-efficient inference resources, but introduces challenges in wide-area coordination and policy dissemination. We present ECHO-2, a distributed RL framework for post-training with remote inference workers and non-negligible dissemination latency. ECHO-2 combines centralized learning with distributed rollouts and treats bounded policy staleness as a user-controlled parameter, enabling rollout generation, dissemination, and training to overlap. We introduce an overlap-based capacity model that relates training time, dissemination latency, and rollout throughput, yielding a practical provisioning rule for sustaining learner…
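
The abstract is cut off before it states the provisioning rule, so the paper's actual formula is not reproduced here. As a rough illustration of how training time, dissemination latency, and rollout throughput can be related, the sketch below assumes a simple steady-state model that is not taken from the paper: the learner consumes a fixed batch of rollouts per training step, every worker generates rollouts at the same constant rate, and a staleness bound of `max_staleness` policy versions lets dissemination overlap with up to that many training steps. All function and parameter names are hypothetical.

```python
import math


def min_workers(batch_rollouts: int, train_step_s: float,
                worker_rollouts_per_s: float) -> int:
    """Smallest worker count whose combined throughput supplies one
    batch per training step (assumed model, not the paper's rule)."""
    if batch_rollouts <= 0 or train_step_s <= 0 or worker_rollouts_per_s <= 0:
        raise ValueError("all inputs must be positive")
    # Rollouts one worker produces while the learner runs one step.
    per_worker = worker_rollouts_per_s * train_step_s
    return math.ceil(batch_rollouts / per_worker)


def dissemination_hidden(dissemination_s: float, train_step_s: float,
                         max_staleness: int) -> bool:
    """True if policy dissemination fits inside the window that the
    staleness bound allows (assumed: max_staleness training steps)."""
    return dissemination_s <= max_staleness * train_step_s


if __name__ == "__main__":
    # Illustrative numbers only; none of them come from the paper.
    print(min_workers(batch_rollouts=512, train_step_s=60.0,
                      worker_rollouts_per_s=0.5))
    print(dissemination_hidden(dissemination_s=90.0, train_step_s=60.0,
                               max_staleness=2))
```

Under these assumptions, raising the staleness bound widens the window in which dissemination can be hidden, while the worker count depends only on batch size, step time, and per-worker throughput. A real deployment would also need to account for heterogeneous workers and variable rollout lengths, which this sketch ignores.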
