Echo: Decoupling Inference and Training for Large-Scale RL Alignment on Heterogeneous Swarms

Jie Xiao; Changyuan Fan; Qingnan Ren; Alfred Long; Yuchen Zhang; Rymon Yu; Eric Yang; Lynn Ai; Shaoduo Gan

arXiv:2508.05387 · cs.LG · August 13, 2025

TL;DR

Echo is an RL system that decouples the inference and training phases across heterogeneous hardware while preserving statistical efficiency, enabling large-scale RL for LLMs on distributed resources.

Contribution

The paper presents Echo, a system that separates inference and training in large-scale RL. It uses two lightweight synchronization protocols, a sequential pull mode and an asynchronous push-pull mode, to improve hardware utilization and scalability.

Findings

01 Echo matches baseline convergence speed and reward.

02 Echo offloads trajectory generation to edge hardware.

03 Echo enables large-scale RL with heterogeneous resources.

Abstract

Modern RL-based post-training for large language models (LLMs) co-locates trajectory sampling and policy optimisation on the same GPU cluster, forcing the system to switch between inference and training workloads. This serial context switching violates the single-program-multiple-data (SPMD) assumption underlying today's distributed training systems. We present Echo, an RL system that cleanly decouples these two phases across heterogeneous "inference" and "training" swarms while preserving statistical efficiency. Echo introduces two lightweight synchronization protocols: a sequential pull mode that refreshes policy weights in response to an API call to minimise bias, and an asynchronous push-pull mode that streams version-tagged rollouts through a replay buffer to maximise hardware utilisation. Training four representative RL workloads with Qwen3-4B, Qwen2.5-7B, Qwen3-30B-A3B-Thinking-2507…
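The asynchronous push-pull mode described in the abstract can be illustrated with a small toy model. The sketch below is not Echo's implementation: the names `Rollout`, `VersionedReplayBuffer`, and `max_staleness`, and the rule of discarding rollouts that lag the trainer by more than a fixed number of policy versions, are all assumptions made for illustration. It only shows the general idea of tagging each rollout with the policy version that produced it so the training side can decide what to consume.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List


@dataclass
class Rollout:
    """One sampled trajectory, tagged with the policy version that produced it."""
    policy_version: int
    tokens: List[int]
    reward: float


class VersionedReplayBuffer:
    """Bounded FIFO buffer of version-tagged rollouts (hypothetical design)."""

    def __init__(self, capacity: int, max_staleness: int) -> None:
        # deque(maxlen=...) evicts the oldest entry when the buffer is full.
        self._items: Deque[Rollout] = deque(maxlen=capacity)
        # Assumed rule: accept rollouts at most this many versions behind.
        self.max_staleness = max_staleness

    def push(self, rollout: Rollout) -> None:
        """Called by the inference side whenever a rollout is finished."""
        self._items.append(rollout)

    def pull(self, current_version: int, batch_size: int) -> List[Rollout]:
        """Called by the training side; returns up to batch_size fresh rollouts."""
        batch: List[Rollout] = []
        while self._items and len(batch) < batch_size:
            rollout = self._items.popleft()
            lag = current_version - rollout.policy_version
            if lag <= self.max_staleness:
                batch.append(rollout)
            # Rollouts with a larger lag are dropped in this toy model.
        return batch

    def __len__(self) -> int:
        return len(self._items)


def simulate() -> List[int]:
    """Push rollouts from versions 0, 0, 1, 2 and pull while the trainer is at version 2."""
    buffer = VersionedReplayBuffer(capacity=8, max_staleness=1)
    for version in (0, 0, 1, 2):
        buffer.push(Rollout(policy_version=version, tokens=[version], reward=0.0))
    batch = buffer.pull(current_version=2, batch_size=4)
    return [r.policy_version for r in batch]


if __name__ == "__main__":
    print(simulate())  # [1, 2]
```

In this toy model, setting `max_staleness=0` keeps only rollouts generated by the trainer's current policy version, while larger values trade freshness for throughput. How Echo itself bounds or corrects for staleness is not stated in the text above.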

Taxonomy

Topics: Advanced Neural Network Applications · Topic Modeling · Natural Language Processing Techniques