Echo: Decoupling Inference and Training for Large-Scale RL Alignment on Heterogeneous Swarms
Jie Xiao, Changyuan Fan, Qingnan Ren, Alfred Long, Yuchen Zhang, Rymon Yu, Eric Yang, Lynn Ai, Shaoduo Gan

TL;DR
Echo is an RL system that decouples the inference and training phases across heterogeneous hardware, preserving statistical efficiency and enabling large-scale RL for LLMs on distributed resources.
Contribution
The paper presents Echo, a system that separates inference and training in large-scale RL, using lightweight synchronization protocols to improve hardware utilization and scalability.
Findings
Echo matches baseline convergence speed and reward.
Echo offloads trajectory generation to edge hardware.
Echo enables large-scale RL on heterogeneous resources.
Abstract
Modern RL-based post-training for large language models (LLMs) co-locates trajectory sampling and policy optimisation on the same GPU cluster, forcing the system to switch between inference and training workloads. This serial context switching violates the single-program-multiple-data (SPMD) assumption underlying today's distributed training systems. We present Echo, an RL system that cleanly decouples these two phases across heterogeneous "inference" and "training" swarms while preserving statistical efficiency. Echo introduces two lightweight synchronization protocols: a sequential pull mode that refreshes policy weights when triggered by an API call, to minimise bias, and an asynchronous push-pull mode that streams version-tagged rollouts through a replay buffer to maximise hardware utilisation. Training four representative RL workloads with Qwen3-4B, Qwen2.5-7B, Qwen3-30B-A3B-Thinking-2507…
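The asynchronous push-pull mode described in the abstract can be pictured with a small sketch. This is not Echo's implementation: the names `Rollout`, `VersionedReplayBuffer`, `policy_version`, and `max_staleness` are hypothetical, and the rule of dropping rollouts that lag the trainer by more than a fixed number of versions is an assumption made purely for illustration. The sketch only shows how tagging each rollout with the policy version that produced it lets a trainer decide which samples are fresh enough to use.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List


@dataclass
class Rollout:
    # Hypothetical record: the abstract only says rollouts are version-tagged.
    policy_version: int  # version of the weights that generated this trajectory
    tokens: List[int]
    reward: float


class VersionedReplayBuffer:
    """Illustrative buffer: inference workers push, the trainer pulls."""

    def __init__(self, capacity: int, max_staleness: int) -> None:
        # A bounded deque silently evicts the oldest rollout when full.
        self.items: Deque[Rollout] = deque(maxlen=capacity)
        # Assumed policy: accept rollouts at most this many versions behind.
        self.max_staleness = max_staleness

    def push(self, rollout: Rollout) -> None:
        self.items.append(rollout)

    def pull(self, current_version: int, batch_size: int) -> List[Rollout]:
        """Pop rollouts in arrival order, keeping only those fresh enough."""
        batch: List[Rollout] = []
        while self.items and len(batch) < batch_size:
            rollout = self.items.popleft()
            if current_version - rollout.policy_version <= self.max_staleness:
                batch.append(rollout)
            # Rollouts that are too stale are discarded.
        return batch


buffer = VersionedReplayBuffer(capacity=8, max_staleness=1)
buffer.push(Rollout(policy_version=0, tokens=[1, 2], reward=0.1))
buffer.push(Rollout(policy_version=2, tokens=[3], reward=0.5))
buffer.push(Rollout(policy_version=3, tokens=[4, 5], reward=0.9))

# The trainer is at version 3, so the version-0 rollout is dropped as stale.
batch = buffer.pull(current_version=3, batch_size=2)
```

In this toy setting, a sequential pull mode would roughly correspond to `max_staleness=0` with workers refreshing weights before generating, which trades utilisation for lower bias; the actual protocols may differ.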
Taxonomy
Topics: Advanced Neural Network Applications · Topic Modeling · Natural Language Processing Techniques
