TL;DR
Relax is an open-source asynchronous reinforcement learning engine for omni-modal large language models. It targets scalability, robustness, and data heterogeneity, and reports a 1.20× speedup over veRL on Qwen3-4B and a 2.00× speedup in fully async mode on Qwen3-Omni-30B.
Contribution
The paper introduces Relax, a scalable, asynchronous RL engine built from fault-isolated services, with an omni-native architecture that supports multimodal data and efficient training at scale.
Findings
Relax achieves a 1.20× speedup over veRL on Qwen3-4B.
Fully async mode delivers a 2.00× speedup on Qwen3-Omni-30B.
Relax supports R3 with only 1.9% overhead, enabling stable omni-modal RL convergence.
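To put the reported figures in wall-clock terms, the short sketch below converts a speedup factor into the fraction of baseline time saved. It is plain arithmetic on the numbers quoted above, with each speedup taken relative to its own baseline; the function name `time_saved_fraction` is ours and does not come from the paper.

```python
def time_saved_fraction(speedup: float) -> float:
    """Fraction of baseline wall-clock time saved at a given speedup factor."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return 1.0 - 1.0 / speedup


# Speedup factors as quoted in the findings; labels are descriptive only.
reported = [
    ("Qwen3-4B, vs. veRL", 1.20),
    ("Qwen3-Omni-30B, fully async mode", 2.00),
]

for label, speedup in reported:
    saved = time_saved_fraction(speedup)
    print(f"{label}: {speedup:.2f}x speedup -> {saved:.1%} less wall-clock time")
```

A 1.20× speedup therefore corresponds to roughly one sixth less time per unit of work, and a 2.00× speedup to half the time.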
Abstract
Reinforcement learning (RL) post-training has proven effective at unlocking reasoning, self-reflection, and tool-use capabilities in large language models. As models extend to omni-modal inputs and agentic multi-turn workflows, RL training systems face three interdependent challenges: heterogeneous data flows, operational robustness at scale, and the staleness-throughput tradeoff. We present Relax (Reinforcement Engine Leveraging Agentic X-modality), an open-source RL training engine that addresses these challenges through three co-designed architectural layers. First, an omni-native architecture builds multimodal support into the full stack, from data preprocessing and modality-aware parallelism to inference generation, rather than retrofitting it onto a text-centric pipeline. Second, each RL role runs as an independent, fault-isolated service that can be scaled,…
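The staleness-throughput tradeoff named in the abstract can be illustrated with a toy staleness filter. This is a minimal sketch of the general idea in asynchronous RL, not Relax's implementation: `Rollout`, `select_fresh`, and `max_staleness` are hypothetical names chosen for illustration. Each rollout is tagged with the policy version that generated it, and the trainer keeps only samples whose version lag is within a bound.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Rollout:
    policy_version: int  # version of the weights that generated this sample
    payload: str         # stand-in for the actual trajectory data


def select_fresh(buffer: deque, current_version: int, max_staleness: int):
    """Drain the buffer, keeping rollouts whose version lag is within the bound."""
    kept, dropped = [], 0
    while buffer:
        rollout = buffer.popleft()
        if current_version - rollout.policy_version <= max_staleness:
            kept.append(rollout)
        else:
            dropped += 1
    return kept, dropped


# Hypothetical buffer: samples generated under policy versions 3 through 6.
buffer = deque(Rollout(v, f"sample-{v}") for v in [3, 4, 5, 5, 6])
kept, dropped = select_fresh(buffer, current_version=6, max_staleness=1)
print([r.policy_version for r in kept], dropped)  # [5, 5, 6] 2
```

In this toy model, `max_staleness=0` admits only data from the current policy, which forces generation to wait for each weight update. Larger bounds let generation run ahead of training and raise throughput, at the cost of training on more off-policy data.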
