Rollout-Training Co-Design for Efficient LLM-Based Multi-Agent Reinforcement Learning

Zhida Jiang; Zhaolong Xing; Jiawei Lu; Yipei Niu; Qingyuan Sang; Liangxu Zhang; Wenquan Dai; Junhua Shu; Jiaxing Wang; Qiangyu Pei; Qiong Chen; Xinyu Liu; Fangming Liu; Ai Han; Zhen Chen; Ke Zhang

arXiv:2602.09578 · cs.LG · February 11, 2026

Open Access

TL;DR

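This paper introduces FlexMARL, an end-to-end training framework for large-scale multi-agent reinforcement learning with LLMs. It targets rollout-training synchronization barriers, rollout load imbalance, and training resource underutilization, and reports up to 7.3x speedup and up to 5.6x higher hardware utilization over existing frameworks.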

Contribution

FlexMARL is the first end-to-end framework that jointly optimizes rollout, training, and orchestration for large-scale LLM-based MARL, overcoming synchronization and load imbalance issues.

Findings

01. Achieves up to 7.3x speedup over existing frameworks.

02. Improves hardware utilization by up to 5.6x.

03. Effectively manages data flow and resource allocation in large-scale MARL.

Abstract

Despite algorithm-level innovations for multi-agent reinforcement learning (MARL), the underlying networked infrastructure for large-scale MARL training remains underexplored. Existing training frameworks primarily optimize for single-agent scenarios and fail to address the unique system-level challenges of MARL, including rollout-training synchronization barriers, rollout load imbalance, and training resource underutilization. To bridge this gap, we propose FlexMARL, the first end-to-end training framework that holistically optimizes rollout, training, and their orchestration for large-scale LLM-based MARL. Specifically, FlexMARL introduces the joint orchestrator to manage data flow under the rollout-training disaggregated architecture. Building upon the experience store, a novel micro-batch driven asynchronous pipeline eliminates the synchronization barriers while providing strong…
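The abstract names a micro-batch driven asynchronous pipeline built on an experience store but gives no interface for it. As a rough illustration of the general idea only, and not FlexMARL's implementation, the sketch below uses a thread-safe queue as a stand-in experience store: rollout workers push samples as soon as they finish, and a trainer consumes them in micro-batches without waiting for every agent to complete its rollout. All names here (`rollout_worker`, `trainer`, `run_pipeline`, the sample tuples, the batch sizes) are hypothetical.

```python
import queue
import threading


def rollout_worker(agent_id, num_samples, store):
    # Hypothetical rollout: each finished sample is pushed immediately,
    # so the trainer never waits for the slowest agent to finish a full batch.
    for step in range(num_samples):
        store.put((agent_id, step))


def trainer(store, micro_batch_size, total, consumed):
    # Consume samples as they arrive and "train" (here: just record)
    # whenever a micro-batch is full or the last sample has been seen.
    batch = []
    seen = 0
    while seen < total:
        batch.append(store.get())  # blocks until a sample is available
        seen += 1
        if len(batch) == micro_batch_size or seen == total:
            consumed.append(list(batch))
            batch.clear()


def run_pipeline(num_agents=3, samples_per_agent=4, micro_batch_size=5):
    store = queue.Queue()  # stand-in for an experience store (assumption)
    consumed = []
    total = num_agents * samples_per_agent

    train_thread = threading.Thread(
        target=trainer, args=(store, micro_batch_size, total, consumed)
    )
    workers = [
        threading.Thread(target=rollout_worker, args=(a, samples_per_agent, store))
        for a in range(num_agents)
    ]

    train_thread.start()
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    train_thread.join()
    return consumed


if __name__ == "__main__":
    for i, mb in enumerate(run_pipeline()):
        print(f"micro-batch {i}: {len(mb)} samples")
```

The point of the sketch is the decoupling: rollout and training overlap in time and exchange data only through the store, which is one common way to avoid a global synchronization barrier between the two phases. How FlexMARL actually schedules micro-batches is not stated in the text above.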

Taxonomy

Topics: Reinforcement Learning in Robotics · Software-Defined Networks and 5G · Ferroelectric and Negative Capacitance Devices