MARLaaS: Multi-Tenant Asynchronous Reinforcement Learning as a Service

Timothy Tin Long Yu; Gursimran Singh; Ge Shi; Hanieh Sadri; Yong Zhang; Zhenan Fan

arXiv:2605.08527 · cs.DC · May 12, 2026

TL;DR

MARLaaS introduces a multi-tenant asynchronous system for efficient reinforcement learning fine-tuning of large language models, enabling concurrent multi-task training with significant speedups and resource utilization improvements.

Contribution

The paper presents an architecture that combines a shared base model with lightweight LoRA adapters and asynchronous, independently scheduled stages for scalable multi-task RL fine-tuning.
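
As a rough illustration of sharing one model across tenants with lightweight adapters (the abstract names LoRA), the sketch below keeps a single frozen weight matrix and adds a per-tenant low-rank update, so each tenant's output is W x + scale * B (A x). It is not the paper's implementation: the class `SharedBaseLayer`, the method names, and the `scale` parameter are hypothetical, and plain Python lists stand in for accelerator tensors.

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply matrix m (rows x cols) by vector v (cols)."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


class SharedBaseLayer:
    """Hypothetical layer: one frozen base weight, many tenant adapters."""

    def __init__(self, weight: Matrix) -> None:
        self.weight = weight  # shared by every tenant, never modified here
        self.adapters: Dict[str, Tuple[Matrix, Matrix, float]] = {}

    def register_tenant(self, tenant_id: str, a: Matrix, b: Matrix,
                        scale: float = 1.0) -> None:
        # a has shape (rank x d_in), b has shape (d_out x rank).
        self.adapters[tenant_id] = (a, b, scale)

    def forward(self, tenant_id: str, x: Vector) -> Vector:
        base = matvec(self.weight, x)        # shared computation
        a, b, scale = self.adapters[tenant_id]
        delta = matvec(b, matvec(a, x))      # tenant-specific low-rank path
        return [y + scale * d for y, d in zip(base, delta)]
```

Only the small `a` and `b` matrices differ between tenants, which is what makes it plausible to serve many tasks from one copy of the base weights.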

Findings

01 Supports up to 32 concurrent tasks with state-of-the-art performance.

02 Improves accelerator utilization by up to 4.3x.

03 Reduces end-to-end training time by 85%.

Abstract

Reinforcement Learning from Verifiable Rewards (RLVR) has significantly improved the reasoning capabilities of large language models (LLMs), particularly in multi-turn agentic settings involving environment interaction like tool use. However, fine-tuning such models remains prohibitively expensive due to high computational requirements, limiting accessibility. We propose MARLaaS (Multi-tenant Asynchronous RL as a Service), a system for concurrent RL fine-tuning across multiple users and tasks. Our approach is based on two key ideas: (1) sharing a base model across tenants using lightweight LoRA adapters, and (2) a disaggregated asynchronous architecture that decouples rollout generation, environment interaction, and policy training into independently scheduled stages. This design enables tasks to progress through the RL pipeline at their own pace in an event-driven manner, reducing…
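
The disaggregated, event-driven design described in the abstract can be pictured with a minimal sketch, assuming three stages connected by queues. Everything here is hypothetical and chosen for illustration only: the `Job` class, the `run_pipeline` function, the single worker per stage, and the use of `asyncio.sleep(0)` as a stand-in for real rollout, environment, and training work. It shows only the control flow in which each tenant's job advances whenever a stage is free, without a global synchronization barrier.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Job:
    """Hypothetical unit of work: one tenant's RL task."""
    tenant: str
    total_steps: int
    step: int = 0
    log: List[str] = field(default_factory=list)


async def run_pipeline(tenants: Dict[str, int]) -> Dict[str, List[str]]:
    """Run each tenant for its requested number of RL iterations."""
    rollout_q, env_q, train_q, done_q = (asyncio.Queue() for _ in range(4))

    async def worker(name: str, inbox: asyncio.Queue,
                     outbox: asyncio.Queue) -> None:
        while True:
            job = await inbox.get()
            await asyncio.sleep(0)  # placeholder for the stage's real work
            job.log.append(name)
            await outbox.put(job)

    async def trainer() -> None:
        while True:
            job = await train_q.get()
            await asyncio.sleep(0)  # placeholder for a policy update
            job.log.append("train")
            job.step += 1
            # Finished jobs leave; the rest loop back for more rollouts.
            target = done_q if job.step >= job.total_steps else rollout_q
            await target.put(job)

    workers = [
        asyncio.create_task(worker("rollout", rollout_q, env_q)),
        asyncio.create_task(worker("env", env_q, train_q)),
        asyncio.create_task(trainer()),
    ]
    for tenant, steps in tenants.items():
        await rollout_q.put(Job(tenant, steps))

    finished: Dict[str, List[str]] = {}
    for _ in tenants:
        job = await done_q.get()
        finished[job.tenant] = job.log

    for w in workers:
        w.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return finished


if __name__ == "__main__":
    print(asyncio.run(run_pipeline({"tenant-a": 2, "tenant-b": 1})))
```

Because each stage pulls from its own queue, a tenant with a slow environment does not block another tenant's training step in this sketch; a real system would add multiple workers per stage, batching, and scheduling policies that this example does not attempt to model.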
