Reinforcement Learning for LLM-based Multi-Agent Systems through Orchestration Traces

Chenchen Zhang

arXiv:2605.02801 · cs.CL · May 5, 2026

TL;DR

This paper studies reinforcement learning (RL) for multi-agent systems built on large language models (LLMs) through orchestration traces. It focuses on reward design, credit assignment, and decision decomposition, and releases related artifacts.

Contribution

The paper provides a structured analysis of RL for LLM multi-agent orchestration. It organizes the field around three technical axes (reward design, credit assignment, and decision decomposition) and connects academic methods with industrial evidence.

Findings

1. Identified eight reward families for orchestration tasks.
2. Mapped reward and credit signals onto eight units, from the token level to the team level.
3. Decomposed orchestration learning into five key decisions.

Abstract

As large language model (LLM) agents evolve from isolated tool users into coordinated teams, reinforcement learning (RL) must optimize not only individual actions but also how work is spawned, delegated, communicated, aggregated, and stopped. This paper studies RL for LLM-based multi-agent systems through orchestration traces: temporal interaction graphs whose events include sub-agent spawning, delegation, communication, tool use, return, aggregation, and stopping decisions. Using this lens, we identify three technical axes. First, reward design spans eight families, including orchestration rewards for parallelism speedup, split correctness, and aggregation quality. Second, reward and credit signals attach to eight credit- or signal-bearing units from token to team; explicit counterfactual message-level credit remains especially sparse in our curated pool. Third, orchestration…
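The abstract defines an orchestration trace as a temporal interaction graph over spawning, delegation, communication, tool-use, return, aggregation, and stopping events, and names parallelism speedup as one orchestration reward. The sketch below shows one hypothetical way to represent such a trace in Python and compute a speedup-style reward from it. The class names, the timestamp fields, and the speedup formula (summed event durations divided by wall-clock span) are illustrative assumptions. They are not the paper's definitions and not the API of the linked repository.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional, Tuple


class EventType(Enum):
    """Event kinds listed in the abstract (names here are hypothetical)."""
    SPAWN = "spawn"
    DELEGATE = "delegate"
    COMMUNICATE = "communicate"
    TOOL_USE = "tool_use"
    RETURN = "return"
    AGGREGATE = "aggregate"
    STOP = "stop"


@dataclass(frozen=True)
class Event:
    t_start: float        # start time (hypothetical unit: seconds)
    t_end: float          # end time; equals t_start for instantaneous events
    src: str              # agent that emits the event
    dst: Optional[str]    # receiving agent for agent-to-agent events, else None
    kind: EventType


@dataclass
class OrchestrationTrace:
    events: List[Event] = field(default_factory=list)

    def edges(self) -> List[Tuple[float, str, str, EventType]]:
        """Time-stamped agent-to-agent edges, i.e. the temporal interaction graph."""
        pairs = [(e.t_start, e.src, e.dst, e.kind) for e in self.events if e.dst is not None]
        return sorted(pairs, key=lambda edge: edge[0])  # stable sort by start time

    def serial_time(self) -> float:
        """Total duration if every event had run one after another."""
        return sum(e.t_end - e.t_start for e in self.events)

    def wall_time(self) -> float:
        """Elapsed time from the earliest event start to the latest event end."""
        if not self.events:
            return 0.0
        return max(e.t_end for e in self.events) - min(e.t_start for e in self.events)


def parallelism_speedup_reward(trace: OrchestrationTrace) -> float:
    """Hypothetical speedup reward: serial time divided by wall-clock time."""
    wall = trace.wall_time()
    return trace.serial_time() / wall if wall > 0 else 0.0


# Toy trace: an orchestrator spawns two workers that use tools in parallel.
trace = OrchestrationTrace([
    Event(0.0, 0.0, "orch", "w1", EventType.SPAWN),
    Event(0.0, 0.0, "orch", "w2", EventType.SPAWN),
    Event(0.0, 0.0, "orch", "w1", EventType.DELEGATE),
    Event(0.0, 0.0, "orch", "w2", EventType.DELEGATE),
    Event(0.0, 4.0, "w1", None, EventType.TOOL_USE),
    Event(0.0, 3.0, "w2", None, EventType.TOOL_USE),
    Event(3.0, 3.0, "w2", "orch", EventType.RETURN),
    Event(4.0, 4.0, "w1", "orch", EventType.RETURN),
    Event(4.0, 5.0, "orch", None, EventType.AGGREGATE),
    Event(5.0, 5.0, "orch", None, EventType.STOP),
])

print(len(trace.edges()), parallelism_speedup_reward(trace))  # prints "6 1.6"
```

Here 8 s of tool use and aggregation finish in 5 s of wall time, so the sketch's speedup reward is 1.6. The abstract lists split correctness and aggregation quality as separate orchestration rewards; this sketch covers only the speedup signal.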

Code & Models

Repositories

xxzcc/awesome-llm-mas-rl (GitHub)