LEMON: Learning Executable Multi-Agent Orchestration via Counterfactual Reinforcement Learning

Xudong Chen; Yixin Liu; Hua Wei; Kaize Ding

arXiv:2605.14483·cs.AI·May 15, 2026

TL;DR

LEMON is an LLM-based orchestrator for multi-agent systems, trained with counterfactual reinforcement learning to optimize task-specific roles, dependencies, and capacity levels. It reports state-of-the-art results on six reasoning and coding benchmarks.

Contribution

The paper introduces LEMON, which uses counterfactual RL to optimize role design, capacity assignment, and dependency construction jointly, rather than partially or sequentially as existing methods do.

Findings

1. LEMON achieves state-of-the-art performance on six reasoning and coding benchmarks.
2. The counterfactual training signal effectively guides orchestration decisions.
3. LEMON outperforms existing multi-agent orchestration approaches.

Abstract

Large language models (LLMs) have become a strong foundation for multi-agent systems, but their effectiveness depends heavily on orchestration design. Across different tasks, role design, capacity assignment, and dependency construction jointly affect both solution quality and execution efficiency. Existing approaches automate parts of this design process, yet they often optimize these decisions partially or sequentially, and rely on execution-level feedback that provides limited credit assignment for local orchestration decisions. We propose LEMON (Learning Executable Multi-agent Orchestration via Counterfactual Reinforcement Learning), an LLM-based orchestrator that generates an executable orchestration specification. The specification integrates task-specific roles, customized duties, capacity levels, and dependency structure into a single…
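To make the idea of an executable orchestration specification concrete, here is a minimal Python sketch of what such a specification could look like as a data structure. It is an illustration only, not the authors' format: the `AgentSpec` class, the `execution_order` helper, the three capacity labels, and the example roles are all hypothetical names chosen for this sketch. It shows only how roles, duties, capacity levels, and dependencies can live in one object that is validated and turned into a run order.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter

# Hypothetical capacity labels; the abstract does not say how levels are encoded.
CAPACITY_LEVELS = ("low", "medium", "high")


@dataclass
class AgentSpec:
    """One agent entry in an illustrative orchestration specification."""
    role: str                      # task-specific role name
    duty: str                      # customized duty description
    capacity: str                  # one of CAPACITY_LEVELS (assumed encoding)
    depends_on: list[str] = field(default_factory=list)  # upstream role names


def execution_order(agents: list[AgentSpec]) -> list[str]:
    """Validate the specification and return roles in dependency order."""
    by_role = {agent.role: agent for agent in agents}
    if len(by_role) != len(agents):
        raise ValueError("duplicate role names in specification")
    for agent in agents:
        if agent.capacity not in CAPACITY_LEVELS:
            raise ValueError(f"unknown capacity level: {agent.capacity!r}")
        for dep in agent.depends_on:
            if dep not in by_role:
                raise ValueError(f"{agent.role!r} depends on unknown role {dep!r}")
    # TopologicalSorter takes a mapping of node -> predecessors and raises
    # graphlib.CycleError (a ValueError subclass) if the dependencies are cyclic.
    graph = {agent.role: set(agent.depends_on) for agent in agents}
    return list(TopologicalSorter(graph).static_order())


# Hypothetical three-agent specification for a coding task.
example_spec = [
    AgentSpec("reviewer", "check the patch against the plan", "medium", ["coder"]),
    AgentSpec("planner", "break the task into steps", "high"),
    AgentSpec("coder", "implement each step", "high", ["planner"]),
]

print(execution_order(example_spec))  # ['planner', 'coder', 'reviewer']
```

Keeping all four decisions in a single validated object is what makes the specification executable in this sketch: a runner can walk the returned order and dispatch each role without further interpretation. How LEMON itself encodes or executes its specification is not stated in the truncated abstract.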

Code & Models

Repositories

https://anonymous.4open.science/r/LEMON-B23C