TL;DR
LEMON is an LLM-based multi-agent orchestration method that uses counterfactual reinforcement learning to jointly optimize task-specific roles, dependencies, and capacities, reporting state-of-the-art results on six reasoning and coding benchmarks.
Contribution
The paper introduces LEMON, a new approach that jointly optimizes multi-agent orchestration using counterfactual RL, improving over existing partial or sequential methods.
Findings
LEMON achieves state-of-the-art performance on six reasoning and coding benchmarks.
The counterfactual training signal effectively guides orchestration decisions.
LEMON outperforms existing multi-agent orchestration approaches.
Abstract
Large language models (LLMs) have become a strong foundation for multi-agent systems, but their effectiveness depends heavily on orchestration design. Across different tasks, role design, capacity assignment, and dependency construction jointly affect both solution quality and execution efficiency. Existing approaches automate parts of this design process, yet they often optimize these decisions partially or sequentially, and rely on execution-level feedback that provides limited credit assignment for local orchestration decisions. We propose LEMON (Learning Executable Multi-agent OrchestratioN via Counterfactual Reinforcement Learning), an LLM-based orchestrator that generates an executable orchestration specification. The specification integrates task-specific roles, customized duties, capacity levels, and dependency structure into a single…
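To make the idea of a single executable specification concrete, here is a minimal sketch of how roles, duties, capacity levels, and dependencies might be held in one structure and turned into an execution order. This is not LEMON's actual format, which the truncated abstract does not show: the `AgentSpec` class, its field names, the capacity labels, and the `execution_order` helper are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter


@dataclass
class AgentSpec:
    """One agent entry in a hypothetical orchestration specification."""
    role: str                      # task-specific role name
    duty: str                      # customized duty for this role
    capacity: str                  # assumed labels, e.g. "small" or "large"
    depends_on: list[str] = field(default_factory=list)  # upstream roles


def execution_order(agents: list[AgentSpec]) -> list[str]:
    """Return role names in an order that respects the dependency structure.

    Raises ValueError for a dependency on an undefined role, and
    graphlib.CycleError if the dependencies contain a cycle.
    """
    known = {agent.role for agent in agents}
    graph: dict[str, set[str]] = {}
    for agent in agents:
        missing = set(agent.depends_on) - known
        if missing:
            raise ValueError(f"{agent.role} depends on unknown roles: {sorted(missing)}")
        graph[agent.role] = set(agent.depends_on)
    # Predecessors are emitted before the roles that depend on them.
    return list(TopologicalSorter(graph).static_order())


# Illustrative specification for a coding task (all values are made up).
spec = [
    AgentSpec("planner", "break the task into steps", "large"),
    AgentSpec("coder", "implement the plan", "large", depends_on=["planner"]),
    AgentSpec("tester", "write and run unit tests", "small", depends_on=["coder"]),
]

order = execution_order(spec)
```

Keeping all four decisions in one record per agent is what lets a checker validate the specification as a whole (unknown roles, cyclic dependencies) before any agent runs.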
