TL;DR
The paper introduces Sequential Agent Tuning (SAT), a scalable, decentralized method for training teams of LLM agents that guarantees monotonic improvement and supports plug-and-play agent upgrades, and demonstrates it on benchmark tasks.
Contribution
SAT provides a novel coordinator-free training paradigm with theoretical guarantees for monotonic improvement and plug-and-play agent upgrades in multi-LLM systems.
Findings
A team of three 4B-parameter agents trained with SAT outperforms larger models on benchmarks.
Swapping in stronger agents improves team performance by over 10%.
SAT ensures stable, scalable training with formal performance guarantees.
Abstract
Large language models (LLMs) with a large number of parameters achieve strong performance but are often prohibitively expensive to deploy. Recent work explores using teams of smaller, more efficient LLMs that collectively match or even outperform a single large model. However, jointly updating multiple agents introduces compounding distribution shifts, making coordination and stability during training difficult. We address this by introducing Sequential Agent Tuning (SAT), a coordinator-free training paradigm. SAT represents the team as a factorized policy and employs block-coordinate updates over agents, enabling scalable, decentralized training without a central controller. Specifically, we develop a sequence-aware, on-policy advantage estimator that conditions on the evolving team policy, coupled with per-agent KL trust regions that isolate occupancy drift. Theoretically, this…
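The abstract describes SAT only at a high level, so the sketch below illustrates the general pattern it names (a factorized team policy, block-coordinate updates that visit one agent at a time, an advantage computed against the current team policy, and a per-agent KL trust region). It is a toy illustration under stated assumptions, not the authors' algorithm. It assumes tabular softmax agents in place of LLMs, exact expectations over a tiny joint action space in place of sampled rollouts, a plain softmax policy-gradient step, and a backtracking line search that accepts a step only if the agent's KL bound holds and the expected team reward does not drop. All function names, parameters, and the toy reward are hypothetical.

```python
import itertools
import math


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl(p, q):
    # KL(p || q) between two categorical distributions.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def expected_return(team_logits, reward):
    # Exact expected team reward under the factorized policy pi(a) = prod_i pi_i(a_i).
    probs = [softmax(l) for l in team_logits]
    total = 0.0
    for joint in itertools.product(*(range(len(p)) for p in probs)):
        total += math.prod(p[a] for p, a in zip(probs, joint)) * reward(joint)
    return total


def agent_advantages(team_logits, i, reward):
    # A_i(a_i) = E[R | a_i] - E[R], taken over the *current* policies of the other
    # agents, including those already updated earlier in this sweep.
    probs = [softmax(l) for l in team_logits]
    q = [0.0] * len(probs[i])
    for joint in itertools.product(*(range(len(p)) for p in probs)):
        w_others = math.prod(p[a] for j, (p, a) in enumerate(zip(probs, joint)) if j != i)
        q[joint[i]] += w_others * reward(joint)
    baseline = sum(p * v for p, v in zip(probs[i], q))
    return [v - baseline for v in q]


def update_agent(team_logits, i, reward, step=1.0, kl_bound=0.05, shrink=0.5, max_tries=20):
    # Update only agent i (one block); every other agent is held fixed.
    old_logits = team_logits[i]
    old_probs = softmax(old_logits)
    adv = agent_advantages(team_logits, i, reward)
    grad = [p * a for p, a in zip(old_probs, adv)]  # softmax policy gradient
    old_ret = expected_return(team_logits, reward)
    for _ in range(max_tries):
        cand = [l + step * g for l, g in zip(old_logits, grad)]
        trial = team_logits[:i] + [cand] + team_logits[i + 1:]
        # Per-agent trust region plus a no-regression check on the team reward.
        if kl(old_probs, softmax(cand)) <= kl_bound and expected_return(trial, reward) >= old_ret:
            return trial
        step *= shrink
    return team_logits  # no acceptable step: leave this agent unchanged


def sequential_tuning(team_logits, reward, sweeps=10, **kwargs):
    # Block-coordinate sweeps: agents are updated one after another, in a fixed order.
    history = [expected_return(team_logits, reward)]
    for _ in range(sweeps):
        for i in range(len(team_logits)):
            team_logits = update_agent(team_logits, i, reward, **kwargs)
            history.append(expected_return(team_logits, reward))
    return team_logits, history


def coordination_reward(joint):
    # Hypothetical toy task: the team scores only if all three agents pick action 2.
    return 1.0 if joint == (2, 2, 2) else 0.0


init = [[0.0, 0.0, 0.0] for _ in range(3)]
final, history = sequential_tuning(init, coordination_reward, sweeps=20, step=5.0)
print(f"expected team reward: {history[0]:.3f} -> {history[-1]:.3f}")
```

In this toy setting the expected team reward never decreases only because the line search rejects any step that would lower it; that is a property of the sketch, not a reproduction of the paper's theoretical guarantee.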
