TL;DR
TeamTR introduces a trust-region fine-tuning method for multi-agent LLM systems that mitigates coordination issues caused by context shifts, leading to improved performance and stability.
Contribution
We formalize the compounding occupancy shift problem in multi-agent LLM fine-tuning and propose TeamTR, a trust-region framework that resamples trajectories after each component update and enforces per-agent divergence control to improve coordination.
Findings
TeamTR outperforms baselines by 7.1% on average.
It mitigates coordination regressions in multi-agent systems.
It supports plug-and-play component replacement.
Abstract
Multi-agent LLM systems have shown promise for complex reasoning, yet recent evaluations reveal they often underperform single-model baselines. We identify a structural failure mode in sequential fine-tuning of shared-context teams: updating one agent shifts the team's context distribution, and when subsequent updates are evaluated on cached rollouts, this mismatch compounds. We formalize this as the compounding occupancy shift and prove that stale-occupancy evaluation incurs a penalty that scales quadratically with the number of agents. In contrast, intermediate-occupancy evaluation reduces this to linear scaling. We propose TeamTR, a trust-region framework that resamples trajectories after each component update and enforces per-agent divergence control, yielding rigorous per-update and per-stage improvement lower bounds. Experiments show that TeamTR outperforms single-agent and…
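As a rough illustration of the quadratic-versus-linear claim in the abstract, the sketch below counts accumulated mismatch in a toy model. It is not the paper's bound: the per-update shift `delta`, the function names, and the assumption that shifts simply add up are all hypothetical simplifications introduced here.

```python
def stale_penalty(n_agents: int, delta: float) -> float:
    """Toy total mismatch when every update is scored on rollouts cached at stage start.

    Assumption (not from the paper): each agent update shifts the team's
    context distribution by at most `delta`, and shifts add up.
    """
    total = 0.0
    for i in range(1, n_agents + 1):
        # The i-th update is evaluated on data that ignores the i - 1 earlier
        # updates and its own shift, so it pays for i shifts in total.
        total += i * delta
    return total


def intermediate_penalty(n_agents: int, delta: float) -> float:
    """Toy total mismatch when trajectories are resampled after each update."""
    total = 0.0
    for _ in range(n_agents):
        # Fresh rollouts already reflect earlier updates, so each update
        # pays only for its own shift.
        total += delta
    return total


if __name__ == "__main__":
    for n in (2, 4, 8):
        print(n, stale_penalty(n, 1.0), intermediate_penalty(n, 1.0))
```

With four agents and `delta = 1`, the toy stale total is 10 and the resampled total is 4. The first grows as n(n+1)/2 and the second as n, which mirrors the quadratic and linear scaling the abstract describes.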
