TL;DR
AgentSlimming is a framework that compresses multi-agent workflows by pruning redundant agents or replacing them with lower-cost ones, cutting token cost while maintaining task performance.
Contribution
It introduces a novel, plug-and-play compression method for multi-agent systems that effectively reduces costs while preserving or enhancing task performance.
Findings
Reduced token cost by up to 78.9% with negligible performance loss.
Achieved a Pareto-optimal trade-off between cost and quality.
Code is publicly available at https://github.com/CitrusYL/AgentSlimming
Abstract
Large Language Model-based Multi-Agent Systems (MAS) have demonstrated remarkable capabilities in complex tasks. However, manually designing optimal communication topologies is labor-intensive, while automated expansion methods often result in bloated structures with redundant agents, leading to excessive token consumption. To address this problem, we introduce AgentSlimming, a plug-and-play compression framework for graph-structured multi-agent workflows. Motivated by pruning and quantization in neural networks, AgentSlimming compresses workflows by first estimating the importance score of each agent with a hybrid mechanism and then removing redundant agents or replacing them with low-cost ones. Each operation is validated using a baseline-anchored acceptance rule to prevent performance collapse. Experiments show that AgentSlimming reduces average token cost by up to…
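The compression loop described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the hybrid scoring mechanism or the exact form of the acceptance rule, so the sketch assumes importance scores are already given and that an operation is accepted when the task score stays within a fixed tolerance of the uncompressed baseline. The names `slim`, `toy_evaluate`, and `tolerance`, and all numbers, are hypothetical.

```python
from typing import Callable, Dict

# A configuration maps each remaining agent to the model tier it runs on.
Config = Dict[str, str]


def slim(
    importance: Dict[str, float],         # agent -> importance score (assumed given)
    evaluate: Callable[[Config], float],  # task score of a configuration
    tolerance: float = 0.01,              # assumed slack below the baseline score
) -> Config:
    config: Config = {name: "strong" for name in importance}
    baseline = evaluate(config)  # anchor: score of the uncompressed workflow

    # Visit agents from least to most important.
    for name in sorted(importance, key=importance.get):
        pruned = {k: v for k, v in config.items() if k != name}
        replaced = {**config, name: "cheap"}
        # Try the cheaper operation (removal) first, then replacement.
        for candidate in (pruned, replaced):
            # Assumed acceptance rule: compare against the fixed baseline,
            # not the previous step, so small losses cannot accumulate.
            if evaluate(candidate) >= baseline - tolerance:
                config = candidate
                break
    return config


def toy_evaluate(config: Config) -> float:
    # Made-up per-agent contributions to a task score, for illustration only.
    contribution = {
        ("planner", "strong"): 0.45, ("planner", "cheap"): 0.30,
        ("coder", "strong"): 0.40, ("coder", "cheap"): 0.25,
        ("critic", "strong"): 0.10, ("critic", "cheap"): 0.10,
        ("summarizer", "strong"): 0.0, ("summarizer", "cheap"): 0.0,
    }
    return sum(contribution[(name, tier)] for name, tier in config.items())


scores = {"planner": 0.9, "coder": 0.8, "critic": 0.3, "summarizer": 0.1}
slimmed = slim(scores, toy_evaluate)
print(slimmed)  # {'planner': 'strong', 'coder': 'strong', 'critic': 'cheap'}
```

In this toy run the summarizer is pruned, the critic is swapped to the cheap tier, and the planner and coder are kept unchanged because either operation on them would drop the score below the baseline minus the tolerance.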
