TL;DR
SkillGraph introduces a dynamic, self-evolving framework for visual multi-agent systems that adapts both communication topology and agent skills to the visual content and query context, improving performance across four benchmarks.
Contribution
It proposes a joint framework that evolves agent expertise and communication topology together, using a Multimodal Graph Transformer (MMGT) and a self-refining skill bank.
Findings
Achieves consistent improvements across four benchmarks.
Effectively adapts communication topology to content and query.
Enhances multi-agent collaboration with dynamic, content-aware routing.
Abstract
Scaling vision-language models into Visual Multiagent Systems (VMAS) is hindered by two coupled issues. First, communication topologies are fixed before inference, leaving them blind to visual content and query context; second, agent reasoning abilities remain static during deployment. These issues reinforce each other: a rigid topology fails to leverage richer agent expertise, while static agents lack incentives to specialize for a given query. We address this with SkillGraph, a joint framework that evolves both agent expertise and communication topology. Within this framework, a Multimodal Graph Transformer (MMGT) encodes visual tokens, instruction semantics and active skill embeddings to predict a query-conditioned collaboration graph, replacing hand-crafted routing with dynamic, content-aware information flow. Complementing this, a Skill Designer distills and refines reasoning…
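The abstract says the MMGT takes visual tokens, instruction semantics, and active skill embeddings and predicts a query-conditioned collaboration graph, but the excerpt gives no architectural detail. The toy sketch below only illustrates the general idea of context-dependent routing and is not the paper's method. Everything in it is an assumption made for illustration: the function name `predict_graph`, the additive fusion of context into agent features, the scaled dot-product edge score, the sigmoid threshold, and the rule that messages only flow from lower-indexed to higher-indexed agents to keep the graph acyclic.

```python
import math


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def predict_graph(skill_embs, visual_emb, query_emb, threshold=0.5):
    """Toy stand-in for a learned graph predictor (hypothetical, not MMGT).

    Returns adj where adj[i][j] == 1 means agent i sends messages to agent j.
    """
    # Assumption: fuse visual and query context by simple addition.
    context = [v + q for v, q in zip(visual_emb, query_emb)]
    # Assumption: condition each agent's skill embedding on that context.
    nodes = [[s + c for s, c in zip(skill, context)] for skill in skill_embs]

    n = len(nodes)
    scale = math.sqrt(len(context))
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        # Only score edges i -> j with i < j, so the result is a DAG.
        for j in range(i + 1, n):
            score = dot(nodes[i], nodes[j]) / scale
            prob = 1.0 / (1.0 + math.exp(-score))
            adj[i][j] = 1 if prob >= threshold else 0
    return adj


# Three agents with fixed (made-up) skill embeddings.
skills = [[1.0, 0.0], [0.5, 0.5], [-1.0, 0.0]]

# The same agents get different topologies under different contexts.
graph_a = predict_graph(skills, visual_emb=[0.0, 0.0], query_emb=[0.0, 0.0])
graph_b = predict_graph(skills, visual_emb=[2.0, 0.0], query_emb=[0.0, 0.0])
```

With these made-up inputs, the first context yields a single edge between agents 0 and 1, while the second connects all three agents in a chain-like DAG. That is the contrast with a topology fixed before inference: the agents are unchanged, yet the routing differs per input.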
