InterAgent: Physics-based Multi-agent Command Execution via Diffusion on Interaction Graphs
Bin Li, Ruichi Zhang, Han Liang, Jingyan Zhang, Juze Zhang, Xin Chen, Lan Xu, Jingyi Yu, Jingya Wang

TL;DR
InterAgent is an end-to-end framework that combines an autoregressive diffusion transformer with an interaction-graph representation to generate physics-based multi-agent humanoid control from text prompts alone.
Contribution
The authors present what they describe as the first end-to-end framework for text-driven, physics-based multi-agent humanoid control. It is built on a diffusion transformer and interaction graphs, and targets the single-agent focus of existing methods.
Findings
Outperforms existing baselines in multi-agent coordination tasks
Achieves state-of-the-art results in physics-based multi-agent behavior generation
Produces coherent and semantically faithful multi-agent behaviors from text prompts
Abstract
Humanoid agents are expected to emulate the complex coordination inherent in human social behaviors. However, existing methods are largely confined to single-agent scenarios, overlooking the physically plausible interplay essential for multi-agent interactions. To bridge this gap, we propose InterAgent, the first end-to-end framework for text-driven physics-based multi-agent humanoid control. At its core, we introduce an autoregressive diffusion transformer equipped with multi-stream blocks, which decouples proprioception, exteroception, and action to mitigate cross-modal interference while enabling synergistic coordination. We further propose a novel interaction graph exteroception representation that explicitly captures fine-grained joint-to-joint spatial dependencies to facilitate network learning. Additionally, within it we devise a sparse edge-based attention mechanism that…
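The abstract does not spell out how the interaction graph is built, so the following is only an illustrative sketch of the general idea of joint-to-joint edges between two agents. The function name `build_interaction_graph`, the k-nearest-neighbor sparsification rule, and the toy joint coordinates are all assumptions made for illustration, not the authors' method.

```python
import math

def build_interaction_graph(joints_a, joints_b, k=2):
    """Hypothetical sketch: link each joint of agent A to its k nearest
    joints of agent B. The k-nearest rule is an assumption, not taken
    from the paper. Returns a list of (i, j, offset, distance) edges."""
    edges = []
    for i, pa in enumerate(joints_a):
        candidates = []
        for j, pb in enumerate(joints_b):
            # Relative position of B's joint j as seen from A's joint i.
            offset = tuple(b - a for a, b in zip(pa, pb))
            dist = math.sqrt(sum(c * c for c in offset))
            candidates.append((dist, j, offset))
        # Keep only the k closest joints, so the graph stays sparse.
        candidates.sort(key=lambda c: (c[0], c[1]))
        for dist, j, offset in candidates[:k]:
            edges.append((i, j, offset, dist))
    return edges

# Toy 3D joint positions (made up for illustration).
agent_a = [(0.0, 0.0, 1.0), (0.3, 0.0, 1.2)]
agent_b = [(1.0, 0.0, 1.0), (0.5, 0.0, 1.2), (2.0, 0.0, 0.0)]

edges = build_interaction_graph(agent_a, agent_b, k=1)
for i, j, offset, dist in edges:
    print(f"A joint {i} -> B joint {j}, distance {dist:.3f}")
```

In a sketch like this, each edge carries a relative offset and a distance, which is one plausible way to expose fine-grained spatial dependencies to a network. Restricting attention to the retained edges is one way a sparse edge-based scheme could be realized, though the abstract is truncated before it describes the actual mechanism.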
Taxonomy
Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Social Robot Interaction and HRI
