Diffusion Forcing for Multi-Agent Interaction Sequence Modeling
Vongani H. Maluleke, Kie Horiuchi, Lea Wilken, Evonne Ng, Jitendra Malik, Angjoo Kanazawa

TL;DR
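MAGNet is a unified autoregressive diffusion model for multi-agent motion generation. A single model handles dyadic and polyadic prediction, partner inpainting, partner prediction, and agentic generation through flexible conditioning and sampling, and it can generate long sequences spanning hundreds of motion steps.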
Contribution
Introduces MAGNet, a versatile autoregressive diffusion framework capable of modeling diverse multi-agent interactions within a single unified model.
Findings
Performs on par with specialized methods on dyadic benchmarks.
Extends naturally to polyadic multi-agent scenarios.
Generates coherent long-duration multi-agent sequences.
Abstract
Understanding and generating multi-person interactions is a fundamental challenge with broad implications for robotics and social computing. While humans naturally coordinate in groups, modeling such interactions remains difficult due to long temporal horizons, strong inter-agent dependencies, and variable group sizes. Existing motion generation methods are largely task-specific and do not generalize to flexible multi-agent generation. We introduce MAGNet (Multi-Agent Generative Network), a unified autoregressive diffusion framework for multi-agent motion generation that supports a wide range of interaction tasks through flexible conditioning and sampling. MAGNet performs dyadic and polyadic prediction, partner inpainting, partner prediction, and agentic generation all within a single model, and can autoregressively generate ultra-long sequences spanning hundreds of motion steps. We…
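One way to read "flexible conditioning and sampling" in a diffusion-forcing setting is that every (agent, frame) token carries its own noise level: tokens at level 0 are observed context, and tokens at level 1 are to be generated. The sketch below illustrates that idea only. It is not MAGNet's implementation; the function names, the task names, and in particular the interpretation of "partner inpainting" and "partner prediction" are assumptions made for illustration, since the abstract does not define them.

```python
import math
import random


def noise_level_mask(task, num_agents, num_frames, context_frames):
    """Per-token noise levels: 0.0 = observed (kept clean), 1.0 = to be generated.

    Task definitions here are illustrative assumptions, not taken from the paper.
    """
    mask = [[1.0] * num_frames for _ in range(num_agents)]
    if task == "joint_prediction":
        # Assumed: every agent's past is observed, every agent's future is generated.
        for agent in range(num_agents):
            for t in range(context_frames):
                mask[agent][t] = 0.0
    elif task == "partner_inpainting":
        # Assumed: agent 0 is observed over the whole sequence, partners are generated.
        mask[0] = [0.0] * num_frames
    elif task == "partner_prediction":
        # Assumed: agent 0 is fully observed, partners are observed only in the past.
        mask[0] = [0.0] * num_frames
        for agent in range(1, num_agents):
            for t in range(context_frames):
                mask[agent][t] = 0.0
    else:
        raise ValueError(f"unknown task: {task}")
    return mask


def add_noise(motion, mask, rng):
    """Mix each pose vector with Gaussian noise according to its own noise level.

    motion[agent][frame] is a list of floats (a hypothetical pose vector).
    """
    noisy = []
    for agent_motion, agent_levels in zip(motion, mask):
        agent_out = []
        for pose, level in zip(agent_motion, agent_levels):
            keep, mix = math.sqrt(1.0 - level), math.sqrt(level)
            agent_out.append([keep * x + mix * rng.gauss(0.0, 1.0) for x in pose])
        noisy.append(agent_out)
    return noisy


# Three agents, eight frames, four of which are observed context.
mask = noise_level_mask("partner_prediction", num_agents=3, num_frames=8, context_frames=4)
motion = [[[0.5, -0.5] for _ in range(8)] for _ in range(3)]
noisy = add_noise(motion, mask, random.Random(0))
```

Under this framing, switching tasks changes only the mask and leaves the model untouched, which is one plausible reading of how a single model could cover several interaction tasks.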
Taxonomy
Topics: Human Motion and Animation · Generative Adversarial Networks and Image Synthesis · Social Robot Interaction and HRI
