Multi-agent Coordination via Flow Matching

Dongsu Lee; Daehee Lee; Amy Zhang

arXiv:2511.05005 · cs.LG · February 2, 2026

Open Access · 3 Reviews

TL;DR

MAC-Flow introduces a flow-based framework for multi-agent coordination that balances rich offline behavior representation with real-time efficiency, outperforming diffusion-based methods in speed while maintaining strong performance.

Contribution

The paper proposes MAC-Flow, a novel approach that combines flow-based joint behavior modeling with decentralized policies for fast, effective multi-agent coordination.

Findings

01. Achieves 14.5x faster inference than diffusion-based MARL methods.
02. Maintains competitive performance with prior Gaussian policy-based methods.
03. Validated across 12 environments and 34 datasets.

Abstract

This work presents MAC-Flow, a simple yet expressive framework for multi-agent coordination. We argue that the requirements of effective coordination are twofold: (i) a rich representation of the diverse joint behaviors present in offline data and (ii) the ability to act efficiently in real time. However, prior approaches often sacrifice one for the other, i.e., denoising diffusion-based solutions capture complex coordination but are computationally slow, while Gaussian policy-based solutions are fast but brittle in handling multi-agent interaction. MAC-Flow addresses this trade-off by first learning a flow-based representation of joint behaviors, and then distilling it into decentralized one-step policies that preserve coordination while enabling fast execution. Across four different benchmarks, including 12 environments and 34 datasets, MAC-Flow alleviates the trade-off between…
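The two stages in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: it assumes a straight-line (rectified-flow style) probability path, a fixed-step Euler sampler for the multi-step joint flow, and plain squared-error distillation of each agent's one-step output onto that agent's slice of the joint sample drawn from the same noise. All function names are hypothetical, and an idealized velocity field stands in for a trained network.

```python
import random

# Hypothetical sketch: joint actions are flat lists of length n_agents * act_dim.
# None of these function names come from the paper.


def interpolate(a0, a1, t):
    """Point at time t on the straight-line path from noise a0 to data a1."""
    return [(1.0 - t) * x0 + t * x1 for x0, x1 in zip(a0, a1)]


def velocity_target(a0, a1):
    """Conditional flow-matching target for the straight-line path: a1 - a0."""
    return [x1 - x0 for x0, x1 in zip(a0, a1)]


def flow_matching_loss(velocity_fn, obs, a0, a1, t):
    """Stage 1: squared error between predicted and target joint velocity."""
    pred = velocity_fn(obs, interpolate(a0, a1, t), t)
    return sum((p - q) ** 2 for p, q in zip(pred, velocity_target(a0, a1)))


def euler_sample(velocity_fn, obs, a0, steps=10):
    """Multi-step sampling: integrate da/dt = v(obs, a, t) from t=0 to t=1."""
    a, dt = list(a0), 1.0 / steps
    for k in range(steps):
        v = velocity_fn(obs, a, k * dt)
        a = [x + dt * vx for x, vx in zip(a, v)]
    return a


def split_per_agent(joint_action, n_agents, act_dim):
    """Slice a joint action into one action per agent."""
    return [joint_action[i * act_dim:(i + 1) * act_dim] for i in range(n_agents)]


def distillation_loss(one_step_actions, flow_actions):
    """Stage 2: each agent's one-step output regressed onto its slice of the
    flow sample generated from the same noise, summed over agents."""
    return sum(
        sum((p - q) ** 2 for p, q in zip(mine, target))
        for mine, target in zip(one_step_actions, flow_actions)
    )


# Usage with made-up numbers. The idealized velocity field points straight
# from a0 to a1, so Euler integration lands on a1.
random.seed(0)
n_agents, act_dim = 2, 2
a1 = [0.5, -1.0, 1.5, 0.25]                # one "dataset" joint action
a0 = [random.gauss(0.0, 1.0) for _ in a1]  # Gaussian noise of the same size


def ideal_v(obs, a, t):
    return velocity_target(a0, a1)


sample = euler_sample(ideal_v, None, a0, steps=10)
per_agent = split_per_agent(sample, n_agents, act_dim)
# A hypothetical, perfectly distilled set of one-step policies would output a1.
perfect_one_step = split_per_agent(a1, n_agents, act_dim)
print(flow_matching_loss(ideal_v, None, a0, a1, 0.3))  # 0.0 for the ideal field
print(distillation_loss(perfect_one_step, per_agent))
```

In practice both the velocity field and the per-agent one-step policies would be neural networks trained on these losses; the straight-line path is only one common choice, and the paper's exact path, sampler, and loss weighting may differ. The speed argument is visible in the structure: `euler_sample` calls the network `steps` times per action, while a distilled one-step policy needs a single call per agent.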

Peer Reviews

Decision: ICLR 2026 Poster

Reviewer 01 · Rating 6 · Confidence 3

Strengths

1. MAC-Flow effectively balances the trade-off between multi-agent coordination performance and inference speed.
2. The resulting individual policy supports seamless offline-to-online fine-tuning.
3. The ablation studies show the components of MAC-Flow are effective.

Weaknesses

1. There is still a small performance gap compared to diffusion policies (DoF), especially on SMACv2, showing room for improvement in handling highly stochastic multi-agent environments.
2. It requires offline datasets with diverse joint behaviors for effective training, and its performance may degrade when using low-quality or limited offline data.
3. The continuous-control baselines are weaker because they include no diffusion- or flow-based policies.

Reviewer 02 · Rating 6 · Confidence 2

Strengths

1. Learning a rich joint policy with flow matching and then distilling it into per-agent policies directly addresses the coordination-vs-speed trade-off that many of us have run into in offline MARL. I think the training-to-deployment narrative is easy to follow and feels usable.
2. The comparisons include both diffusion-based policies and conventional MARL baselines, and the results highlight a strong reduction in inference latency while retaining returns. Given how often test-time latency matters in…

Weaknesses

1. The reliance on an Individual-Global-Max (IGM)-style factorization is an obvious pressure point. The paper would be stronger with stress tests on heavily coupled tasks where separability breaks down. Bounds are good, but concrete counterexamples or failure modes would build trust.
2. The link from a small $W_2$ distance to a small value loss hinges on a Lipschitz $Q_{tot}$ and on distributional proximity that may not hold uniformly. It would help to see empirical measurements of these quantities during training…
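The standard inequality behind the link in item 2, stated for reference rather than as the paper's own theorem: if $Q_{tot}(s,\cdot)$ is $L$-Lipschitz in the joint action $\mathbf{a}$ (Euclidean norm), then for any two joint-action distributions $\pi$ and $\pi'$ at state $s$, Kantorovich-Rubinstein duality gives the first inequality and $W_1 \le W_2$ (Jensen's inequality applied to the optimal $W_2$ coupling) gives the second.

$$
\Big|\,\mathbb{E}_{\mathbf{a}\sim\pi}\big[Q_{tot}(s,\mathbf{a})\big]-\mathbb{E}_{\mathbf{a}\sim\pi'}\big[Q_{tot}(s,\mathbf{a})\big]\Big|
\;\le\; L\,W_1(\pi,\pi') \;\le\; L\,W_2(\pi,\pi')
$$

Both assumptions the reviewer flags enter here: the constant $L$ must hold over every joint action that either distribution puts mass on, and the $W_2$ term must be small at the states where the value loss is actually evaluated.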

Reviewer 03 · Rating 4 · Confidence 4

Strengths

1. The paper clearly identifies, and structures the solution around, the core trade-off in offline MARL: diffusion policies capture multi-modal joint behaviors but are slow, while Gaussian one-step policies are fast but brittle for coordination.
2. The paper connects to the flow-matching literature and positions MAC-Flow as a MARL counterpart to single-agent flow distillation and Flow Q-Learning, showing conceptual continuity with recent advances.
3. The two-stage design is guided by the Individual-…
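For reference, the single-agent Flow Q-Learning policy-extraction objective mentioned in item 2 trains a one-step policy $\mu_\omega(s,z)$ to maximize a critic $Q$ while regressing it onto the multi-step flow sample $\mu_\theta(s,z)$ generated from the same noise $z$, with $\alpha$ weighting the distillation term:

$$
\mathcal{L}(\omega)=\mathbb{E}_{s\sim\mathcal{D},\,z\sim\mathcal{N}(0,I)}\Big[-Q\big(s,\mu_\omega(s,z)\big)+\alpha\,\big\lVert \mu_\omega(s,z)-\mu_\theta(s,z)\big\rVert_2^2\Big]
$$

A multi-agent counterpart in this spirit would, as an illustrative assumption rather than the paper's stated loss, give each agent $i$ a one-step policy $\mu_{\omega_i}(o_i,z_i)$ on its own observation, score the concatenated actions with $Q_{tot}$, and distill onto agent $i$'s slice $[\mu_\theta(s,z)]_i$ of the joint flow sample, where $z=(z_1,\dots,z_N)$:

$$
\mathcal{L}(\omega_{1:N})=\mathbb{E}\Big[-Q_{tot}\big(s,(\mu_{\omega_1}(o_1,z_1),\dots,\mu_{\omega_N}(o_N,z_N))\big)+\alpha\sum_{i=1}^{N}\big\lVert \mu_{\omega_i}(o_i,z_i)-[\mu_\theta(s,z)]_i\big\rVert_2^2\Big]
$$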

Weaknesses

1. The method references “mathematical guarantees” around joint-to-factorized policy learning with IGM, but the paper (as given) does not present a formal theorem or the conditions under which the distilled one-step policies provably preserve the global optimum of the learned joint flow.
2. While the authors claim flow matching combines diffusion’s expressiveness and Gaussian’s speed, the paper lacks a deeper justification for why flow matching provides better inductive bias or coordination represe…
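The IGM condition both reviewers discuss requires that the joint greedy action equal the tuple of per-agent greedy actions, $\arg\max_{\mathbf{a}} Q_{tot}(s,\mathbf{a}) = (\arg\max_{a_1} Q_1(o_1,a_1),\dots,\arg\max_{a_N} Q_N(o_N,a_N))$. The sketch below is a hypothetical two-agent illustration, not MAC-Flow's factorization: it fits an additive (VDN-style) decomposition by least squares and checks IGM on a separable payoff table and on a coupled one of the kind often used to stress-test value factorization, which is the sort of counterexample Reviewer 02 asks for.

```python
from itertools import product

# Hypothetical illustration of IGM with an additive (VDN-style) fit;
# the payoff tables are made up and this is not MAC-Flow's factorization.


def joint_argmax(q_tot):
    """Exhaustive argmax over joint actions of a two-agent payoff table."""
    n1, n2 = len(q_tot), len(q_tot[0])
    return max(product(range(n1), range(n2)), key=lambda a: q_tot[a[0]][a[1]])


def greedy_decentralized(q1, q2):
    """Each agent independently maximizes its own utility (first max on ties)."""
    return (max(range(len(q1)), key=q1.__getitem__),
            max(range(len(q2)), key=q2.__getitem__))


def additive_fit(q_tot):
    """Least-squares fit Q_tot(a1, a2) ~ q1[a1] + q2[a2] under a uniform
    weighting of joint actions: row means plus centered column means."""
    n1, n2 = len(q_tot), len(q_tot[0])
    grand = sum(map(sum, q_tot)) / (n1 * n2)
    q1 = [sum(row) / n2 for row in q_tot]
    q2 = [sum(q_tot[i][j] for i in range(n1)) / n1 - grand for j in range(n2)]
    return q1, q2


# Exactly additive payoff: IGM holds and greedy choices are jointly optimal.
separable = [[r + c for c in (0.0, 5.0, 1.0)] for r in (1.0, 3.0, 2.0)]
# Coupled payoff: only the joint action (0, 0) pays 8; miscoordination costs -12.
coupled = [[8.0, -12.0, -12.0],
           [-12.0, 0.0, 0.0],
           [-12.0, 0.0, 0.0]]

for name, q in (("separable", separable), ("coupled", coupled)):
    q1, q2 = additive_fit(q)
    print(name, "joint:", joint_argmax(q), "greedy:", greedy_decentralized(q1, q2))
```

On the coupled table the additive fit averages away the rare high payoff, so the decentralized greedy choice is (1, 1), worth 0, instead of the optimum (0, 0), worth 8. An expressive joint model avoids that averaging; whether distillation into per-agent policies preserves the optimum is exactly the guarantee Reviewer 03 asks the paper to state.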

Taxonomy

Topics: Reinforcement Learning in Robotics · Human Pose and Action Recognition · Domain Adaptation and Few-Shot Learning