Multi-Agent Decision-Focused Learning via Value-Aware Sequential Communication
Benjamin Amoh, Geoffrey Parker, Wesley Marrero

TL;DR
SeqComm-DFL is a multi-agent communication method that optimizes message generation for decision quality: agents send value-aware messages sequentially, in priority order, to improve coordination in complex tasks.
Contribution
It introduces a unified framework combining sequential communication with decision-focused learning, including theoretical bounds and end-to-end training for multi-agent systems.
Findings
Achieves 4-6x higher rewards on benchmarks
Improves win rate by over 13% in SMAC
Provides theoretical bounds linking communication value to coordination gaps
Abstract
Multi-agent coordination under partial observability requires agents to share complementary private information. Whereas recent methods optimize messages for intermediate objectives (e.g., reconstruction accuracy or mutual information) rather than decision quality, we introduce \textbf{SeqComm-DFL}, which unifies sequential communication with decision-focused learning for task performance. Our approach features \emph{value-aware message generation with sequential Stackelberg conditioning}: messages maximize receiver decision quality and are generated in priority order, with agents conditioning on their predecessors. The \emph{guidance potential} is determined by their prosocial ordering. We extend Optimal Model Design to communication-augmented world models with QMIX factorization, enabling efficient end-to-end training via implicit differentiation. We prove information-theoretic bounds…
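The priority-ordered message passing described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the scalar `priority` scores (standing in for whatever ordering the method actually learns), and the toy message rule are all assumptions made for illustration. The sketch shows only the sequential conditioning structure, in which each agent forms its message from its own observation and the messages already sent by higher-priority agents.

```python
from typing import Callable, Dict, List, Sequence


def sequential_messages(
    observations: Dict[str, float],
    priority: Dict[str, float],
    make_message: Callable[[float, Sequence[float]], float],
) -> Dict[str, float]:
    """Generate one message per agent in descending priority order.

    Each agent sees its own observation plus the messages already sent
    by higher-priority agents (Stackelberg-style conditioning).
    All names here are hypothetical, not taken from the paper.
    """
    order = sorted(observations, key=lambda a: priority[a], reverse=True)
    sent: List[float] = []            # messages from predecessors so far
    messages: Dict[str, float] = {}
    for agent in order:
        msg = make_message(observations[agent], tuple(sent))
        messages[agent] = msg
        sent.append(msg)
    return messages


def toy_message(obs: float, predecessors: Sequence[float]) -> float:
    # Hypothetical rule: own observation plus the sum of earlier messages.
    # A value-aware method would instead use a learned generator trained
    # against the receivers' decision quality.
    return obs + sum(predecessors)


obs = {"a": 1.0, "b": 2.0, "c": 3.0}
prio = {"a": 0.2, "b": 0.9, "c": 0.5}   # assumed priority scores
msgs = sequential_messages(obs, prio, toy_message)
print(msgs)
```

With these assumed scores, agent `b` speaks first, `c` conditions on `b`, and `a` conditions on both. Replacing `toy_message` with a trained model is where the decision-focused objective would enter.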
