SCoUT: Scalable Communication via Utility-Guided Temporal Grouping in Multi-Agent Reinforcement Learning
Manav Vora, Gokul Puthumanaillam, Hiroyasu Tsukamoto, Melkior Ornik

TL;DR
SCoUT introduces a scalable, utility-guided temporal grouping method for multi-agent reinforcement learning that improves communication efficiency and coordination by dynamically clustering agents and providing precise credit assignment.
Contribution
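The paper proposes SCoUT, which combines temporal grouping (soft agent groups resampled every K environment steps), a group-induced affinity that serves as a differentiable prior over message recipients, and counterfactual credit assignment to improve communication in MARL.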
Findings
SCoUT outperforms existing methods in coordination tasks.
Temporal grouping reduces communication complexity.
Counterfactual credit assignment improves learning accuracy.
Abstract
Communication can improve coordination in partially observed multi-agent reinforcement learning (MARL), but learning when and who to communicate with requires choosing among many possible sender-recipient pairs, and the effect of any single message on future reward is hard to isolate. We introduce SCoUT (Scalable Communication via Utility-guided Temporal grouping), which addresses both these challenges via temporal and agent abstraction within traditional MARL. During training, SCoUT resamples soft agent groups every K environment steps (macro-steps) via Gumbel-Softmax; these groups are latent clusters that induce an affinity used as a differentiable prior over recipients. Using the same assignments, a group-aware critic predicts values for each agent group and maps them to per-agent baselines through the same soft…
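The grouping step described in the abstract can be illustrated with a small forward-pass sketch. It is not the authors' implementation: the co-membership affinity (the inner product of two agents' soft assignments), the helper names, the temperature, and the sizes (5 agents, 3 groups, K = 4) are all assumptions chosen for illustration. Plain Python is used so the arithmetic is visible; training would need an autodiff framework for the relaxation to be differentiable in practice.

```python
import math
import random


def gumbel_softmax(logits, tau, rng):
    """Draw one relaxed (soft) one-hot sample from categorical logits."""
    noisy = []
    for logit in logits:
        # Gumbel(0, 1) noise is -log(-log(U)); clamp U strictly inside (0, 1).
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        noisy.append((logit - math.log(-math.log(u))) / tau)
    # Numerically stable softmax over the perturbed, temperature-scaled logits.
    m = max(noisy)
    exps = [math.exp(v - m) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


def sample_soft_groups(group_logits, tau, rng):
    """One soft group assignment (a row that sums to 1) per agent."""
    return [gumbel_softmax(row, tau, rng) for row in group_logits]


def affinity(z):
    """Assumed affinity: A[i][j] = sum_g z[i][g] * z[j][g] (co-membership)."""
    n = len(z)
    return [[sum(a * b for a, b in zip(z[i], z[j])) for j in range(n)]
            for i in range(n)]


def recipient_prior(aff, sender):
    """Normalize a sender's affinities to the other agents into a distribution."""
    weights = [0.0 if j == sender else w for j, w in enumerate(aff[sender])]
    total = sum(weights)
    return [w / total for w in weights]


K = 4      # hypothetical macro-step length
TAU = 1.0  # hypothetical Gumbel-Softmax temperature
rng = random.Random(0)
# Hypothetical per-agent group logits: 5 agents, 3 latent groups.
group_logits = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(5)]

z = None
resample_steps = []
for t in range(10):
    if t % K == 0:  # groups are resampled only at macro-step boundaries
        z = sample_soft_groups(group_logits, TAU, rng)
        resample_steps.append(t)
    aff = affinity(z)
    prior = recipient_prior(aff, sender=0)  # prior over recipients for agent 0
```

Holding the assignments fixed for K steps means the affinity, and hence the recipient prior, changes only at macro-step boundaries, which is the temporal abstraction the abstract refers to.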
Taxonomy
Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Multimodal Machine Learning Applications
