SCoUT: Scalable Communication via Utility-Guided Temporal Grouping in Multi-Agent Reinforcement Learning

Manav Vora; Gokul Puthumanaillam; Hiroyasu Tsukamoto; Melkior Ornik

arXiv:2603.04833·cs.MA·March 6, 2026

Open Access

TL;DR

SCoUT is a scalable, utility-guided temporal grouping method for multi-agent reinforcement learning. It improves communication efficiency and coordination by resampling soft agent groups every K environment steps and using those groups for credit assignment.

Contribution

The paper proposes SCoUT, a novel approach that combines temporal grouping, differentiable affinity, and counterfactual credit assignment to enhance communication in MARL.

Findings

01. SCoUT outperforms existing methods in coordination tasks.

02. Temporal grouping reduces communication complexity.

03. Counterfactual credit assignment improves learning accuracy.

Abstract

Communication can improve coordination in partially observed multi-agent reinforcement learning (MARL), but learning when and who to communicate with requires choosing among many possible sender-recipient pairs, and the effect of any single message on future reward is hard to isolate. We introduce SCoUT (Scalable Communication via Utility-guided Temporal grouping), which addresses both these challenges via temporal and agent abstraction within traditional MARL. During training, SCoUT resamples soft agent groups every K environment steps (macro-steps) via Gumbel-Softmax; these groups are latent clusters that induce an affinity used as a differentiable prior over recipients. Using the same assignments, a group-aware critic predicts values for each agent group and maps them to per-agent baselines through the same soft…
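The grouping mechanism described in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation: the function names, the affinity form (a row-normalized product of the soft assignment matrix with its transpose), the fixed temperature, and the random stand-ins for the learned logits and group values are all assumptions made for illustration. The sketch only shows the flow of resampling soft groups every K steps, turning them into a prior over recipients, and mapping group values to per-agent baselines through the same assignments.

```python
import numpy as np


def gumbel_softmax(logits, tau, rng):
    """Draw soft one-hot rows from per-agent group logits (Gumbel-Softmax relaxation)."""
    u = rng.uniform(low=1e-9, high=1.0, size=logits.shape)
    gumbel = -np.log(-np.log(u))            # Gumbel(0, 1) noise
    y = (logits + gumbel) / tau
    y = y - y.max(axis=1, keepdims=True)    # subtract row max for numerical stability
    e = np.exp(y)
    return e / e.sum(axis=1, keepdims=True)


def recipient_prior(assign):
    """Hypothetical affinity: agents with similar soft groups get more prior mass."""
    affinity = assign @ assign.T            # shape (n_agents, n_agents)
    np.fill_diagonal(affinity, 0.0)         # an agent does not message itself
    return affinity / affinity.sum(axis=1, keepdims=True)


def per_agent_baseline(assign, group_values):
    """Map per-group values to per-agent baselines via the same soft assignments."""
    return assign @ group_values            # shape (n_agents,)


rng = np.random.default_rng(0)
n_agents, n_groups, K = 6, 3, 4
logits = rng.normal(size=(n_agents, n_groups))   # stand-in for a learned grouping network
group_values = np.array([1.0, 0.0, -1.0])        # stand-in for a group-aware critic

resample_steps = []
for t in range(12):
    if t % K == 0:                               # start of a new macro-step
        assign = gumbel_softmax(logits, tau=1.0, rng=rng)
        prior = recipient_prior(assign)
        baseline = per_agent_baseline(assign, group_values)
        resample_steps.append(t)
    # Between resamples, assign, prior, and baseline stay fixed.
```

Because every operation here is a smooth function of the logits, gradients could flow from the recipient prior and the baselines back to the grouping parameters in an autodiff framework; NumPy is used only to keep the sketch self-contained.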

Taxonomy

Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Multimodal Machine Learning Applications