TL;DR
This paper introduces Diamond Attention, a cross-attention architecture that uses structured randomness to let homogeneous agents in multi-agent reinforcement learning differentiate roles and coordinate; it outperforms deterministic baselines.
Contribution
The paper proposes a new attention-based method that incorporates structured randomness for effective coordination in symmetric multi-agent systems, enabling zero-shot generalization and transfer.
Findings
Achieves 100% success on the XOR game, surpassing deterministic baselines.
Generalizes zero-shot to different team sizes in control tasks.
Enables zero-shot transfer in cross-scenario multi-agent environments.
Abstract
Full parameter sharing is standard in cooperative multi-agent reinforcement learning (MARL) for homogeneous agents. Under permutation-symmetric observations, however, a shared deterministic policy outputs identical action distributions for every agent, making role differentiation impossible. This failure can theoretically be resolved using symmetry breaking among anonymous identical processors, which requires randomness. We propose Diamond Attention, a cross-attention architecture in which each agent samples a scalar random number per timestep, inducing a transient rank ordering that masks lower-ranked peers from agent-to-agent attention while leaving task attention fully unmasked. This realizes a random-bit coordination protocol in a single broadcast round, and the set-based attention enables zero-shot deployment to teams of different sizes. We evaluate across three regimes that…
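The masking rule described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes a single attention head with no learned projections (tokens serve as both keys and values), that a larger random draw means a higher rank, and that each agent always attends to itself. The names `rank_mask` and `diamond_attention_sketch` are hypothetical. Giving every agent an identical observation shows the effect: without the mask all agents produce the same output, while with the mask each agent sees a different number of peers and therefore produces a distinct output.

```python
import numpy as np


def rank_mask(u: np.ndarray) -> np.ndarray:
    """Return an (n, n) boolean mask where [i, j] is True if agent i may attend to agent j.

    Assumptions (not from the paper): a larger draw means a higher rank, and every
    agent may attend to itself, so agent i sees exactly the peers ranked at or above it.
    """
    return u[None, :] >= u[:, None]


def diamond_attention_sketch(agent_emb: np.ndarray, task_emb: np.ndarray,
                             rng: np.random.Generator, masked: bool = True):
    """Hypothetical single-head cross-attention over [task tokens | agent tokens].

    Task tokens are always visible; agent tokens ranked below the querying agent
    are masked out. Learned query/key/value projections are omitted for brevity.
    """
    n, d = agent_emb.shape
    m = task_emb.shape[0]
    u = rng.random(n)  # one scalar draw per agent for this timestep
    tokens = np.concatenate([task_emb, agent_emb], axis=0)  # (m + n, d), used as keys and values
    scores = agent_emb @ tokens.T / np.sqrt(d)              # (n, m + n)
    peer_mask = rank_mask(u) if masked else np.ones((n, n), dtype=bool)
    allowed = np.concatenate([np.ones((n, m), dtype=bool), peer_mask], axis=1)
    scores = np.where(allowed, scores, -np.inf)
    scores -= scores.max(axis=1, keepdims=True)             # stable softmax; task tokens keep each row finite
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ tokens, u


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for n in (3, 5):                       # same function, different team sizes
        agents = np.ones((n, 4))           # permutation-symmetric: identical observations
        tasks = np.eye(2, 4)               # two task tokens shared by all agents
        out, _ = diamond_attention_sketch(agents, tasks, rng)
        base, _ = diamond_attention_sketch(agents, tasks, rng, masked=False)
        print(n, "distinct outputs with mask:", len(np.unique(out.round(6), axis=0)),
              "without mask:", len(np.unique(base.round(6), axis=0)))
```

With the mask, the demo should report n distinct outputs for a team of n agents and a single shared output without it. Because the mask is built from the per-step draws rather than from fixed agent indices, the same function runs unchanged for any number of agents, which mirrors the set-based property the abstract credits for zero-shot deployment to new team sizes.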
