Learning Multi-Agent Communication with Contrastive Learning
Yat Long Lo, Biswa Sengupta, Jakob Foerster, Michael Noukhovitch

TL;DR
This paper introduces a contrastive learning approach for multi-agent communication, treating messages as environment state views, leading to improved coordination, faster learning, and more symmetric, informative communication.
Contribution
It proposes a contrastive learning method that maximizes the mutual information between messages of a given trajectory, improving communication in decentralized multi-agent reinforcement learning.
Findings
Outperforms previous methods in performance and learning speed
Induces more symmetric and global state-aware communication
Effectively captures environment information through learned messages
Abstract
Communication is a powerful tool for coordination in multi-agent RL. But inducing an effective, common language is a difficult challenge, particularly in the decentralized setting. In this work, we introduce an alternative perspective where communicative messages sent between agents are considered as different incomplete views of the environment state. By examining the relationship between messages sent and received, we propose to learn to communicate using contrastive learning to maximize the mutual information between messages of a given trajectory. In communication-essential environments, our method outperforms previous work in both performance and learning speed. Using qualitative metrics and representation probing, we show that our method induces more symmetric communication and captures global state information from the environment. Overall, we show the power of contrastive…
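To make the objective in the abstract concrete, here is a minimal sketch of a contrastive loss over messages. It is not the authors' implementation: the function names, the cosine similarity, the temperature, and the `window` parameter are all assumptions chosen for illustration. Messages sent within a few timesteps of each other in the same trajectory are treated as positives (different views of the same state); all other messages in the batch act as negatives. Minimizing a loss of this InfoNCE family is commonly interpreted as maximizing a lower bound on mutual information between the paired views.

```python
import math


def cosine(u, v):
    """Cosine similarity between two message vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def contrastive_message_loss(messages, traj_ids, timesteps,
                             window=2, temperature=0.1):
    """Supervised-contrastive-style loss over a batch of messages.

    Assumption (hypothetical setup): a message k is a positive for anchor i
    if it comes from the same trajectory and lies within `window` timesteps;
    every other message in the batch is a negative.
    """
    n = len(messages)
    total, num_anchors = 0.0, 0
    for i in range(n):
        others = [k for k in range(n) if k != i]
        positives = [
            k for k in others
            if traj_ids[k] == traj_ids[i]
            and abs(timesteps[k] - timesteps[i]) <= window
        ]
        if not positives:
            continue  # an anchor without positives contributes nothing
        logits = {k: cosine(messages[i], messages[k]) / temperature
                  for k in others}
        # Numerically stable log-sum-exp over all non-anchor messages.
        peak = max(logits.values())
        log_denom = peak + math.log(
            sum(math.exp(v - peak) for v in logits.values()))
        total += -sum(logits[p] - log_denom for p in positives) / len(positives)
        num_anchors += 1
    return total / num_anchors if num_anchors else 0.0


if __name__ == "__main__":
    # Two toy trajectories, two messages each (made-up values).
    msgs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    print(contrastive_message_loss(msgs, [0, 0, 1, 1], [0, 1, 0, 1]))
```

The loss is small when messages from the same trajectory window point in similar directions and large when they resemble messages from other trajectories. Note that this sketch, like the setup a reviewer questions below, treats messages from other trajectories as negatives even if the underlying states happen to be similar.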
Peer Reviews
Decision: ICLR 2024 poster
The idea delivered by this work is clear and somewhat grounded. Indeed, it would be worthwhile for an agent to learn guidance for its messages during multi-agent communication. The intuition of enforcing messages under similar states to be alike is a straightforward motivation, for which contrastive learning is perhaps one of the most popular methods.
However, after going through the whole paper, it is easy to see that the proposed idea is insufficiently supported and that the manuscript has many flaws. A few such points: 1. In section 4, the negative samples are defined as coming from outside the current time window or from other trajectories. This is not technically sound, since agents may encounter similar states in different trajectories (which the proposal would treat as negatives). It is suggest
This paper tackles the problem of communication for fully independent learners, which is a very important and often underexplored topic in MARL. Mixing contrastive learning with MARL is also interesting. Generally, the paper is well organised and well written.
Overall, this paper is interesting and investigates an important topic in MARL. However, I still have some concerns and questions that I would like the authors to comment on. Please find my comments and questions below. * The predator-prey example in Figure 1 (right) is a bit confusing. I would not agree that the given examples correspond to similar views; for example, the first view (counting from the top) seems more similar to the third view than to the second view. * In secti
I like this paper. It presents a simple idea that works well. ## Originality Applying contrastive losses to emergent communication is somewhat novel. (I know other works have also come out in this area, but they remain different in some important ways). ## Quality The work is well-scoped and presented, with good results backing up claims. ## Clarity I find the paper quite clear. Some figures could likely be redone to present the same information better (e.g., Figure 3), but mostly these are sm
Overall, this is a strong paper. To further improve the paper, the authors could 1) conduct further experiments to fill in Figure 4 in more detail (instead of just 3 or 4 checkpoints along the curve) and 2) run more trials, especially in the traffic junction environment, where variance is high and not all methods seem to have converged.
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Multi-Agent Systems and Negotiation · Modular Robots and Swarm Intelligence
Methods: Contrastive Learning
