Learning Multi-Agent Communication with Contrastive Learning

Yat Long Lo; Biswa Sengupta; Jakob Foerster; Michael Noukhovitch

arXiv:2307.01403 · cs.AI · February 5, 2024

TL;DR

This paper introduces a contrastive learning approach to multi-agent communication that treats the messages agents exchange as incomplete views of the environment state. The approach improves coordination, speeds up learning, and yields more symmetric, more informative communication.

Contribution

It proposes a contrastive learning method that maximizes the mutual information between messages of a given trajectory, making communication more effective in multi-agent reinforcement learning.

Findings

01 · Outperforms previous methods in performance and learning speed

02 · Induces more symmetric and global state-aware communication

03 · Effectively captures environment information through learned messages

Abstract

Communication is a powerful tool for coordination in multi-agent RL. But inducing an effective, common language is a difficult challenge, particularly in the decentralized setting. In this work, we introduce an alternative perspective where communicative messages sent between agents are considered as different incomplete views of the environment state. By examining the relationship between messages sent and received, we propose to learn to communicate using contrastive learning to maximize the mutual information between messages of a given trajectory. In communication-essential environments, our method outperforms previous work in both performance and learning speed. Using qualitative metrics and representation probing, we show that our method induces more symmetric communication and captures global state information from the environment. Overall, we show the power of contrastive…
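The objective described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it is a supervised-contrastive-style loss written in NumPy, under the assumption that two messages form a positive pair when they come from the same trajectory and lie within a short time window, and that all other messages in the batch act as negatives (this mirrors Reviewer 01's description of how negatives are chosen). The function name `contrastive_message_loss` and the parameters `window` and `temperature` are hypothetical.

```python
import numpy as np


def contrastive_message_loss(messages, traj_ids, timesteps, window=2, temperature=0.1):
    """SupCon-style loss over a batch of message vectors (illustrative sketch).

    Assumption: positives are messages from the same trajectory whose
    timesteps differ by at most `window`; everything else is a negative.
    """
    messages = np.asarray(messages, dtype=np.float64)
    traj_ids = np.asarray(traj_ids)
    timesteps = np.asarray(timesteps)

    # Cosine similarity between every pair of messages, scaled by temperature.
    norms = np.linalg.norm(messages, axis=1, keepdims=True)
    z = messages / np.maximum(norms, 1e-12)
    logits = z @ z.T / temperature

    n = z.shape[0]
    not_self = ~np.eye(n, dtype=bool)
    same_traj = traj_ids[:, None] == traj_ids[None, :]
    close_in_time = np.abs(timesteps[:, None] - timesteps[None, :]) <= window
    positives = same_traj & close_in_time & not_self

    # Log-softmax over all *other* messages in the batch (self is masked out).
    logits = np.where(not_self, logits, -np.inf)
    row_max = logits.max(axis=1, keepdims=True)
    shifted = logits - row_max
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

    # Average log-probability of the positives, for anchors that have any.
    pos_counts = positives.sum(axis=1)
    has_pos = pos_counts > 0
    pos_log_prob = np.where(positives, log_prob, 0.0).sum(axis=1)
    return float(-(pos_log_prob[has_pos] / pos_counts[has_pos]).mean())


# Two trajectories, two timesteps each.
traj_ids = [0, 0, 1, 1]
timesteps = [0, 1, 0, 1]

# Messages that agree within a trajectory give a low loss ...
aligned = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
# ... while messages that agree across trajectories give a high loss.
misaligned = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]

loss_aligned = contrastive_message_loss(aligned, traj_ids, timesteps)
loss_misaligned = contrastive_message_loss(misaligned, traj_ids, timesteps)
print(loss_aligned < loss_misaligned)
```

Minimizing a loss of this form pulls together messages sent around the same moment of the same trajectory and pushes apart the rest, which is one standard way to optimize a lower bound on the mutual information between them. The batch needs at least one positive pair for the loss to be defined.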

Peer Reviews

Decision · ICLR 2024 poster

Reviewer 01 · Rating 5 · marginally below the acceptance threshold · Confidence 4

Strengths

The idea presented in this work is clear and reasonably well grounded. It would indeed be worthwhile for an agent to learn guidance for its messages during multi-agent communication. The intuition of encouraging messages sent under similar states to resemble each other is a straightforward motivation, and contrastive learning is one of the most popular methods for achieving it.

Weaknesses

However, after going through the whole paper, it is easy to see that the proposed idea is insufficiently supported and that the manuscript has many flaws. A few such points: 1. In Section 4, the negative samples are defined as coming from outside the current time window or from other trajectories. This is not technically sound, since agents may encounter similar states in different trajectories (which the proposal would treat as negatives). It is suggest

Reviewer 02 · Rating 6 · marginally above the acceptance threshold · Confidence 4

Strengths

This paper tackles the problem of communication for fully independent learners, which is a very important and often underexplored topic in MARL. Mixing contrastive learning with MARL is also interesting. Generally, the paper is well organised and well written.

Weaknesses

Overall, this paper is interesting and investigates an important topic in MARL. However, I still have some concerns and questions that I would like the authors to comment on. Please find my comments and questions below.
* The predator-prey example in Figure 1 (right) is a bit confusing. I would not agree that the given examples correspond to similar views; for example, the first view (counting from the top) seems more similar to the third view than to the second view.
* In secti

Reviewer 03 · Rating 8 · accept, good paper · Confidence 4

Strengths

I like this paper. It presents a simple idea that works well.

## Originality
Applying contrastive losses to emergent communication is somewhat novel. (I know other works have also come out in this area, but they remain different in some important ways.)

## Quality
The work is well-scoped and presented, with good results backing up claims.

## Clarity
I find the paper quite clear. Some figures could likely be redone to present the same information better (e.g., Figure 3), but mostly these are sm

Weaknesses

Overall, this is a strong paper. To further improve it, the authors could 1) conduct further experiments to fill in Figure 4 in more detail (instead of just 3 or 4 checkpoints along the curve), and 2) run more trials, especially in the traffic junction, where variance is high and not all methods seem to have converged.

Videos

Learning Multi-Agent Communication with Contrastive Learning · SlidesLive

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Multi-Agent Systems and Negotiation · Modular Robots and Swarm Intelligence

Methods: Contrastive Learning