Revisiting the Gumbel-Softmax in MADDPG

Callum Rhys Tilbury; Filippos Christianos; Stefano V. Albrecht

arXiv:2302.11793 · cs.LG · June 16, 2023 · 5 cites

TL;DR

This paper investigates alternative gradient estimators to Gumbel-Softmax in MADDPG for discrete action spaces, demonstrating that some alternatives significantly improve performance and convergence in grid-world tasks.

Contribution

It integrates several existing alternatives to the Gumbel-Softmax estimator into MADDPG and evaluates them, showing improved performance in discrete multi-agent environments.

Findings

01 · One estimator achieves up to 55% higher returns.

02 · Faster convergence observed with certain estimators.

03 · Alternatives outperform Gumbel-Softmax in discrete scenarios.

Abstract

MADDPG is an algorithm in multi-agent reinforcement learning (MARL) that extends the popular single-agent method, DDPG, to multi-agent scenarios. Importantly, DDPG is an algorithm designed for continuous action spaces, where the gradient of the state-action value function exists. For this algorithm to work in discrete action spaces, discrete gradient estimation must be performed. For MADDPG, the Gumbel-Softmax (GS) estimator is used -- a reparameterisation which relaxes a discrete distribution into a similar continuous one. This method, however, is statistically biased, and a recent MARL benchmarking paper suggests that this bias makes MADDPG perform poorly in grid-world situations, where the action space is discrete. Fortunately, many alternatives to the GS exist, boasting a wide range of properties. This paper explores several of these alternatives and integrates them into MADDPG for…
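To make the relaxation described in the abstract concrete, here is a minimal sketch of Gumbel-Softmax sampling in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the function names (`sample_gumbel`, `gumbel_softmax_sample`, `one_hot_argmax`), the example probabilities, and the temperature values are all hypothetical. The sketch only shows the forward sampling step; in practice a framework with automatic differentiation would be used so that gradients can flow through the relaxed sample.

```python
import math
import random


def sample_gumbel(rng):
    """Draw one standard Gumbel(0, 1) sample by inverse-transform sampling."""
    # rng.random() lies in [0, 1); clamp away from 0 to avoid log(0).
    u = max(rng.random(), 1e-12)
    return -math.log(-math.log(u))


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def gumbel_softmax_sample(logits, temperature, rng):
    """Relaxed (continuous) sample from the categorical defined by `logits`.

    Each logit is perturbed with Gumbel noise, then a temperature-scaled
    softmax replaces the non-differentiable argmax.
    """
    perturbed = [logit + sample_gumbel(rng) for logit in logits]
    return softmax([p / temperature for p in perturbed])


def one_hot_argmax(probs):
    """Discretise a relaxed sample into a one-hot action vector."""
    k = max(range(len(probs)), key=probs.__getitem__)
    return [1.0 if i == k else 0.0 for i in range(len(probs))]


if __name__ == "__main__":
    rng = random.Random(0)  # illustrative seed
    # Hypothetical policy over three discrete actions.
    logits = [math.log(p) for p in (0.6, 0.3, 0.1)]
    for tau in (5.0, 1.0, 0.1):
        relaxed = gumbel_softmax_sample(logits, tau, rng)
        print(tau, relaxed, one_hot_argmax(relaxed))
```

Lower temperatures push the relaxed sample towards a one-hot vector, while higher temperatures give smoother samples that sit further from the discrete distribution. This gap between the relaxed and the discrete sample is the source of the bias the abstract refers to.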

Code & Models

Repositories

uoe-agents/revisiting-maddpg
PyTorch · Official

Taxonomy

Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Evolutionary Algorithms and Applications

Methods: Dense Connections · Convolution · Experience Replay · Weight Decay · Adam · MADDPG · Batch Normalization · Deep Deterministic Policy Gradient