Consolidation via Policy Information Regularization in Deep RL for Multi-Agent Games

Tailia Malloy; Tim Klinger; Miao Liu; Matthew Riemer; Gerald Tesauro; Chris R. Sims

arXiv:2011.11517 · cs.AI · May 16, 2025

TL;DR

This paper proposes policy information regularization for deep multi-agent reinforcement learning: limiting the complexity of learned policies to improve robustness and performance in nonstationary environments.

Contribution

It introduces Capacity-Limited MADDPG, a variant of MADDPG that places an information-theoretic constraint on policy complexity, with the aim of improving policy robustness and learning performance in multi-agent settings.

Findings

01. Enhanced robustness to environment changes

02. Improved learning performance in cooperative tasks

03. Competitive tasks also benefit from the approach

Abstract

This paper introduces an information-theoretic constraint on learned policy complexity in the Multi-Agent Deep Deterministic Policy Gradient (MADDPG) reinforcement learning algorithm. Previous research with a related approach in continuous control experiments suggests that this method favors learning policies that are more robust to changing environment dynamics. The multi-agent game setting naturally requires this type of robustness, as other agents' policies change throughout learning, introducing a nonstationary environment. For this reason, recent methods in continual learning are compared to our approach, termed Capacity-Limited MADDPG. Results from experimentation in multi-agent cooperative and competitive tasks demonstrate that the capacity-limited approach is a good candidate for improving learning performance in these environments.
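
As a rough illustration of what an information-theoretic constraint on policy complexity can look like, the sketch below computes the mutual information I(S; A) between states and actions for a small discrete policy and subtracts it, scaled by a coefficient, from an expected return. This is an assumption made for illustration only: the abstract does not give the form of the constraint, MADDPG itself works with continuous actions, and the names `policy_information`, `regularized_objective`, and `beta` are hypothetical rather than taken from the paper.

```python
import math


def policy_information(state_probs, policy):
    """Mutual information I(S; A) in nats for a discrete policy.

    state_probs[s] is p(s); policy[s][a] is pi(a | s).
    Illustrative toy case only, not the paper's formulation.
    """
    n_actions = len(policy[0])
    # Marginal action distribution p(a) = sum_s p(s) * pi(a | s).
    marginal = [
        sum(p_s * row[a] for p_s, row in zip(state_probs, policy))
        for a in range(n_actions)
    ]
    info = 0.0
    for p_s, row in zip(state_probs, policy):
        for a, p_a_given_s in enumerate(row):
            # Terms with zero probability contribute nothing to the sum.
            if p_s > 0 and p_a_given_s > 0:
                info += p_s * p_a_given_s * math.log(p_a_given_s / marginal[a])
    return info


def regularized_objective(expected_return, info, beta):
    """Hypothetical penalized objective: return minus beta times policy information."""
    return expected_return - beta * info


# A policy that ignores the state carries no information about it.
state_blind = [[0.3, 0.7], [0.3, 0.7]]
# A policy that picks a different action in each state carries log(2) nats.
state_specific = [[1.0, 0.0], [0.0, 1.0]]
uniform_states = [0.5, 0.5]

blind_info = policy_information(uniform_states, state_blind)
specific_info = policy_information(uniform_states, state_specific)
```

Under this toy objective, a larger `beta` favors policies whose actions depend less on the state, which is one way to read "limiting policy complexity".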

Taxonomy

Topics: Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning · Adaptive Dynamic Programming Control

Methods: Convolution · Experience Replay · Adam · Batch Normalization · Weight Decay · Dense Connections · MADDPG