High entropy leads to symmetry equivariant policies in Dec-POMDPs

Johannes Forkel; Constantin Ruhdorfer; Michael Beukman; Andreas Bulling; Jakob Foerster

arXiv:2511.22581 · cs.LG · May 8, 2026

TL;DR

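This paper proves that, in any Dec-POMDP, sufficiently high entropy regularization makes policy gradient flow with tabular softmax policies converge from any initialization to a single symmetry-equivariant joint policy. It then shows empirically that the entropy coefficient strongly shapes multi-agent training outcomes.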

Contribution

It establishes a theoretical link between sufficiently high entropy regularization and symmetry-equivariant policies in Dec-POMDPs, supported by empirical evaluations of independent PPO in Hanabi, Overcooked and Yokai.

Findings

01

Sufficiently high entropy regularization ensures that policy gradient flow converges, from any initialization, to the same symmetry-equivariant joint policy.

02

The entropy coefficient strongly affects both the cross-play returns between independently trained policies and their self-play returns.

03

Increasing the entropy coefficient during hyperparameter tuning can improve cross-play robustness and overall performance.
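
The objects these findings refer to can be written out as follows. This is a sketch in assumed notation, not the paper's own: agents i = 1..n act on action-observation histories τ^i, α is the entropy coefficient, φ is a symmetry of the Dec-POMDP acting on actions and histories, and the cross-play line covers the two-agent case for independently trained runs A and B. The paper's exact discounting, entropy weighting and definition of how symmetries act may differ, and α_0 is only a placeholder for "sufficiently high"; no value for it is stated here.

```latex
\begin{align*}
\pi^i_\theta(a \mid \tau^i) &= \frac{\exp(\theta^i_{\tau^i,a})}{\sum_{a'} \exp(\theta^i_{\tau^i,a'})}
  && \text{tabular softmax policy} \\
J_\alpha(\theta) &= \mathbb{E}_{\pi_\theta}\Big[\sum_{t \ge 0} \gamma^t \Big(r_t + \alpha \sum_{i=1}^{n} \mathcal{H}\big(\pi^i_\theta(\cdot \mid \tau^i_t)\big)\Big)\Big]
  && \text{entropy-regularized return} \\
\dot\theta(t) &= \nabla_\theta J_\alpha\big(\theta(t)\big)
  && \text{policy gradient flow} \\
\pi^i\big(\phi(a) \mid \phi(\tau^i)\big) &= \pi^i(a \mid \tau^i) \quad \forall\, \phi, i, a, \tau^i
  && \text{equivariance} \\
\alpha > \alpha_0 \;&\Longrightarrow\; \lim_{t \to \infty} \pi_{\theta(t)} = \pi^\star_\alpha \text{ for every } \theta(0)
  && \text{unique limit} \\
\mathrm{XP}(A, B) = J_0(\pi^1_A, \pi^2_B) &= J_0(\pi^\star_\alpha) = \mathrm{SP}(A)
  && \text{cross-play equals self-play}
\end{align*}
```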

Abstract

We prove that in any Dec-POMDP, sufficiently high entropy regularization ensures that the policy gradient flow with tabular softmax parametrization always converges, for any initialization, to the same joint policy, and that this joint policy is equivariant w.r.t. all symmetries of the Dec-POMDP. In particular, policies coming from different initializations will be fully compatible, in that their cross-play returns are equal to their self-play returns. Through extensive evaluation of independent PPO, arguably the standard baseline deep multi-agent policy gradient algorithm, in the Hanabi, Overcooked and Yokai environments, we find that the entropy coefficient has a massive influence on the cross-play returns between independently trained policies, and that the decrease in self-play returns coming from increased entropy regularization can often be counteracted by greedifying the learned…
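
To make the abstract's claim concrete, the sketch below simulates an Euler discretization of entropy-regularized policy gradient with tabular softmax policies in a hypothetical one-shot, two-player coordination game, a trivial Dec-POMDP with no observations. Everything in it is an illustrative assumption rather than the paper's setup: the game, `REWARDS`, the coefficients `alpha`, the step size, the biased initializations, and the helper names `softmax_policy_gradient` and `run_experiment`. Levers 0 to 3 are interchangeable, so relabeling them is a symmetry, and an equivariant policy must put equal probability on them. For this particular toy game, alpha >= 1 makes the regularized objective strictly concave in the action probabilities, so alpha = 1.0 stands in for "sufficiently high". That bound belongs to the toy, not to the paper's theorem.

```python
import math
import random

# Hypothetical toy Dec-POMDP: a one-shot, two-player coordination game with no
# observations. Both players pull a lever; if they pull the same lever a they
# share reward REWARDS[a], otherwise they get 0. Relabeling levers 0-3 is a
# symmetry of the game; lever 4 pays slightly less.
REWARDS = [1.0, 1.0, 1.0, 1.0, 0.9]
K = len(REWARDS)


def softmax(theta):
    m = max(theta)  # subtract the max for numerical stability
    z = [math.exp(t - m) for t in theta]
    s = sum(z)
    return [x / s for x in z]


def expected_return(pi1, pi2):
    """Shared return of the joint policy (pi1, pi2), without the entropy bonus."""
    return sum(p * q * r for p, q, r in zip(pi1, pi2, REWARDS))


def softmax_policy_gradient(pi_self, pi_other, alpha):
    """Gradient w.r.t. one player's logits of
    J = sum_a pi1[a] * pi2[a] * REWARDS[a] + alpha * (H(pi1) + H(pi2))."""
    # dJ/dpi_self[a]; the constant -alpha from differentiating H cancels below.
    g = [q * r - alpha * math.log(p) for p, q, r in zip(pi_self, pi_other, REWARDS)]
    mean_g = sum(p * x for p, x in zip(pi_self, g))
    # Chain rule through the softmax: dJ/dtheta[b] = pi[b] * (g[b] - <pi, g>).
    return [p * (x - mean_g) for p, x in zip(pi_self, g)]


def train(theta1, theta2, alpha, lr=1.0, steps=2000):
    """Euler discretization of the gradient flow; both players update simultaneously."""
    theta1, theta2 = list(theta1), list(theta2)
    for _ in range(steps):
        pi1, pi2 = softmax(theta1), softmax(theta2)
        d1 = softmax_policy_gradient(pi1, pi2, alpha)
        d2 = softmax_policy_gradient(pi2, pi1, alpha)
        theta1 = [t + lr * d for t, d in zip(theta1, d1)]
        theta2 = [t + lr * d for t, d in zip(theta2, d2)]
    return softmax(theta1), softmax(theta2)


def run_experiment(alpha, n_runs=K, seed=0):
    """Train n_runs joint policies from different initializations and return
    (policies, mean self-play return, mean cross-play return)."""
    rng = random.Random(seed)
    policies = []
    for k in range(n_runs):
        # Run k starts biased toward lever k, standing in for the different
        # conventions that independently trained runs can settle on.
        bias = [2.0 if a == k % K else 0.0 for a in range(K)]
        theta1 = [b + rng.gauss(0.0, 0.1) for b in bias]
        theta2 = [b + rng.gauss(0.0, 0.1) for b in bias]
        policies.append(train(theta1, theta2, alpha))
    # returns[j][k]: player 1 from run j paired with player 2 from run k.
    returns = [[expected_return(policies[j][0], policies[k][1]) for k in range(n_runs)]
               for j in range(n_runs)]
    self_play = sum(returns[k][k] for k in range(n_runs)) / n_runs
    cross = [returns[j][k] for j in range(n_runs) for k in range(n_runs) if j != k]
    return policies, self_play, sum(cross) / len(cross)


if __name__ == "__main__":
    for alpha in (0.05, 1.0):
        _, sp, xp = run_experiment(alpha)
        print(f"alpha={alpha}: self-play {sp:.3f}, cross-play {xp:.3f}")
```

With alpha = 1.0, every run should end at the same joint policy, with equal probability on levers 0 to 3, so cross-play matches self-play. With alpha = 0.05, each run locks onto the lever its initialization favored, and cross-play falls far below self-play. The sketch does not model independent PPO or deep function approximation.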
