Revisiting Mixture Policies in Entropy-Regularized Actor-Critic

Jiamin He; Samuel Neumann; Jincheng Mei; Adam White; Martha White

arXiv:2605.09157·cs.LG·May 12, 2026

TL;DR

This paper asks whether mixture policies offer practical benefits in entropy-regularized reinforcement learning, introduces a low-variance marginalized reparameterization (MRP) estimator for them, and evaluates it on Gym MuJoCo, DeepMind Control Suite, and MetaWorld.

Contribution

The paper proposes the marginalized reparameterization (MRP) estimator for mixture policies and proves that it has lower variance than the standard likelihood-ratio (LR) estimator.

Findings

01

Mixture policies trained with the MRP estimator outperform those trained with the likelihood-ratio (LR) estimator in experiments

02

MRP mixture policies match or surpass Gaussian policies across various benchmarks

03

The paper clarifies the trade-offs and practical advantages of mixture policies

Abstract

Mixture policies theoretically offer greater flexibility than unimodal policies in continuous action reinforcement learning, but the practical benefits of this complexity remain elusive. Mixture policies are notably absent from most state-of-the-art algorithms, raising a fundamental question: Is the added representational overhead useful? We show that increased flexibility can theoretically enhance solution quality and entropy robustness. Yet standard algorithms like SAC do not leverage these advantages. A core issue is the lack of a low-variance reparameterization trick for mixtures, a luxury Gaussian policies enjoy. We propose a marginalized reparameterization (MRP) estimator to address this, proving it offers lower variance than the standard likelihood-ratio (LR) approach. Our experiments across Gym MuJoCo, DeepMind Control Suite, and MetaWorld show that MRP mixture policies…
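
To make the contrast between the two estimators concrete, the following is a minimal sketch, not the paper's implementation. It assumes one plausible reading of "marginalized reparameterization": the expectation over the mixture is written as a weighted sum over components, and each Gaussian component is reparameterized as mu_k + sigma_k * eps. The quadratic critic `q_value`, the one-dimensional action, and all function names are hypothetical choices made for illustration, and the LR estimator here uses no baseline.

```python
import math
import random


def q_value(a):
    # Hypothetical critic (an assumption): a smooth function of a scalar action.
    return -(a - 1.0) ** 2


def q_grad(a):
    # Derivative of the hypothetical critic with respect to the action.
    return -2.0 * (a - 1.0)


def normal_pdf(a, mu, sigma):
    z = (a - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def lr_sample(weights, mus, sigmas, rng):
    """One likelihood-ratio sample of d/d mu_j E_{a ~ pi}[Q(a)] for every j."""
    k = rng.choices(range(len(weights)), weights=weights)[0]
    a = rng.gauss(mus[k], sigmas[k])
    dens = [w * normal_pdf(a, m, s) for w, m, s in zip(weights, mus, sigmas)]
    total = sum(dens)
    # d log pi(a) / d mu_j = responsibility_j * (a - mu_j) / sigma_j^2
    return [
        q_value(a) * (d / total) * (a - m) / s ** 2
        for d, m, s in zip(dens, mus, sigmas)
    ]


def mrp_sample(weights, mus, sigmas, rng):
    """One marginalized reparameterized sample of the same gradient.

    Uses E_pi[Q] = sum_k w_k E_eps[Q(mu_k + sigma_k * eps)], so the gradient
    with respect to mu_k is w_k * E_eps[Q'(mu_k + sigma_k * eps)].
    """
    eps = rng.gauss(0.0, 1.0)
    return [w * q_grad(m + s * eps) for w, m, s in zip(weights, mus, sigmas)]


def mean_and_var(samples):
    # Per-coordinate empirical mean and variance of a list of gradient samples.
    n = len(samples)
    dims = range(len(samples[0]))
    means = [sum(s[j] for s in samples) / n for j in dims]
    variances = [sum((s[j] - means[j]) ** 2 for s in samples) / n for j in dims]
    return means, variances


if __name__ == "__main__":
    rng = random.Random(0)
    weights, mus, sigmas = [0.3, 0.7], [-1.0, 2.0], [0.5, 0.5]
    lr = [lr_sample(weights, mus, sigmas, rng) for _ in range(20000)]
    mrp = [mrp_sample(weights, mus, sigmas, rng) for _ in range(20000)]
    print("LR  mean, var:", mean_and_var(lr))
    print("MRP mean, var:", mean_and_var(mrp))
```

For this toy critic the exact gradient with respect to each component mean is -2 * w_k * (mu_k - 1), so both estimators can be checked against it; the per-sample variance of the marginalized estimator should come out much smaller, which is in line with the abstract's variance claim but is not evidence for it beyond this one example.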
