Learning Intractable Multimodal Policies with Reparameterization and Diversity Regularization

Ziqi Wang; Jiashun Liu; Ling Pan

arXiv:2511.01374·cs.LG·November 4, 2025

TL;DR

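This paper introduces an approach for training multimodal policies in deep reinforcement learning. Intractable actors, such as diffusion-based or amortized ones, are optimized directly by policy gradient via reparameterization, and a distance-based diversity regularizer improves decision diversity and few-shot robustness in diversity-critical tasks.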

Contribution

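The paper reformulates existing intractable multimodal actors within a unified framework, proves that they can be optimized directly by policy gradient via reparameterization, and proposes a distance-based diversity regularization that does not require explicit decision probabilities and improves policy expressivity and performance.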
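To make the idea of a distance-based diversity term concrete, here is a minimal sketch. It is not the paper's regularizer: it assumes that diversity is scored as the mean pairwise Euclidean distance between actions sampled for the same state, which needs only samples and no action probabilities. The names `sample_actions`, `mean_pairwise_distance`, and `diversity_regularized_objective`, as well as the toy bimodal actor, are hypothetical.

```python
import math
import random


def sample_actions(state, n_samples, rng):
    """Hypothetical implicit actor: maps (state, noise) to a 1-D action.

    The sign of one noise draw picks a mode, so the action distribution is
    bimodal and only samples are used, never a density.
    """
    actions = []
    for _ in range(n_samples):
        eps_mode = rng.gauss(0.0, 1.0)
        eps_jitter = rng.gauss(0.0, 1.0)
        center = 1.0 if eps_mode > 0 else -1.0
        actions.append((state + center + 0.1 * eps_jitter,))
    return actions


def mean_pairwise_distance(actions):
    """Average Euclidean distance over all unordered pairs of actions."""
    n = len(actions)
    if n < 2:
        return 0.0
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            total += math.dist(actions[i], actions[j])
    return total / (n * (n - 1) / 2)


def diversity_regularized_objective(q_value, actions, weight):
    """Assumed form: critic value plus a weighted sample-based diversity bonus."""
    return q_value + weight * mean_pairwise_distance(actions)


if __name__ == "__main__":
    rng = random.Random(0)
    bimodal = sample_actions(0.0, 64, rng)
    collapsed = [(1.0,)] * 64  # an actor that always picks the same action
    print("bimodal diversity:", mean_pairwise_distance(bimodal))
    print("collapsed diversity:", mean_pairwise_distance(collapsed))
```

A collapsed actor scores zero under this measure, while an actor that covers both modes scores higher, so adding the term to the objective rewards spreading samples across modes.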

Findings

01. Enhanced decision diversity in multimodal policies

02. Improved few-shot robustness in critical domains

03. Competitive performance on MuJoCo benchmarks

Abstract

Traditional continuous deep reinforcement learning (RL) algorithms employ deterministic or unimodal Gaussian actors, which cannot express complex multimodal decision distributions. This limitation can hinder their performance in diversity-critical scenarios. There have been some attempts to design online multimodal RL algorithms based on diffusion or amortized actors. However, these actors are intractable, so existing methods struggle to balance performance, decision diversity, and efficiency. To overcome this challenge, we first reformulate existing intractable multimodal actors within a unified framework, and prove that they can be directly optimized by policy gradient via reparameterization. Then, we propose a distance-based diversity regularization that does not explicitly require decision probabilities. We identify two diversity-critical domains, namely…
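The claim that an intractable actor can be optimized by policy gradient via reparameterization can be illustrated with a toy example. The sketch below is not the paper's algorithm. It assumes a one-parameter implicit actor `a = theta * tanh(3 * eps)` with Gaussian noise `eps`, and a hand-written critic `Q(a) = -(a^2 - 1)^2` with two equally good optima; all function names are hypothetical. The gradient of the expected critic value is estimated with the standard pathwise estimator, the sample mean of `dQ/da * da/dtheta`, which never evaluates the actor's density.

```python
import math
import random


def actor(theta, eps):
    # Hypothetical implicit actor: squashed noise, so actions cluster
    # near +theta and -theta (two modes).
    return theta * math.tanh(3.0 * eps)


def actor_grad_theta(eps):
    # da/dtheta for the actor above.
    return math.tanh(3.0 * eps)


def critic(a):
    # Toy critic with two equally good optima at a = -1 and a = +1.
    return -((a * a - 1.0) ** 2)


def critic_grad_action(a):
    # dQ/da for the critic above.
    return -4.0 * a * (a * a - 1.0)


def reparam_gradient(theta, noise):
    """Pathwise estimate of d/dtheta E[Q(actor(theta, eps))]."""
    total = 0.0
    for eps in noise:
        a = actor(theta, eps)
        total += critic_grad_action(a) * actor_grad_theta(eps)
    return total / len(noise)


def train(theta=0.3, steps=300, batch=256, lr=0.05, seed=0):
    """Plain stochastic gradient ascent on the expected critic value."""
    rng = random.Random(seed)
    for _ in range(steps):
        noise = [rng.gauss(0.0, 1.0) for _ in range(batch)]
        theta += lr * reparam_gradient(theta, noise)
    return theta


if __name__ == "__main__":
    print("learned theta:", train())
```

In this toy setting the learned `theta` settles near 1, so sampled actions concentrate around both optima of the critic. A real actor would be a neural network, with the two hand-written derivative functions replaced by automatic differentiation.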

Taxonomy

Topics: Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning · Adversarial Robustness in Machine Learning