Learning Intractable Multimodal Policies with Reparameterization and Diversity Regularization
Ziqi Wang, Jiashun Liu, Ling Pan

TL;DR
This paper introduces a new approach for training multimodal policies in deep reinforcement learning using reparameterization and diversity regularization, improving decision diversity and few-shot robustness in diversity-critical tasks.
Contribution
The paper reformulates existing intractable multimodal actors within a unified framework and proposes a novel distance-based diversity regularization that improves policy expressivity and performance.
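As a rough illustration of a diversity regularizer that needs no action probabilities, the sketch below samples several actions for the same state from a noise-conditioned actor and averages their pairwise Euclidean distances. It assumes a hypothetical one-layer actor `a = tanh(W[s; z] + b)` with `z ~ N(0, I)`; the functions `sample_actions` and `pairwise_distance_diversity` are illustrative names, and the paper's exact definition of the regularizer may differ.

```python
import numpy as np


def sample_actions(W, b, state, num_samples, rng):
    """Draw num_samples actions for one state from a hypothetical amortized actor
    a = tanh(W @ [s; z] + b), z ~ N(0, I). Different z give different actions."""
    noise_dim = W.shape[1] - state.shape[0]
    z = rng.standard_normal((num_samples, noise_dim))
    inputs = np.concatenate([np.tile(state, (num_samples, 1)), z], axis=1)
    return np.tanh(inputs @ W.T + b)


def pairwise_distance_diversity(actions):
    """Mean Euclidean distance over all distinct pairs of sampled actions.

    Uses only the samples themselves, so no action probability is required.
    Requires at least two actions.
    """
    n = actions.shape[0]
    diffs = actions[:, None, :] - actions[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)       # (n, n), zero on the diagonal
    # Sum over the n * (n - 1) ordered pairs; equals the mean over unordered pairs.
    return dists.sum() / (n * (n - 1))


rng = np.random.default_rng(0)
state = np.array([0.5, -0.2])
W = rng.standard_normal((2, 4))                  # action_dim = 2, state_dim + noise_dim = 4
b = np.zeros(2)
actions = sample_actions(W, b, state, num_samples=8, rng=rng)
print(pairwise_distance_diversity(actions))
```

In an actor update, such a term could be combined with the critic's value through a hypothetical weight λ, for example by maximizing the mean of Q(s, a_i) plus λ times the diversity score; since the score depends only on the sampled actions, it stays differentiable through the same noise samples.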
Findings
Enhanced decision diversity in multimodal policies
Improved few-shot robustness in critical domains
Competitive performance on MuJoCo benchmarks
Abstract
Traditional continuous deep reinforcement learning (RL) algorithms employ deterministic or unimodal Gaussian actors, which cannot express complex multimodal decision distributions. This limitation can hinder their performance in diversity-critical scenarios. Some attempts have been made to design online multimodal RL algorithms based on diffusion or amortized actors. However, these actors are intractable, so existing methods struggle to balance performance, decision diversity, and efficiency simultaneously. To overcome this challenge, we first reformulate existing intractable multimodal actors within a unified framework, and prove that they can be directly optimized by policy gradient via reparameterization. Then, we propose a distance-based diversity regularization that does not explicitly require decision probabilities. We identify two diversity-critical domains, namely…
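To see why an intractable actor can still be trained by policy gradient, consider a noise-conditioned (amortized) actor a = f_θ(s, z) with z ~ N(0, I). Its density π(a|s) has no closed form, but the objective E_z[Q(s, f_θ(s, z))] can be differentiated through the sampled action by the chain rule. The minimal sketch below assumes a hypothetical one-layer actor `tanh(W[s; z] + b)` and a toy quadratic critic standing in for a learned Q-function; it covers only the amortized case, not diffusion actors, and all function names are illustrative rather than the authors' implementation.

```python
import numpy as np


def actor(W, b, inputs):
    """Hypothetical amortized actor: a = tanh(W @ [s; z] + b), one row per sample."""
    return np.tanh(inputs @ W.T + b)


def toy_critic(actions, target):
    """Stand-in critic Q(s, a) = -||a - target||^2 (an assumption, not a learned Q)."""
    return -np.sum((actions - target) ** 2, axis=1)


def reparam_objective_and_grad(W, b, state, z, target):
    """Monte Carlo estimate of E_z[Q(s, f(s, z))] and its gradient w.r.t. W and b.

    The noise z is sampled outside the actor, so the gradient flows through
    the action via the chain rule; log pi(a|s) is never evaluated.
    """
    n = z.shape[0]
    inputs = np.concatenate([np.tile(state, (n, 1)), z], axis=1)
    actions = actor(W, b, inputs)
    objective = toy_critic(actions, target).mean()
    dq_da = -2.0 * (actions - target)        # dQ/da for the toy critic
    delta = dq_da * (1.0 - actions ** 2)     # backpropagate through tanh
    grad_W = delta.T @ inputs / n            # average of per-sample outer products
    grad_b = delta.mean(axis=0)
    return objective, grad_W, grad_b


def gradient_ascent(W, b, state, z, target, lr=0.02, steps=200):
    """Plain gradient ascent on the reparameterized objective with fixed noise."""
    W, b = W.copy(), b.copy()
    for _ in range(steps):
        _, grad_W, grad_b = reparam_objective_and_grad(W, b, state, z, target)
        W += lr * grad_W
        b += lr * grad_b
    return W, b
```

In a full algorithm the toy critic would be replaced by a learned Q-network and fresh noise would typically be drawn at each update; holding z fixed here simply makes the analytic gradient easy to check against finite differences.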
Taxonomy
Topics: Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning · Adversarial Robustness in Machine Learning
