Latent State Marginalization as a Low-cost Approach for Improving Exploration
Dinghuai Zhang, Aaron Courville, Yoshua Bengio, Qinqing Zheng, Amy Zhang, Ricky T. Q. Chen

TL;DR
This paper introduces SMAC, a low-cost latent state marginalization method within the MaxEnt RL framework, improving exploration and robustness in continuous control tasks through an actor-critic approach.
Contribution
It proposes a novel latent variable policy approach with marginalization techniques, enhancing exploration and training robustness in MaxEnt RL.
Findings
SMAC improves exploration in continuous control tasks.
Latent state marginalization enhances training robustness.
The method is simple, effective, and open-sourced.
Abstract
While the maximum entropy (MaxEnt) reinforcement learning (RL) framework -- often touted for its exploration and robustness capabilities -- is usually motivated from a probabilistic perspective, the use of deep probabilistic models has not gained much traction in practice due to their inherent complexity. In this work, we propose the adoption of latent variable policies within the MaxEnt framework, which we show can provably approximate any policy distribution, and additionally, naturally emerges under the use of world models with a latent belief state. We discuss why latent variable policies are difficult to train, how naive approaches can fail, then subsequently introduce a series of improvements centered around low-cost marginalization of the latent state, allowing us to make full use of the latent state at minimal additional cost. We instantiate our method under the actor-critic…
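To make the idea of marginalizing a latent variable policy concrete, the following is a minimal sketch and not the authors' SMAC implementation. It assumes a toy one-dimensional linear-Gaussian model, where the latent is drawn as z ~ N(z_mean, z_std^2) and the action as a | z ~ N(w * z + b, a_std^2). The marginal log-density log pi(a | s) is estimated by averaging the conditional density over sampled latents (a log-mean-exp), and in this toy case it can be checked against the closed-form marginal. All function and parameter names are hypothetical and chosen only for illustration.

```python
import numpy as np


def gaussian_logpdf(x, mean, std):
    """Log-density of a univariate Gaussian N(mean, std^2) evaluated at x."""
    return -0.5 * np.log(2.0 * np.pi) - np.log(std) - 0.5 * ((x - mean) / std) ** 2


def logmeanexp(values):
    """Numerically stable log of the mean of exp(values)."""
    shift = np.max(values)
    return shift + np.log(np.mean(np.exp(values - shift)))


def marginal_logprob_mc(a, z_mean, z_std, w, b, a_std, num_samples, rng):
    """Monte Carlo estimate of log pi(a) = log E_z[pi(a | z)].

    Assumed toy model (not from the paper):
    z ~ N(z_mean, z_std^2), a | z ~ N(w * z + b, a_std^2).
    """
    z = rng.normal(z_mean, z_std, size=num_samples)   # sample latents
    cond_logp = gaussian_logpdf(a, w * z + b, a_std)   # log pi(a | z_k)
    return logmeanexp(cond_logp)                       # log (1/K) sum_k pi(a | z_k)


def marginal_logprob_exact(a, z_mean, z_std, w, b, a_std):
    """Closed-form marginal, available only because the toy model is linear-Gaussian."""
    marginal_std = np.sqrt((w * z_std) ** 2 + a_std ** 2)
    return gaussian_logpdf(a, w * z_mean + b, marginal_std)


rng = np.random.default_rng(0)
est = marginal_logprob_mc(0.3, 0.0, 1.0, 2.0, 0.1, 0.5, 10_000, rng)
exact = marginal_logprob_exact(0.3, 0.0, 1.0, 2.0, 0.1, 0.5)
print(f"Monte Carlo estimate: {est:.4f}, closed form: {exact:.4f}")
```

With a non-linear policy network the closed form is no longer available, which is why a sample-based estimate of the marginal is needed at all. Note also that the log of a sample average is a biased estimate of the log-marginal for any finite number of samples, so more samples trade cost for accuracy.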
Taxonomy
Topics: Reinforcement Learning in Robotics · Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning
