Latent State Marginalization as a Low-cost Approach for Improving Exploration

Dinghuai Zhang; Aaron Courville; Yoshua Bengio; Qinqing Zheng; Amy Zhang; Ricky T. Q. Chen

arXiv:2210.00999 · cs.LG · February 13, 2023 · 1 citation

TL;DR

This paper introduces SMAC (Stochastic Marginal Actor-Critic), an actor-critic method that applies low-cost latent state marginalization within the MaxEnt RL framework to improve exploration and robustness in continuous control tasks.

Contribution

The paper proposes latent variable policies for MaxEnt RL, trained with low-cost marginalization of the latent state, to improve exploration and training robustness.

Findings

1. SMAC improves exploration in continuous control tasks.

2. Latent state marginalization enhances training robustness.

3. The method is simple, effective, and open-sourced.

Abstract

While the maximum entropy (MaxEnt) reinforcement learning (RL) framework -- often touted for its exploration and robustness capabilities -- is usually motivated from a probabilistic perspective, the use of deep probabilistic models has not gained much traction in practice due to their inherent complexity. In this work, we propose the adoption of latent variable policies within the MaxEnt framework, which we show can provably approximate any policy distribution, and additionally, naturally emerges under the use of world models with a latent belief state. We discuss why latent variable policies are difficult to train, how naive approaches can fail, then subsequently introduce a series of improvements centered around low-cost marginalization of the latent state, allowing us to make full use of the latent state at minimal additional cost. We instantiate our method under the actor-critic…
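To make the idea of marginalizing a latent variable policy concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the paper's implementation: it assumes a one-dimensional Gaussian latent distribution q(z|s), a Gaussian action distribution whose mean is linear in z, and a simple Monte Carlo average over sampled latents. All function names and parameters (`log_marginal_policy`, `weight`, `latent_std`, and so on) are hypothetical. The linear-Gaussian setup is chosen only because its marginal has a closed form that the estimate can be checked against.

```python
import math
import random


def log_normal_pdf(x, mean, std):
    """Log density of a 1-D Gaussian N(mean, std^2) evaluated at x."""
    return -0.5 * math.log(2.0 * math.pi) - math.log(std) - 0.5 * ((x - mean) / std) ** 2


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_marginal_policy(action, latent_mean, latent_std, weight, action_std, num_samples, rng):
    """Monte Carlo estimate of log pi(a|s) = log E_{z ~ q(z|s)}[pi(a|s, z)].

    Assumed toy model (hypothetical, for illustration only):
      z ~ N(latent_mean, latent_std^2)
      a | z ~ N(weight * z, action_std^2)
    """
    log_probs = []
    for _ in range(num_samples):
        z = rng.gauss(latent_mean, latent_std)  # sample a latent state
        log_probs.append(log_normal_pdf(action, weight * z, action_std))
    # Average the conditional densities in log space.
    return logsumexp(log_probs) - math.log(num_samples)


def exact_log_marginal(action, latent_mean, latent_std, weight, action_std):
    """Closed-form marginal for the toy linear-Gaussian model above."""
    marginal_std = math.sqrt((weight * latent_std) ** 2 + action_std ** 2)
    return log_normal_pdf(action, weight * latent_mean, marginal_std)


rng = random.Random(0)
estimate = log_marginal_policy(0.3, 0.5, 1.0, 2.0, 0.4, 20000, rng)
exact = exact_log_marginal(0.3, 0.5, 1.0, 2.0, 0.4)
print(f"Monte Carlo estimate: {estimate:.4f}")
print(f"Closed-form value:    {exact:.4f}")
```

With a finite number of samples, the log of the averaged density is, in expectation, a lower bound on the true log marginal (by Jensen's inequality), and the bound tightens as the number of samples grows. In a deep RL setting the conditional densities would come from neural networks rather than this toy model.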

Code & Models

Repositories

zdhnarsil/stochastic-marginal-actor-critic (PyTorch, official)

Videos

Latent State Marginalization as a Low-cost Approach for Improving Exploration · SlidesLive

Taxonomy

Topics: Reinforcement Learning in Robotics · Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning