Variational Dynamic for Self-Supervised Exploration in Deep Reinforcement Learning

Chenjia Bai; Peng Liu; Kaiyu Liu; Lingxiao Wang; Yingnan Zhao; Lei Han

arXiv:2010.08755·cs.LG·April 3, 2024

TL;DR

This paper introduces a variational dynamic model for self-supervised exploration in deep reinforcement learning, effectively handling multimodal and stochastic dynamics to improve exploration in sparse reward environments.

Contribution

It proposes a novel variational dynamic model using conditional variational inference to better model environment dynamics and derive intrinsic rewards for exploration.

Findings

01. Outperforms state-of-the-art environment model-based exploration methods.

02. Effective in both simulation and real robotic tasks.

03. Handles multimodal and stochastic environment dynamics.

Abstract

Efficient exploration remains a challenging problem in reinforcement learning, especially for tasks where extrinsic rewards from environments are sparse or even totally disregarded. Significant advances based on intrinsic motivation show promising results in simple environments but often get stuck in environments with multimodal and stochastic dynamics. In this work, we propose a variational dynamic model based on conditional variational inference to model the multimodality and stochasticity. We treat the environmental state-action transition as a conditional generative process: the next-state prediction is generated conditioned on the current state, the action, and a latent variable, which provides a better understanding of the dynamics and leads to better exploration performance. We derive an upper bound of the negative log-likelihood of the environmental transition and use…
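
The conditional generative view in the abstract can be illustrated with the standard conditional-VAE bound, in which the negative log-likelihood of a transition is upper-bounded by an expected reconstruction term plus a KL term between a posterior and a prior over the latent variable. The following is a minimal sketch under stated assumptions, not the authors' implementation: it assumes diagonal Gaussian prior, posterior, and decoder, and the names `gaussian_kl`, `gaussian_nll`, `negative_elbo`, and `decode` are hypothetical. The toy decoder stands in for a learned network, and the posterior is held fixed here, whereas a learned posterior would also depend on the next state.

```python
import math
import random

def gaussian_kl(mu_q, logvar_q, mu_p, logvar_p):
    """KL(q || p) between diagonal Gaussians, summed over latent dimensions."""
    kl = 0.0
    for mq, lq, mp, lp in zip(mu_q, logvar_q, mu_p, logvar_p):
        kl += 0.5 * (lp - lq + (math.exp(lq) + (mq - mp) ** 2) / math.exp(lp) - 1.0)
    return kl

def gaussian_nll(x, mean, logvar):
    """Negative log-density of x under a diagonal Gaussian."""
    nll = 0.0
    for xi, m, lv in zip(x, mean, logvar):
        nll += 0.5 * (math.log(2.0 * math.pi) + lv + (xi - m) ** 2 / math.exp(lv))
    return nll

def negative_elbo(next_state, decode, mu_q, logvar_q, mu_p, logvar_p,
                  n_samples=8, rng=None):
    """Monte Carlo estimate of E_q[-log p(s'|s,a,z)] + KL(q || p).

    `decode(z)` is assumed to return (mean, logvar) of the next-state
    distribution for the current state and action (hypothetical interface).
    """
    if rng is None:
        rng = random.Random(0)  # fixed seed so the sketch is reproducible
    recon = 0.0
    for _ in range(n_samples):
        # Reparameterised sample z = mu + sigma * eps from the posterior q.
        z = [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
             for m, lv in zip(mu_q, logvar_q)]
        mean, logvar = decode(z)
        recon += gaussian_nll(next_state, mean, logvar)
    return recon / n_samples + gaussian_kl(mu_q, logvar_q, mu_p, logvar_p)

# Toy transition: the next state is state + action, shifted by a 1-D latent.
state, action = [0.0, 1.0], 0.5

def decode(z):
    mean = [s + action + z[0] for s in state]
    return mean, [0.0, 0.0]  # unit variance in every state dimension

mu_p, logvar_p = [0.0], [0.0]    # prior over z (assumed standard normal)
mu_q, logvar_q = [0.2], [-1.0]   # posterior over z (fixed for illustration)

reward_expected = negative_elbo([0.5, 1.5], decode, mu_q, logvar_q, mu_p, logvar_p)
reward_surprising = negative_elbo([5.5, 6.5], decode, mu_q, logvar_q, mu_p, logvar_p)
print(reward_expected, reward_surprising)
```

In this toy setting, a transition that the model predicts poorly yields a larger bound than one it predicts well, which is the property that makes such a quantity usable as an intrinsic reward signal.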
