VASE: Variational Assorted Surprise Exploration for Reinforcement Learning

Haitao Xu; Brendan McCane; Lech Szymanski

arXiv:1910.14351·cs.LG·November 1, 2019

TL;DR

VASE is a surprise-based exploration method for reinforcement learning. It models the environment dynamics with a Bayesian neural network trained by variational inference, and in continuous control tasks with sparse rewards it outperforms other surprise-based exploration techniques.

Contribution

The paper proposes a new definition of surprise and implements it for RL exploration using a Bayesian neural network model of the environment dynamics, trained via variational inference.

Findings

01

In continuous control environments with sparse rewards, VASE outperforms other surprise-based exploration techniques.

02

Training alternates between improving the accuracy of the agent's model of the environment dynamics and updating its policy.

03

In the reported experiments, VASE explores more efficiently than the other surprise-based methods it is compared with.

Abstract

Exploration in environments with continuous control and sparse rewards remains a key challenge in reinforcement learning (RL). Recently, surprise has been used as an intrinsic reward that encourages systematic and efficient exploration. We introduce a new definition of surprise and its RL implementation named Variational Assorted Surprise Exploration (VASE). VASE uses a Bayesian neural network as a model of the environment dynamics and is trained using variational inference, alternately updating the accuracy of the agent's model and policy. Our experiments show that in continuous control sparse reward environments VASE outperforms other surprise-based exploration techniques.
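The abstract does not state VASE's own definition of surprise, so the following is only a generic sketch of the ingredients it names, not the authors' method. It assumes a common textbook form of surprise (the negative log-likelihood of the observed next state under a diagonal Gaussian prediction) and shows the KL term that usually regularises a mean-field Gaussian posterior over network weights in variational inference. The function names and the scaling coefficient eta are hypothetical and chosen for illustration.

```python
import math


def gaussian_surprisal(next_state, pred_mean, pred_std):
    """Negative log-likelihood of next_state under a diagonal Gaussian prediction.

    Assumption: a generic surprisal measure, not the surprise defined in the paper.
    """
    total = 0.0
    for x, mu, sigma in zip(next_state, pred_mean, pred_std):
        total += 0.5 * math.log(2.0 * math.pi * sigma ** 2)
        total += (x - mu) ** 2 / (2.0 * sigma ** 2)
    return total


def kl_diag_gaussians(mu_q, std_q, mu_p, std_p):
    """KL(q || p) between two diagonal Gaussians.

    In mean-field variational inference this term keeps the approximate
    weight posterior q close to the prior p.
    """
    total = 0.0
    for mq, sq, mp, sp in zip(mu_q, std_q, mu_p, std_p):
        total += math.log(sp / sq) + (sq ** 2 + (mq - mp) ** 2) / (2.0 * sp ** 2) - 0.5
    return total


def augmented_reward(extrinsic, surprise, eta=0.1):
    """Add a scaled surprise bonus to the (possibly zero) environment reward.

    eta is a hypothetical trade-off coefficient, not a value from the paper.
    """
    return extrinsic + eta * surprise


if __name__ == "__main__":
    # A transition the model predicts poorly earns a larger exploration bonus.
    expected = gaussian_surprisal([0.0, 0.0], [0.0, 0.0], [1.0, 1.0])
    unexpected = gaussian_surprisal([3.0, -3.0], [0.0, 0.0], [1.0, 1.0])
    print(augmented_reward(0.0, expected), augmented_reward(0.0, unexpected))
```

With a sparse extrinsic reward, the bonus term is the only learning signal on most steps, which is why the quality of the dynamics model matters for exploration. An alternating scheme like the one described in the abstract would refit the model on fresh transitions and then update the policy on the augmented reward.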
