VIME: Variational Information Maximizing Exploration

Rein Houthooft; Xi Chen; Yan Duan; John Schulman; Filip De Turck; Pieter Abbeel

arXiv:1605.09674 · cs.LG · January 30, 2017 · 376 cites

TL;DR

VIME introduces a novel exploration strategy for deep reinforcement learning that maximizes information gain about environment dynamics, leading to improved performance in continuous control tasks with sparse rewards.

Contribution

The paper proposes VIME, a practical variational information maximization method for exploration in high-dimensional deep RL, adaptable to various algorithms and environments.

Findings

01. VIME outperforms heuristic exploration methods in continuous control tasks.

02. VIME effectively handles sparse reward scenarios.

03. The approach is compatible with multiple RL algorithms.

Abstract

Scalable and effective exploration remains a key challenge in reinforcement learning (RL). While there are methods with optimality guarantees in the setting of discrete state and action spaces, these methods cannot be applied in high-dimensional deep RL scenarios. As such, most contemporary RL relies on simple heuristics such as epsilon-greedy exploration or adding Gaussian noise to the controls. This paper introduces Variational Information Maximizing Exploration (VIME), an exploration strategy based on maximization of information gain about the agent's belief of environment dynamics. We propose a practical implementation, using variational inference in Bayesian neural networks which efficiently handles continuous state and action spaces. VIME modifies the MDP reward function, and can be applied with several different underlying RL algorithms. We demonstrate that VIME achieves…
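The abstract's statement that VIME "modifies the MDP reward function" can be illustrated with a minimal sketch. It assumes a fully factorized Gaussian posterior over the dynamics-model weights and an intrinsic bonus equal to the KL divergence between the posterior after and before observing a transition, scaled by a trade-off weight. The function names, the `eta` parameter name, and its default value are hypothetical choices made for this illustration and do not appear in the abstract. The Bayesian neural network update that would produce the new posterior is not shown.

```python
import numpy as np


def kl_diag_gaussians(mu_new, sigma_new, mu_old, sigma_old):
    """KL[N(mu_new, diag(sigma_new^2)) || N(mu_old, diag(sigma_old^2))].

    Closed form for fully factorized Gaussians; used here as a stand-in
    for the information gained about the dynamics-model weights.
    """
    mu_new = np.asarray(mu_new, dtype=float)
    sigma_new = np.asarray(sigma_new, dtype=float)
    mu_old = np.asarray(mu_old, dtype=float)
    sigma_old = np.asarray(sigma_old, dtype=float)

    var_new = sigma_new ** 2
    var_old = sigma_old ** 2
    per_dim = (
        var_new / var_old
        + (mu_old - mu_new) ** 2 / var_old
        - 1.0
        + 2.0 * (np.log(sigma_old) - np.log(sigma_new))
    )
    return 0.5 * float(np.sum(per_dim))


def augmented_reward(external_reward, info_gain, eta=0.1):
    """Environment reward plus a scaled information-gain bonus.

    `eta` (hypothetical name and default) trades off exploration
    against exploitation of the external reward.
    """
    return external_reward + eta * info_gain


if __name__ == "__main__":
    # Posterior before and after a (hypothetical) update on one transition.
    mu_old, sigma_old = np.zeros(3), np.ones(3)
    mu_new, sigma_new = np.array([0.5, 0.0, -0.2]), np.array([0.8, 1.0, 0.9])

    gain = kl_diag_gaussians(mu_new, sigma_new, mu_old, sigma_old)
    print("information gain:", gain)
    print("augmented reward:", augmented_reward(0.0, gain))
```

Because the bonus is added to the reward rather than to the policy update, the augmented reward in a sketch like this can be passed to any underlying RL algorithm unchanged, which matches the abstract's claim of compatibility with several algorithms.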

Taxonomy

Topics: Reinforcement Learning in Robotics · Data Stream Mining Techniques · Distributed and Parallel Computing Systems