Surprise-Based Intrinsic Motivation for Deep Reinforcement Learning

Joshua Achiam; Shankar Sastry

arXiv:1703.01732 · cs.LG · March 7, 2017 · 100 cites

TL;DR

This paper introduces a surprise-based intrinsic motivation approach for deep reinforcement learning: the agent learns a model of the transition probabilities alongside its policy and uses its surprise about observed transitions as an exploration incentive, which helps it explore complex, sparse-reward environments.

Contribution

The authors propose an intrinsic motivation method that uses the surprise of a learned transition model to guide exploration, improving performance in high-dimensional, sparse-reward tasks.

Findings

01. Agents successfully explore complex environments with sparse rewards.

02. The method outperforms heuristic exploration strategies in various tasks.

03. Approximations of surprise facilitate scalable exploration in deep RL.

Abstract

Exploration in complex domains is a key challenge in reinforcement learning, especially for tasks with very sparse rewards. Recent successes in deep reinforcement learning have been achieved mostly using simple heuristic exploration strategies such as $\epsilon$-greedy action selection or Gaussian control noise, but there are many tasks where these methods are insufficient to make any learning progress. Here, we consider more complex heuristics: efficient and scalable exploration strategies that maximize a notion of an agent's surprise about its experiences via intrinsic motivation. We propose to learn a model of the MDP transition probabilities concurrently with the policy, and to form intrinsic rewards that approximate the KL-divergence of the true transition probabilities from the learned model. One of our approximations results in using surprisal as intrinsic motivation, while the…
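
The surprisal idea mentioned in the abstract can be illustrated with a small sketch. It is not the authors' implementation. It assumes a linear-Gaussian transition model fit by least squares as a stand-in for a learned deep model, and it assumes the bonus is added to the task reward with a weight `eta`. The function names `fit_linear_gaussian`, `surprisal`, and `shaped_reward`, and the default value of `eta`, are hypothetical and chosen only for illustration.

```python
import numpy as np


def fit_linear_gaussian(states, actions, next_states):
    """Fit s' ~ N(W^T [s, a, 1], sigma^2 I) by least squares.

    Assumption: a linear model with one shared variance stands in for
    the learned transition model; it is used here only for illustration.
    """
    x = np.hstack([states, actions, np.ones((len(states), 1))])
    w, *_ = np.linalg.lstsq(x, next_states, rcond=None)
    resid = next_states - x @ w
    # Floor the variance so the log-likelihood stays finite.
    sigma2 = max(float(np.mean(resid ** 2)), 1e-6)
    return w, sigma2


def surprisal(w, sigma2, state, action, next_state):
    """Surprisal -log P(s' | s, a) under the fitted Gaussian model."""
    x = np.concatenate([state, action, [1.0]])
    err = next_state - x @ w
    d = err.shape[0]
    return 0.5 * float(err @ err) / sigma2 + 0.5 * d * np.log(2.0 * np.pi * sigma2)


def shaped_reward(extrinsic, bonus, eta=0.01):
    """Assumed combination: task reward plus a weighted intrinsic bonus."""
    return extrinsic + eta * bonus
```

In this sketch, a transition that the fitted model predicts well receives a small bonus, while a transition far from the model's prediction receives a large one, so the agent is nudged toward experiences its model does not yet explain. In a concurrent-learning setup like the one the abstract describes, the model would be refit as new data arrives, so bonuses for familiar transitions shrink over time.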

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Adversarial Robustness in Machine Learning