Surprise-Based Intrinsic Motivation for Deep Reinforcement Learning
Joshua Achiam, Shankar Sastry

TL;DR
This paper introduces a surprise-based intrinsic motivation approach for deep reinforcement learning, enabling agents to explore more effectively in complex, sparse reward environments by modeling transition probabilities and using surprise as an exploration incentive.
Contribution
The authors propose a novel intrinsic motivation method that uses transition model surprise to guide exploration, improving performance in high-dimensional, sparse reward tasks.
Findings
Agents successfully explore complex environments with sparse rewards.
The method outperforms simple heuristic exploration strategies, such as ε-greedy action selection or Gaussian control noise, in various tasks.
Approximations of surprise facilitate scalable exploration in deep RL.
Abstract
Exploration in complex domains is a key challenge in reinforcement learning, especially for tasks with very sparse rewards. Recent successes in deep reinforcement learning have been achieved mostly using simple heuristic exploration strategies such as ε-greedy action selection or Gaussian control noise, but there are many tasks where these methods are insufficient to make any learning progress. Here, we consider more complex heuristics: efficient and scalable exploration strategies that maximize a notion of an agent's surprise about its experiences via intrinsic motivation. We propose to learn a model of the MDP transition probabilities concurrently with the policy, and to form intrinsic rewards that approximate the KL-divergence of the true transition probabilities from the learned model. One of our approximations results in using surprisal as intrinsic motivation, while the…
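The surprisal idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the linear-Gaussian transition model, the class name LinearGaussianModel, the helper functions, and the parameters sigma, lr, and eta are all assumptions chosen for illustration. The sketch treats surprisal as the negative log-probability that a learned model assigns to an observed transition, and adds it, scaled by a coefficient, to the environment reward.

```python
import numpy as np


class LinearGaussianModel:
    """Hypothetical transition model: s' ~ N(W @ [s, a, 1], sigma^2 * I)."""

    def __init__(self, state_dim, action_dim, sigma=1.0, lr=0.01):
        self.W = np.zeros((state_dim, state_dim + action_dim + 1))
        self.sigma = sigma  # fixed standard deviation (an assumption)
        self.lr = lr        # step size for the model update (an assumption)

    def _features(self, s, a):
        # Concatenate state, action, and a constant bias feature.
        return np.concatenate([s, a, [1.0]])

    def log_prob(self, s, a, s_next):
        # Log-density of s_next under the isotropic Gaussian prediction.
        mean = self.W @ self._features(s, a)
        d = s_next.shape[0]
        sq_err = np.sum((s_next - mean) ** 2)
        return (-0.5 * sq_err / self.sigma**2
                - 0.5 * d * np.log(2.0 * np.pi * self.sigma**2))

    def update(self, s, a, s_next):
        # One gradient-ascent step on the log-likelihood of the transition.
        x = self._features(s, a)
        err = s_next - self.W @ x
        self.W += self.lr * np.outer(err, x) / self.sigma**2


def surprisal_reward(model, s, a, s_next):
    # Surprisal: negative log-probability of the observed transition.
    return -model.log_prob(s, a, s_next)


def augmented_reward(r_ext, model, s, a, s_next, eta=0.1):
    # Extrinsic reward plus eta times the intrinsic (surprisal) reward.
    return r_ext + eta * surprisal_reward(model, s, a, s_next)


if __name__ == "__main__":
    model = LinearGaussianModel(state_dim=2, action_dim=1)
    s, a = np.array([1.0, 0.0]), np.array([0.5])
    s_next = np.array([1.5, 0.0])  # toy transition, invented for the demo
    before = surprisal_reward(model, s, a, s_next)
    for _ in range(200):
        model.update(s, a, s_next)
    after = surprisal_reward(model, s, a, s_next)
    print(before > after)  # a repeated transition becomes less surprising
```

In this toy setup the intrinsic reward shrinks as the model learns a transition it sees repeatedly, so the bonus favors transitions the model still predicts poorly. Updating the model alongside the policy, as the abstract describes, would correspond to calling update on freshly collected transitions between policy updates.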
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Adversarial Robustness in Machine Learning
