Reinforcement Learning with Unsupervised Auxiliary Tasks

Max Jaderberg; Volodymyr Mnih; Wojciech Marian Czarnecki; Tom Schaul; Joel Z Leibo; David Silver; Koray Kavukcuoglu

arXiv:1611.05397 · cs.LG · November 17, 2016 · 272 cites

TL;DR

This paper presents a reinforcement learning agent that maximizes pseudo-rewards from auxiliary tasks alongside the main extrinsic reward, leading to faster learning and better performance on Atari and 3D Labyrinth tasks.

Contribution

It introduces a method for maximizing multiple pseudo-rewards simultaneously through a shared representation, and a mechanism that focuses this representation on extrinsic rewards, improving learning efficiency.

Findings

01 Averaged 880% of expert human performance on Atari.

02 Achieved a mean 10x speedup in learning on Labyrinth tasks.

03 Averaged 87% of expert human performance on Labyrinth.
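The Atari figure is presumably a mean human-normalized score, the usual Atari convention in which 0% matches a random policy and 100% matches the human reference score. Assuming that convention (this page does not define the metric), the arithmetic can be sketched as follows; the function names are illustrative, and the game names and scores are made up.

```python
def human_normalized_score(agent_score: float, random_score: float, human_score: float) -> float:
    """Percent of the random-to-human gap covered by the agent (100.0 = human level)."""
    if human_score == random_score:
        raise ValueError("human and random reference scores must differ")
    return 100.0 * (agent_score - random_score) / (human_score - random_score)


def mean_normalized_score(results: dict[str, tuple[float, float, float]]) -> float:
    """Mean over games; each value is (agent_score, random_score, human_score)."""
    values = [human_normalized_score(*triple) for triple in results.values()]
    return sum(values) / len(values)


# Made-up scores for two hypothetical games, for illustration only.
example = {
    "game_a": (1000.0, 100.0, 200.0),  # 900% of human level
    "game_b": (150.0, 50.0, 150.0),    # exactly human level (100%)
}
print(mean_normalized_score(example))  # 500.0
```

Because it is a mean, a few games with very large normalized scores can dominate the average, so a figure like 880% says little about performance on a typical game.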

Abstract

Deep reinforcement learning agents have achieved state-of-the-art results by directly maximising cumulative reward. However, environments contain a much wider variety of possible training signals. In this paper, we introduce an agent that also maximises many other pseudo-reward functions simultaneously by reinforcement learning. All of these tasks share a common representation that, like unsupervised learning, continues to develop in the absence of extrinsic rewards. We also introduce a novel mechanism for focusing this representation upon extrinsic rewards, so that learning can rapidly adapt to the most relevant aspects of the actual task. Our agent significantly outperforms the previous state-of-the-art on Atari, averaging 880% expert human performance, and a challenging suite of first-person, three-dimensional Labyrinth tasks leading to a mean speedup in learning of…
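The abstract's central idea, several pseudo-reward objectives trained alongside the main reward on one shared representation, can be illustrated with a toy objective. This is a minimal sketch under stated assumptions, not the paper's architecture or losses: `combined_loss`, `numerical_grad`, the task name `aux`, its weight, and the quadratic auxiliary loss are all hypothetical stand-ins. It shows only the property the abstract describes: when the extrinsic loss carries no gradient (no rewards seen yet), the auxiliary term still moves the shared parameters.

```python
from typing import Callable, Mapping, Sequence

# Hypothetical toy interface: a loss maps shared parameters to a scalar.
Loss = Callable[[Sequence[float]], float]


def combined_loss(
    params: Sequence[float],
    main_loss: Loss,
    aux_losses: Mapping[str, Loss],
    weights: Mapping[str, float],
) -> float:
    """Main (extrinsic-reward) loss plus weighted auxiliary (pseudo-reward) losses."""
    total = main_loss(params)
    for name, loss in aux_losses.items():
        # Auxiliary tasks without an explicit weight default to 1.0.
        total += weights.get(name, 1.0) * loss(params)
    return total


def numerical_grad(f: Loss, params: Sequence[float], eps: float = 1e-6) -> list[float]:
    """Central-difference estimate of df/dparams."""
    grad = []
    for i in range(len(params)):
        up = list(params)
        down = list(params)
        up[i] += eps
        down[i] -= eps
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad


# Toy setup (all hypothetical): no extrinsic reward has been observed yet,
# so the stand-in main loss is constant and carries no gradient.
def no_reward_loss(p: Sequence[float]) -> float:
    return 0.0


# Stand-in for a pseudo-reward task that still constrains the shared parameters.
def toy_aux_loss(p: Sequence[float]) -> float:
    return (p[0] - 1.0) ** 2 + (p[1] + 2.0) ** 2


theta = [0.0, 0.0]  # shared representation parameters
grad_main_only = numerical_grad(no_reward_loss, theta)
grad_total = numerical_grad(
    lambda p: combined_loss(p, no_reward_loss, {"aux": toy_aux_loss}, {"aux": 0.5}),
    theta,
)
print(grad_main_only)  # [0.0, 0.0]
print(grad_total)      # approximately [-1.0, 2.0]
```

The weighted sum is the simplest way to share one representation among tasks; the per-task weights control how strongly each pseudo-reward shapes it relative to the extrinsic objective.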

Code & Models

Videos

Reinforcement Learning with Unsupervised Auxiliary Tasks (YouTube)

Reinforcement Learning with sparse rewards (YouTube)

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Data Stream Mining Techniques