The Value-Improvement Path: Towards Better Representations for Reinforcement Learning

Will Dabney; André Barreto; Mark Rowland; Robert Dadashi; John Quan; Marc G. Bellemare; David Silver

arXiv:2006.02243 · cs.LG · January 5, 2021

TL;DR

This paper proposes a holistic approach to value prediction in reinforcement learning by modeling the entire value-improvement path, leading to improved representations and significantly better performance in Atari games.

Contribution

It introduces the concept of the value-improvement path and demonstrates how learning this path holistically enhances representation learning in RL.

Findings

01. An RL agent augmented with a value-improvement-path auxiliary task doubles performance on Atari games.

02. Holistic value prediction improves the accuracy of the value functions of future, improved policies.

03. The work provides new insights into auxiliary tasks and representation learning in RL.

Abstract

In value-based reinforcement learning (RL), unlike in supervised learning, the agent faces not a single, stationary, approximation problem, but a sequence of value prediction problems. Each time the policy improves, the nature of the problem changes, shifting both the distribution of states and their values. In this paper we take a novel perspective, arguing that the value prediction problems faced by an RL agent should not be addressed in isolation, but rather as a single, holistic, prediction problem. An RL algorithm generates a sequence of policies that, at least approximately, improve towards the optimal policy. We explicitly characterize the associated sequence of value functions and call it the value-improvement path. Our main idea is to approximate the value-improvement path holistically, rather than to solely track the value function of the current policy. Specifically, we…
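
To make the idea of a sequence of value functions concrete, the following is a minimal sketch, not the paper's method. It runs exact policy iteration on a hypothetical five-state chain MDP and records the value function of each successive policy. The MDP, the constants, and every function name (`step`, `evaluate`, `improve`, `value_improvement_path`) are assumptions made for illustration. The sketch does not cover function approximation or the auxiliary task mentioned in the abstract.

```python
# Toy illustration only: the MDP, names, and constants are assumptions,
# not taken from the paper.
N_STATES = 5          # states 0..4; state 4 is absorbing
ACTIONS = (0, 1)      # 0 = move left, 1 = move right
GAMMA = 0.9           # discount factor


def step(state, action):
    """Deterministic transition: return (next_state, reward)."""
    if state == N_STATES - 1:
        return state, 0.0  # absorbing state, no further reward
    nxt = min(state + 1, N_STATES - 1) if action == 1 else max(state - 1, 0)
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward


def evaluate(policy, tol=1e-12):
    """Iterative policy evaluation: value function of a fixed policy."""
    values = [0.0] * N_STATES
    while True:
        new_values = []
        for s in range(N_STATES):
            nxt, r = step(s, policy[s])
            new_values.append(r + GAMMA * values[nxt])
        if max(abs(a - b) for a, b in zip(new_values, values)) < tol:
            return new_values
        values = new_values


def improve(policy, values):
    """Greedy improvement; keep the current action unless another is strictly better."""
    new_policy = []
    for s in range(N_STATES):
        def q(a, s=s):
            nxt, r = step(s, a)
            return r + GAMMA * values[nxt]
        best = policy[s]
        for a in ACTIONS:
            if q(a) > q(best) + 1e-9:
                best = a
        new_policy.append(best)
    return new_policy


def value_improvement_path(policy):
    """Record the value function of every policy visited by policy iteration."""
    path = []
    while True:
        values = evaluate(policy)
        path.append(values)
        new_policy = improve(policy, values)
        if new_policy == policy:
            return path
        policy = new_policy


# Start from the worst policy (always move left) and print the path.
path = value_improvement_path([0] * N_STATES)
for k, v in enumerate(path):
    print(k, [round(x, 3) for x in v])
```

In this toy setting each policy improvement changes the action in one more state, so the recorded list holds several distinct value functions, each at least as large as the previous one in every state. The abstract's proposal concerns approximating such a sequence as a whole rather than only its latest element.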
