Proper Value Equivalence

Christopher Grimm; André Barreto; Gregory Farquhar; David Silver; Satinder Singh

arXiv:2106.10316 · cs.AI · December 14, 2021

TL;DR

This paper introduces the concept of proper value equivalence (PVE) in model-based RL. It generalizes value equivalence (VE) to order $k$, proposes a loss function for learning models that are sufficient for optimal planning, and suggests practical improvements to MuZero.
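Below is a minimal sketch of what an order-$k$ VE loss could look like for a tabular model, written in Python with NumPy. It makes two assumptions. First, order-$k$ VE means that applying the model's Bellman operator $\tilde{T}_\pi$ $k$ times gives the same result as applying the environment's $T_\pi$ $k$ times. Second, the functions are the environment's own value functions $v_\pi$.

Because $v_\pi$ is the fixed point of $T_\pi$, the target $T_\pi^k v_\pi$ is simply $v_\pi$ for every $k$. For a model with valid transition probabilities, $\tilde{T}_\pi^k$ is a $\gamma^k$-contraction whose unique fixed point is the model's value $\tilde{v}_\pi$. A zero loss for $\pi$ therefore forces $\tilde{v}_\pi = v_\pi$.

The name `pve_style_loss`, the squared-error form and the exact computation of $v_\pi$ are illustrative assumptions. This is not the paper's implementation.

```python
import numpy as np

def bellman(P, r, pi, v, gamma):
    """T_pi v = r_pi + gamma * P_pi v for a tabular model.

    P: (A, S, S) transitions, r: (S, A) rewards, pi: (S, A) policy, v: (S,) values.
    """
    r_pi = np.sum(pi * r, axis=1)              # expected one-step reward under pi
    P_pi = np.einsum("sa,ast->st", pi, P)      # state-to-state transitions under pi
    return r_pi + gamma * P_pi @ v

def policy_value(P, r, pi, gamma):
    """Exact v_pi, the fixed point of T_pi: solve (I - gamma P_pi) v = r_pi."""
    r_pi = np.sum(pi * r, axis=1)
    P_pi = np.einsum("sa,ast->st", pi, P)
    return np.linalg.solve(np.eye(len(r_pi)) - gamma * P_pi, r_pi)

def pve_style_loss(model, env, policies, gamma, k=1):
    """Hypothetical order-k loss: sum over pi of ||T~_pi^k v_pi - v_pi||^2."""
    P_m, r_m = model
    P_e, r_e = env
    total = 0.0
    for pi in policies:
        v_pi = policy_value(P_e, r_e, pi, gamma)  # equals T_pi^k v_pi for any k
        v = v_pi
        for _ in range(k):                         # apply the model operator k times
            v = bellman(P_m, r_m, pi, v, gamma)
        total += float(np.sum((v - v_pi) ** 2))
    return total

def random_transitions(rng, A, S):
    """Random (A, S, S) transition tensor whose rows are probability distributions."""
    P = rng.random((A, S, S))
    return P / P.sum(axis=2, keepdims=True)

rng = np.random.default_rng(0)
S, A, gamma = 5, 3, 0.9
env = (random_transitions(rng, A, S), rng.random((S, A)))
wrong_model = (random_transitions(rng, A, S), env[1])  # right rewards, wrong dynamics
policies = [rng.dirichlet(np.ones(A), size=S) for _ in range(4)]

for k in (1, 3):
    print(k, pve_style_loss(env, env, policies, gamma, k),
          pve_style_loss(wrong_model, env, policies, gamma, k))
```

In a learned setting, $v_\pi$ would have to be estimated from experience, for example from sampled returns. The model parameters would then be updated by gradient descent on the loss. The exact linear solve here only keeps the example small.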

Contribution

It generalizes the VE principle to order $k$, defines PVE, connects PVE to existing algorithms such as MuZero, and proposes modifications to them for better performance.

Findings

01

PVE models are sufficient for optimal planning even though they ignore many aspects of the environment.

02

A new loss function for learning PVE models is constructed.

03

A version of MuZero modified according to PVE principles shows improved performance in practice.
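Here is a toy illustration of the first finding, in Python with NumPy. The model has the wrong dynamics: it routes state 0 to state 2 instead of state 1, and it swaps states 1 and 2 on every step. It still has exactly the environment's value function for every deterministic policy, and value iteration in the model returns a policy that is optimal in the environment.

The sketch assumes that PVE can be checked by comparing the model's and the environment's value functions over all deterministic policies. The paper's formal definition may be broader. The three-state MDP, its rewards and all helper names are hypothetical, chosen so that states 1 and 2 are interchangeable.

```python
import itertools
import numpy as np

GAMMA = 0.9
S, A = 3, 2

# Rewards r[s, a], shared by environment and model: staying in state 0 (a=1)
# pays 0.5; states 1 and 2 pay 1 per step under either action.
r = np.array([[0.0, 0.5], [1.0, 1.0], [1.0, 1.0]])

def make_P(next_state):
    """Deterministic transitions P[a, s, s'] from a dict {(s, a): s'}."""
    P = np.zeros((A, S, S))
    for (s, a), s_next in next_state.items():
        P[a, s, s_next] = 1.0
    return P

# Environment: action 0 in state 0 leads to state 1; states 1 and 2 self-loop.
P_env = make_P({(0, 0): 1, (0, 1): 0, (1, 0): 1, (1, 1): 1, (2, 0): 2, (2, 1): 2})
# Model: action 0 in state 0 leads to state 2, and states 1 and 2 swap every
# step. The dynamics are wrong, but the two rewarding states are interchangeable.
P_model = make_P({(0, 0): 2, (0, 1): 0, (1, 0): 2, (1, 1): 2, (2, 0): 1, (2, 1): 1})

def policy_value(P, pi):
    """Exact v_pi for a deterministic policy pi (a tuple of actions, one per state)."""
    P_pi = np.array([P[pi[s], s] for s in range(S)])
    r_pi = np.array([r[s, pi[s]] for s in range(S)])
    return np.linalg.solve(np.eye(S) - GAMMA * P_pi, r_pi)

def greedy_plan(P, iters=500):
    """Value iteration in dynamics P; returns the greedy deterministic policy."""
    v = np.zeros(S)
    for _ in range(iters):
        q = r + GAMMA * np.einsum("ast,t->sa", P, v)
        v = q.max(axis=1)
    return tuple(int(a) for a in q.argmax(axis=1))

deterministic_policies = list(itertools.product(range(A), repeat=S))
# Assumed PVE check: model and environment agree on v_pi for every deterministic pi.
is_pve = all(np.allclose(policy_value(P_model, pi), policy_value(P_env, pi))
             for pi in deterministic_policies)
# Optimal environment values: elementwise max over deterministic policies.
v_star = np.max([policy_value(P_env, pi) for pi in deterministic_policies], axis=0)

plan = greedy_plan(P_model)   # planning uses only the model
print(is_pve, not np.allclose(P_model, P_env))            # True True
print(np.allclose(policy_value(P_env, plan), v_star))     # True
```

The model ignores which of the two rewarding states the agent occupies. That detail does not change any policy's value, so this is one simple case of a model leaving out part of the environment without hurting planning.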

Abstract

One of the main challenges in model-based reinforcement learning (RL) is to decide which aspects of the environment should be modeled. The value-equivalence (VE) principle proposes a simple answer to this question: a model should capture the aspects of the environment that are relevant for value-based planning. Technically, VE distinguishes models based on a set of policies and a set of functions: a model is said to be VE to the environment if the Bellman operators it induces for the policies yield the correct result when applied to the functions. As the number of policies and functions increases, the set of VE models shrinks, eventually collapsing to a single point corresponding to a perfect model. A fundamental question underlying the VE principle is thus how to select the smallest sets of policies and functions that are sufficient for planning. In this paper we take an important step…
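The VE definition in the abstract can be checked directly for a tabular model. The sketch below, in Python with NumPy, applies each model's Bellman operator $T_\pi v = r_\pi + \gamma P_\pi v$ to every (policy, function) pair and compares the results. The optional `k` argument applies the operator $k$ times, which is an assumption about how the order-$k$ generalization works. The function names are hypothetical.

The demo shows the shrinking effect the abstract describes. A model with the right rewards but wrong transitions is VE with respect to constant functions, because each row of $P_\pi$ sums to one and so $P_\pi c\mathbf{1} = c\mathbf{1}$. Adding a single non-constant function rules it out.

```python
import numpy as np

def bellman(P, r, pi, v, gamma):
    """T_pi v = r_pi + gamma * P_pi v; P: (A, S, S), r: (S, A), pi: (S, A), v: (S,)."""
    r_pi = np.sum(pi * r, axis=1)              # expected one-step reward under pi
    P_pi = np.einsum("sa,ast->st", pi, P)      # state-to-state transitions under pi
    return r_pi + gamma * P_pi @ v

def is_value_equivalent(model, env, policies, functions, gamma, k=1, atol=1e-8):
    """True if k applications of the model's Bellman operator match the
    environment's for every policy/function pair (k=1 is the plain VE test)."""
    for pi in policies:
        for v in functions:
            v_model, v_env = v, v
            for _ in range(k):
                v_model = bellman(*model, pi, v_model, gamma)
                v_env = bellman(*env, pi, v_env, gamma)
            if not np.allclose(v_model, v_env, atol=atol):
                return False
    return True

def random_transitions(rng, A, S):
    """Random (A, S, S) transition tensor whose rows are probability distributions."""
    P = rng.random((A, S, S))
    return P / P.sum(axis=2, keepdims=True)

rng = np.random.default_rng(0)
S, A, gamma = 4, 2, 0.9
r = rng.random((S, A))
env = (random_transitions(rng, A, S), r)
model = (random_transitions(rng, A, S), r)    # same rewards, different dynamics
policies = [np.full((S, A), 1.0 / A), rng.dirichlet(np.ones(A), size=S)]

constants = [np.zeros(S), np.ones(S)]
print(is_value_equivalent(model, env, policies, constants, gamma))  # True
print(is_value_equivalent(model, env, policies,
                          constants + [np.arange(S, dtype=float)], gamma))  # False
```

In this demo, passing `k=2` also makes the test stricter. The first application turns a constant $c\mathbf{1}$ into the non-constant vector $r_\pi + \gamma c\mathbf{1}$, so the second application exposes the wrong transitions even for constant functions.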

Code & Models

Repositories

chrisgrimm/proper_value_equivalence · JAX · Official

Videos

Proper Value Equivalence · SlidesLive

Taxonomy

Topics: Reinforcement Learning in Robotics · Formal Methods in Verification · Machine Learning and Algorithms

Methods: Batch Normalization · Residual Connection · Prioritized Experience Replay · Residual Block · Convolution · Average Pooling · Monte-Carlo Tree Search · MuZero