Is the Bellman residual a bad proxy?

Matthieu Geist; Bilal Piot; Olivier Pietquin

arXiv:1606.07636·cs.LG·December 13, 2017

Open Access

TL;DR

This paper compares the effectiveness of maximizing mean value versus minimizing the Bellman residual in reinforcement learning, finding that directly maximizing the mean value generally yields better policy optimization results.

Contribution

It provides a theoretical and empirical comparison showing that Bellman residual minimization is a poor proxy for policy optimization compared to maximizing mean value.

Findings

01

Bellman residual is generally a bad proxy for policy optimization.

02

Maximizing mean value outperforms residual minimization in experiments.

03

Residual minimization is less effective despite its popularity in value-based RL.

Abstract

This paper aims at theoretically and empirically comparing two standard optimization criteria for Reinforcement Learning: i) maximization of the mean value and ii) minimization of the Bellman residual. For that purpose, we place ourselves in the framework of policy search algorithms, that are usually designed to maximize the mean value, and derive a method that minimizes the residual $∥ T_{*} v_{π} - v_{π} ∥_{1, ν}$ over policies. A theoretical analysis shows how good this proxy is to policy optimization, and notably that it is better than its value-based counterpart. We also propose experiments on randomly generated generic Markov decision processes, specifically designed for studying the influence of the involved concentrability coefficient. They show that the Bellman residual is generally a bad proxy to policy optimization and that directly maximizing the mean value is much better,…
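The two criteria named in the abstract can be made concrete on a small tabular problem. The sketch below is only an illustration, not the authors' algorithm or experimental setup: it builds a random MDP with a hypothetical helper `random_mdp`, evaluates a fixed policy, and then computes the mean value $ν^\top v_{π}$ and the residual $∥ T_{*} v_{π} - v_{π} ∥_{1, ν}$. The discount factor `gamma`, the uniform distribution `nu`, the problem sizes, and all function names are assumptions chosen for the example. Since $T_{*} v_{π} ≥ v_{π}$ componentwise, the residual is zero only when $v_{π}$ is the optimal value function.

```python
import random


def random_mdp(n_states, n_actions, seed=0):
    """Hypothetical generator: dense random transitions, rewards in [0, 1)."""
    rng = random.Random(seed)
    P, R = [], []
    for _ in range(n_states):
        P_s, R_s = [], []
        for _ in range(n_actions):
            weights = [rng.random() for _ in range(n_states)]
            total = sum(weights)
            P_s.append([w / total for w in weights])  # P[s][a] sums to 1
            R_s.append(rng.random())
        P.append(P_s)
        R.append(R_s)
    return P, R


def q_values(P, R, v, gamma):
    """q[s][a] = R[s][a] + gamma * sum_s' P[s][a][s'] * v[s']."""
    return [
        [R[s][a] + gamma * sum(p * vn for p, vn in zip(P[s][a], v))
         for a in range(len(R[s]))]
        for s in range(len(R))
    ]


def evaluate_policy(P, R, policy, gamma, n_iter=2000):
    """Approximate v_pi by iterating T_pi; policy[s][a] is an action probability."""
    v = [0.0] * len(R)
    for _ in range(n_iter):
        q = q_values(P, R, v, gamma)
        v = [sum(pi_sa * q_sa for pi_sa, q_sa in zip(policy[s], q[s]))
             for s in range(len(R))]
    return v


def mean_value(v, nu):
    """Criterion (i): the nu-weighted mean of the value function."""
    return sum(nu_s * v_s for nu_s, v_s in zip(nu, v))


def bellman_residual(P, R, v, nu, gamma):
    """Criterion (ii): nu-weighted L1 norm of T_* v - v."""
    q = q_values(P, R, v, gamma)
    return sum(nu_s * abs(max(q_s) - v_s) for nu_s, q_s, v_s in zip(nu, q, v))


# Assumed toy setup: 5 states, 3 actions, uniform nu, uniform random policy.
n_states, n_actions, gamma = 5, 3, 0.9
P, R = random_mdp(n_states, n_actions, seed=0)
nu = [1.0 / n_states] * n_states
uniform = [[1.0 / n_actions] * n_actions for _ in range(n_states)]
v_uniform = evaluate_policy(P, R, uniform, gamma)

print("mean value:", mean_value(v_uniform, nu))
print("Bellman residual:", bellman_residual(P, R, v_uniform, nu, gamma))
```

A policy search method would adjust the policy parameters to increase the first quantity or decrease the second. This sketch only evaluates both for one fixed policy.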

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Multi-Objective Optimization Algorithms · Advanced Bandit Algorithms Research