Increasing the Action Gap: New Operators for Reinforcement Learning

Marc G. Bellemare; Georg Ostrovski; Arthur Guez; Philip S. Thomas; Rémi Munos

arXiv:1512.04860·cs.AI·December 16, 2015

TL;DR

This paper proposes new operators for Q-learning that increase the action gap to improve policy robustness, demonstrating their effectiveness through theoretical analysis and empirical results on Atari games.

Contribution

Introduces the consistent Bellman operator, which increases the action gap while preserving optimality and applies both to tabular problems and to discretized continuous space and time problems.
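
For reference, the operator can be written out as follows. This is a sketch in our own notation, based on our reading of the full paper rather than on this summary page: $R$ is the reward function, $P$ the transition kernel, $\gamma$ the discount factor, and $\mathbb{I}$ an indicator. The standard Bellman operator is shown first for comparison.

```latex
% Standard Bellman operator
\mathcal{T} Q(x, a) = R(x, a) + \gamma \, \mathbb{E}_{x' \sim P(\cdot \mid x, a)} \max_{b} Q(x', b)

% Consistent Bellman operator: on a self-transition (x' = x),
% back up Q(x, a) itself instead of \max_b Q(x, b)
\mathcal{T}_C Q(x, a) = R(x, a) + \gamma \, \mathbb{E}_{x' \sim P(\cdot \mid x, a)}
  \left[ \mathbb{I}_{[x \neq x']} \max_{b} Q(x', b) + \mathbb{I}_{[x = x']} Q(x, a) \right]

% Equivalent form, showing the non-negative correction on self-transitions
\mathcal{T}_C Q(x, a) = \mathcal{T} Q(x, a) - \gamma \, P(x \mid x, a) \left[ \max_{b} Q(x, b) - Q(x, a) \right]
```

The last line makes the gap-increasing effect visible: the correction is zero for the greedy action and non-negative for every other action, so suboptimal values are pushed down while the maximum is left unchanged.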

Findings

01. The consistent Bellman operator increases the action gap at each state.

02. Empirical results show superior performance on Atari 2600 games.

03. Theoretical conditions for operators to preserve optimality are established.
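
The first and third findings can be checked numerically on a toy problem. The following is a minimal tabular sketch, assuming NumPy and a small random MDP made up for illustration; the helper names `random_mdp`, `iterate`, and `action_gap` are hypothetical and not from the paper. The consistent operator is written in the form we understand from the full paper: on a self-transition it backs up `Q(x, a)` instead of `max_b Q(x, b)`.

```python
import numpy as np


def random_mdp(n_states, n_actions, seed=0):
    """Hypothetical toy MDP: Dirichlet transitions, uniform rewards."""
    rng = np.random.default_rng(seed)
    # P[x, a, x'] = probability of moving to x' after taking a in x.
    P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
    R = rng.uniform(size=(n_states, n_actions))
    return P, R


def bellman(Q, P, R, gamma):
    """Standard Bellman operator: R + gamma * E[max_b Q(x', b)]."""
    V = Q.max(axis=1)
    return R + gamma * (P @ V)


def consistent_bellman(Q, P, R, gamma):
    """Consistent Bellman operator (assumed form, see lead-in).

    Equals the standard backup minus
    gamma * P(x | x, a) * (max_b Q(x, b) - Q(x, a)).
    """
    V = Q.max(axis=1)
    idx = np.arange(Q.shape[0])
    self_prob = P[idx, :, idx]  # shape (S, A): P(x | x, a)
    return R + gamma * (P @ V) - gamma * self_prob * (V[:, None] - Q)


def iterate(op, P, R, gamma, n_iters=2000):
    """Apply an operator repeatedly, starting from Q = 0."""
    Q = np.zeros_like(R)
    for _ in range(n_iters):
        Q = op(Q, P, R, gamma)
    return Q


def action_gap(Q):
    """Per-state difference between the best and second-best action values."""
    top2 = np.sort(Q, axis=1)[:, -2:]
    return top2[:, 1] - top2[:, 0]


if __name__ == "__main__":
    P, R = random_mdp(n_states=4, n_actions=3, seed=0)
    gamma = 0.9
    Q_star = iterate(bellman, P, R, gamma)
    Q_cons = iterate(consistent_bellman, P, R, gamma)
    print("same state values:", np.allclose(Q_star.max(axis=1), Q_cons.max(axis=1)))
    print("gap (Bellman):    ", action_gap(Q_star))
    print("gap (consistent): ", action_gap(Q_cons))
```

In this toy setting the two fixed points should share the same maximum value and greedy action in each state, while the consistent operator's values for the other actions are lower, which is what widens the gap. The effect depends on self-transition probabilities being nonzero; with no self-loops the two operators coincide.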

Abstract

This paper introduces new optimality-preserving operators on Q-functions. We first describe an operator for tabular representations, the consistent Bellman operator, which incorporates a notion of local policy consistency. We show that this local consistency leads to an increase in the action gap at each state; increasing this gap, we argue, mitigates the undesirable effects of approximation and estimation errors on the induced greedy policies. This operator can also be applied to discretized continuous space and time problems, and we provide empirical results evidencing superior performance in this context. Extending the idea of a locally consistent operator, we then derive sufficient conditions for an operator to preserve optimality, leading to a family of operators which includes our consistent Bellman operator. As corollaries we provide a proof of optimality for Baird's advantage…
