ConQUR: Mitigating Delusional Bias in Deep Q-learning

Andy Su; Jayden Ooi; Tyler Lu; Dale Schuurmans; Craig Boutilier

arXiv:2002.12399 · cs.LG · March 2, 2020 · 1 citation

TL;DR

This paper introduces efficient methods to reduce delusional bias in deep Q-learning by training Q-approximators with consistent labels and using a multi-approximator search framework, leading to improved Atari game performance.

Contribution

The paper proposes a penalization scheme and a search framework that mitigate delusional bias without the comprehensive search over tabular value estimates that earlier techniques require.

Findings

01. Improved performance in Atari games.

02. Effective reduction of delusional bias.

03. Demonstrated benefits of multiple Q-approximators.

Abstract

Delusional bias is a fundamental source of error in approximate Q-learning. To date, the only techniques that explicitly address delusion require comprehensive search using tabular value estimates. In this paper, we develop efficient methods to mitigate delusional bias by training Q-approximators with labels that are "consistent" with the underlying greedy policy class. We introduce a simple penalization scheme that encourages Q-labels used across training batches to remain (jointly) consistent with the expressible policy class. We also propose a search framework that allows multiple Q-approximators to be generated and tracked, thus mitigating the effect of premature (implicit) policy commitments. Experimental results demonstrate that these methods can improve the performance of Q-learning in a variety of Atari games, sometimes dramatically.
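
The penalization idea described in the abstract can be illustrated with a minimal Python sketch. It assumes a hinge-style penalty that grows whenever the current Q-approximator ranks some other action above the action that was treated as greedy when a label was built. The function names, the weight `lam`, and the exact form of the penalty are illustrative assumptions and are not taken from the text above.

```python
import numpy as np


def consistency_penalty(q_next_current, label_actions):
    """Hypothetical hinge penalty for label inconsistency.

    q_next_current: (batch, n_actions) current Q estimates at the next states.
    label_actions:  (batch,) action assumed greedy when each label was built.
    Returns the total amount by which other actions exceed the label action.
    """
    q = np.asarray(q_next_current, dtype=float)
    acts = np.asarray(label_actions, dtype=int)
    # Q-value of the action each label committed to, shaped (batch, 1).
    chosen = q[np.arange(len(acts)), acts][:, None]
    # Only positive gaps count: the label action should remain greedy.
    return np.maximum(q - chosen, 0.0).sum()


def penalized_loss(q_sa, rewards, q_next_target, q_next_current,
                   label_actions, gamma=0.99, lam=0.5):
    """Squared Bellman error plus the weighted consistency penalty (assumed form)."""
    q_sa = np.asarray(q_sa, dtype=float)
    rewards = np.asarray(rewards, dtype=float)
    q_tgt = np.asarray(q_next_target, dtype=float)
    acts = np.asarray(label_actions, dtype=int)
    # Bootstrapped labels use the action that the label committed to.
    labels = rewards + gamma * q_tgt[np.arange(len(acts)), acts]
    td_error = ((labels - q_sa) ** 2).sum()
    return td_error + lam * consistency_penalty(q_next_current, acts)


if __name__ == "__main__":
    q_next = [[1.0, 2.0], [3.0, 0.0]]
    print(consistency_penalty(q_next, [0, 0]))
    print(penalized_loss([0.5, 1.0], [1.0, 0.0], q_next, q_next, [0, 0], gamma=0.5))
```

In this sketch, setting `lam` to zero recovers an ordinary squared-error Q-learning loss, while larger values push the approximator toward a greedy policy that agrees with the actions its labels assumed.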

Code & Models

Repositories

BigHeaded2003/ConQUR-Mitigating-Delusional-Bias-in-Deep-Q-Learning (tf)

Videos

ConQUR: Mitigating Delusional Bias in Deep Q-Learning · SlidesLive

Taxonomy

Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Domain Adaptation and Few-Shot Learning

Methods: Q-Learning