ConQUR: Mitigating Delusional Bias in Deep Q-learning
Andy Su, Jayden Ooi, Tyler Lu, Dale Schuurmans, Craig Boutilier

TL;DR
This paper introduces efficient methods to reduce delusional bias in deep Q-learning by training Q-approximators with consistent labels and using a multi-approximator search framework, leading to improved Atari game performance.
Contribution
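The paper proposes a penalization scheme that keeps the Q-labels used in training consistent with the expressible policy class, and a search framework that generates and tracks multiple Q-approximators. Together these mitigate delusional bias without the comprehensive tabular search that earlier techniques require.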
Findings
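Q-learning performance improves on a variety of Atari games, sometimes dramatically.
Training with consistent Q-labels mitigates delusional bias.
Tracking multiple Q-approximators reduces the effect of premature (implicit) policy commitments.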
Abstract
Delusional bias is a fundamental source of error in approximate Q-learning. To date, the only techniques that explicitly address delusion require comprehensive search using tabular value estimates. In this paper, we develop efficient methods to mitigate delusional bias by training Q-approximators with labels that are "consistent" with the underlying greedy policy class. We introduce a simple penalization scheme that encourages Q-labels used across training batches to remain (jointly) consistent with the expressible policy class. We also propose a search framework that allows multiple Q-approximators to be generated and tracked, thus mitigating the effect of premature (implicit) policy commitments. Experimental results demonstrate that these methods can improve the performance of Q-learning in a variety of Atari games, sometimes dramatically.
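The abstract does not state the form of the penalty, so the following is only a minimal sketch of one way a consistency penalty could be attached to a Q-learning loss. It assumes a hinge-style term: for each next state whose label committed to a particular greedy action, the approximator is penalized whenever some other action's Q-value exceeds the committed one. The function names, the weight `lam`, and the array layout are all hypothetical and are not taken from the paper.

```python
import numpy as np

def consistency_penalty(q_values, committed_actions):
    """Hinge-style penalty (an assumed form, not the paper's definition).

    q_values: array of shape (n_states, n_actions) with current Q estimates.
    committed_actions: for each state, the action the training labels
        assumed to be greedy.
    Returns the total amount by which any action's Q-value exceeds the
    committed action's Q-value, summed over states and actions.
    """
    q_values = np.asarray(q_values, dtype=float)
    rows = np.arange(q_values.shape[0])
    committed_q = q_values[rows, committed_actions][:, None]
    return np.maximum(q_values - committed_q, 0.0).sum()

def penalized_loss(q_pred, labels, q_next, committed_actions, lam=0.5):
    """Mean squared label error plus lam times the consistency penalty."""
    q_pred = np.asarray(q_pred, dtype=float)
    labels = np.asarray(labels, dtype=float)
    regression = np.mean((q_pred - labels) ** 2)
    return regression + lam * consistency_penalty(q_next, committed_actions)
```

With `q_next = [[1.0, 2.0], [3.0, 0.5]]`, committing to action 0 in both states is penalized because action 1 currently looks better in the first state, while committing to actions `[1, 0]` matches the greedy choice under the current estimates and incurs no penalty.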
Taxonomy
Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Domain Adaptation and Few-Shot Learning
Methods: Q-Learning
