Maxmin Q-learning: Controlling the Estimation Bias of Q-learning

Qingfeng Lan; Yangchen Pan; Alona Fyshe; Martha White

arXiv:2002.06487 · cs.LG · August 10, 2021 · 38 cites

TL;DR

This paper introduces Maxmin Q-learning, a flexible algorithm that controls estimation bias in Q-learning, demonstrating improved bias management and performance across environments through theoretical analysis and empirical validation.

Contribution

The paper generalizes Q-learning with a parameter that controls estimation bias, proves convergence in the tabular case, and shows empirically that the resulting algorithm manages bias better and improves performance.

Findings

01

For a suitable parameter choice, Maxmin Q-learning yields unbiased estimates with lower approximation variance than Q-learning.

02

The algorithm converges in tabular settings and improves performance on benchmarks.

03

The effect of overestimation bias on learning efficiency depends on the environment.
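The bias claims above can be illustrated with a small Monte Carlo experiment. The sketch below is not taken from the paper's code. It assumes a toy setting in which every action has a true value of 0 and each of N independent estimates is corrupted by uniform noise, and it assumes the Maxmin estimator takes the minimum over the N estimates for each action and then the maximum over actions. The function name `maxmin_estimate_bias` and all parameter values are hypothetical.

```python
import random


def maxmin_estimate_bias(num_estimators, num_actions=4, noise=1.0,
                         trials=5000, seed=0):
    """Estimate the bias of max_a min_i Q_i(a) when all true values are 0.

    Assumption: each estimate is the true value (0) plus independent
    uniform noise on [-noise, noise]. The true maximum is 0, so the
    average of the estimator is its bias.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        # For each action, keep the smallest of the N noisy estimates.
        q_min = [
            min(rng.uniform(-noise, noise) for _ in range(num_estimators))
            for _ in range(num_actions)
        ]
        # Then take the largest of those minima across actions.
        total += max(q_min)
    return total / trials


# N = 1 corresponds to the ordinary max-of-estimates used by Q-learning.
biases = {n: maxmin_estimate_bias(n) for n in (1, 2, 4, 8)}
for n, bias in biases.items():
    print(f"N={n}: estimated bias {bias:+.3f}")
```

In this toy setting the bias is positive for N = 1 and decreases as N grows, eventually turning negative, which suggests why an intermediate N could sit near zero bias. The exact crossover point depends on the assumed noise model and number of actions.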

Abstract

Q-learning suffers from overestimation bias, because it approximates the maximum action value using the maximum estimated action value. Algorithms have been proposed to reduce overestimation bias, but we lack an understanding of how bias interacts with performance, and the extent to which existing algorithms mitigate bias. In this paper, we 1) highlight that the effect of overestimation bias on learning efficiency is environment-dependent; 2) propose a generalization of Q-learning, called Maxmin Q-learning, which provides a parameter to flexibly control bias; 3) show theoretically that there exists a parameter choice for Maxmin Q-learning that leads to unbiased estimation with a lower approximation variance than Q-learning; and 4) prove the convergence of our algorithm in the tabular case, as well as convergence of several previous Q-learning variants, using a novel Generalized…
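A minimal tabular sketch of the idea in the abstract follows. It is not the authors' implementation (the official code is in the repository listed under Code & Models). It assumes the bias-controlling parameter is the number of action-value estimates N, that the bootstrap target uses the per-action minimum over the N estimates, and that one randomly chosen estimate is updated per step. The class name `MaxminQ`, its methods, and the default hyperparameters are hypothetical, and details such as experience replay are omitted.

```python
import random


class MaxminQ:
    """Tabular sketch: N action-value tables, target built from their minimum."""

    def __init__(self, num_states, num_actions, num_estimators=2,
                 alpha=0.1, gamma=0.99, seed=0):
        self.num_actions = num_actions
        self.alpha = alpha
        self.gamma = gamma
        self.rng = random.Random(seed)
        # q[i][s][a] is the i-th estimate of the value of action a in state s.
        self.q = [
            [[0.0] * num_actions for _ in range(num_states)]
            for _ in range(num_estimators)
        ]

    def q_min(self, state):
        """Per-action minimum over all estimates for the given state."""
        return [min(table[state][a] for table in self.q)
                for a in range(self.num_actions)]

    def act(self, state, epsilon=0.1):
        """Epsilon-greedy action selection with respect to the minimum."""
        if self.rng.random() < epsilon:
            return self.rng.randrange(self.num_actions)
        values = self.q_min(state)
        return max(range(self.num_actions), key=values.__getitem__)

    def update(self, state, action, reward, next_state, done):
        """Update one randomly chosen estimate toward the shared target."""
        target = reward
        if not done:
            target += self.gamma * max(self.q_min(next_state))
        i = self.rng.randrange(len(self.q))
        self.q[i][state][action] += self.alpha * (
            target - self.q[i][state][action])
        return i  # index of the estimate that was updated


# With num_estimators=1 the update is the ordinary Q-learning update.
agent = MaxminQ(num_states=2, num_actions=2, num_estimators=1)
agent.update(state=0, action=0, reward=1.0, next_state=1, done=True)
```

Under these assumptions, a larger `num_estimators` pushes the target lower, so the parameter trades overestimation against underestimation.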

Code & Models

Repositories

qlan3/Explorer
PyTorch · Official

Taxonomy

Topics: Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning · Adversarial Robustness in Machine Learning

Methods: Q-Learning