Maxmin Q-learning: Controlling the Estimation Bias of Q-learning
Qingfeng Lan, Yangchen Pan, Alona Fyshe, Martha White

TL;DR
This paper introduces Maxmin Q-learning, a generalization of Q-learning with a parameter that controls estimation bias. Theoretical analysis and experiments across several environments support its bias control and performance.
Contribution
The paper generalizes Q-learning with a bias-controlling parameter, proves convergence in the tabular case, and shows empirically that the parameter improves bias control and performance.
Findings
For a suitable parameter choice, Maxmin Q-learning yields unbiased estimates with lower approximation variance than Q-learning.
The algorithm converges in tabular settings and improves performance on benchmarks.
The effect of overestimation bias on learning efficiency depends on the environment.
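The bias behaviour behind these findings can be illustrated with a small simulation. The sketch below is not taken from the paper's code. It assumes that all true action values are zero and that every estimate carries independent uniform noise on [-1, 1]; it then averages the maximum over actions of the minimum over N estimators, which is how the Maxmin target is usually described. The function name `estimation_bias` and its parameters are hypothetical, and N = 1 corresponds to the ordinary Q-learning target.

```python
import random


def estimation_bias(num_estimators, num_actions=8, num_trials=5000, seed=0):
    """Average of max_a min_i Q_i(a) when every true action value is 0.

    Illustrative assumption: each estimate Q_i(a) is pure noise drawn
    uniformly from [-1, 1], so any non-zero average is estimation bias.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_trials):
        # Minimum over the N estimators, computed separately for each action.
        q_min = [
            min(rng.uniform(-1.0, 1.0) for _ in range(num_estimators))
            for _ in range(num_actions)
        ]
        # Maximum over actions of the minimised estimates.
        total += max(q_min)
    return total / num_trials


if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        print(n, round(estimation_bias(n), 3))
```

Under these assumptions the average is positive for N = 1 (overestimation), shrinks as N grows, and becomes negative for large N, which is the sense in which the number of estimators acts as a bias-control parameter.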
Abstract
Q-learning suffers from overestimation bias, because it approximates the maximum action value using the maximum estimated action value. Algorithms have been proposed to reduce overestimation bias, but we lack an understanding of how bias interacts with performance, and the extent to which existing algorithms mitigate bias. In this paper, we 1) highlight that the effect of overestimation bias on learning efficiency is environment-dependent; 2) propose a generalization of Q-learning, called Maxmin Q-learning, which provides a parameter to flexibly control bias; 3) show theoretically that there exists a parameter choice for Maxmin Q-learning that leads to unbiased estimation with a lower approximation variance than Q-learning; and 4) prove the convergence of our algorithm in the tabular case, as well as convergence of several previous Q-learning variants, using a novel Generalized…
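The abstract does not spell out the update rule, so the following is only a minimal tabular sketch of how the method is commonly described: keep N action-value tables, build the bootstrap target from the element-wise minimum over the tables, and update one randomly chosen table per step. The function name `maxmin_q_update`, the nested-list table layout, and the default step size and discount are assumptions made for illustration, not the authors' implementation.

```python
import random


def maxmin_q_update(q_tables, state, action, reward, next_state, done,
                    alpha=0.1, gamma=0.99, rng=random):
    """Apply one tabular Maxmin-style update in place.

    q_tables is a list of N tables with q_tables[i][state][action] -> float
    (an assumed layout). Returns the index of the updated table and the target.
    """
    num_actions = len(q_tables[0][next_state])
    # Q_min(s', a') = min_i Q_i(s', a'); with N = 1 this is plain Q-learning.
    q_min_next = [
        min(q[next_state][a] for q in q_tables) for a in range(num_actions)
    ]
    # Bootstrap from the best minimised value unless the episode has ended.
    target = reward + (0.0 if done else gamma * max(q_min_next))
    # Update a single, randomly selected estimator toward the shared target.
    i = rng.randrange(len(q_tables))
    q_tables[i][state][action] += alpha * (target - q_tables[i][state][action])
    return i, target


if __name__ == "__main__":
    # Two estimators, two states, two actions, all initialised to zero.
    tables = [[[0.0, 0.0] for _ in range(2)] for _ in range(2)]
    tables[0][1] = [1.0, 3.0]
    tables[1][1] = [2.0, 0.5]
    print(maxmin_q_update(tables, 0, 0, 1.0, 1, False, gamma=0.5))
```

In this sketch the number of tables N plays the role of the bias-control parameter mentioned in the abstract: a larger N pushes the target lower.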
Taxonomy
Topics: Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning · Adversarial Robustness in Machine Learning
Methods: Q-Learning
