A Note on Target Q-learning For Solving Finite MDPs with A Generative Oracle
Ziniu Li, Tian Xu, Yang Yu

TL;DR
This paper analyzes the sample complexity of target Q-learning with a generative oracle in finite MDPs, correcting previous claims and showing that target networks do not increase sample complexity.
Contribution
It provides a tight analysis of target Q-learning's sample complexity, correcting prior misconceptions and comparing it to vanilla Q-learning.
Findings
The target Q-learning algorithm of [Lee and He, 2020] has sample complexity $\tilde{O}(|S|^2|A|^2 (1-\gamma)^{-5} \varepsilon^{-2})$.
Sequentially updating all state-action pairs improves this to $\tilde{O}(|S||A| (1-\gamma)^{-5} \varepsilon^{-2})$.
Target networks do not increase sample complexity compared to vanilla Q-learning.
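As a quick check on the two bounds above, dividing one by the other shows how much the sequential update saves. The horizon and accuracy factors cancel, leaving only the size of the state-action space (logarithmic factors hidden by $\tilde{O}$ are ignored here):

$$
\frac{|S|^2|A|^2 (1-\gamma)^{-5} \varepsilon^{-2}}{|S||A| (1-\gamma)^{-5} \varepsilon^{-2}} = |S||A|
$$

So, under this reading, the improvement is a factor of $|S||A|$, with the dependence on $(1-\gamma)$ and $\varepsilon$ unchanged.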
Abstract
Q-learning with function approximation can diverge in the off-policy setting, and the target network is a powerful technique for addressing this issue. In this manuscript, we examine the sample complexity of the associated target Q-learning algorithm in the tabular case with a generative oracle. We point out a misleading claim in [Lee and He, 2020] and establish a tight analysis. In particular, we demonstrate that the sample complexity of the target Q-learning algorithm in [Lee and He, 2020] is $\tilde{O}(|S|^2|A|^2 (1-\gamma)^{-5} \varepsilon^{-2})$. Furthermore, we show that this sample complexity is improved to $\tilde{O}(|S||A| (1-\gamma)^{-5} \varepsilon^{-2})$ if we can sequentially update all state-action pairs and if is further in…
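To make the setting concrete, here is a minimal sketch of tabular target Q-learning with a generative oracle, using sequential updates over all state-action pairs. It is an illustration under stated assumptions, not the exact algorithm analyzed in the paper: the function name `target_q_learning`, the `1/t` step size, the loop counts, and the array layout of `P` and `R` are all hypothetical choices. The generative oracle is simulated by sampling a next state from `P[s, a]`.

```python
import numpy as np


def target_q_learning(P, R, gamma, outer_iters, inner_iters, rng):
    """Tabular target Q-learning sketch with a simulated generative oracle.

    P: array of shape (S, A, S), transition probabilities (assumed layout).
    R: array of shape (S, A), deterministic rewards (assumed layout).
    """
    num_states, num_actions = R.shape
    q_target = np.zeros((num_states, num_actions))
    for _ in range(outer_iters):
        # The target table stays frozen during the whole inner loop.
        q = q_target.copy()
        for t in range(1, inner_iters + 1):
            step_size = 1.0 / t  # assumed schedule, chosen for simplicity
            # Sequential update: visit every state-action pair once per sweep.
            for s in range(num_states):
                for a in range(num_actions):
                    # Generative oracle: one sample s' ~ P(. | s, a).
                    s_next = rng.choice(num_states, p=P[s, a])
                    td_target = R[s, a] + gamma * q_target[s_next].max()
                    q[s, a] += step_size * (td_target - q[s, a])
        # Copy the learned table into the target table.
        q_target = q
    return q_target


# Tiny two-state example with deterministic transitions (illustrative only):
# action 0 moves to state 0, action 1 moves to state 1 and pays reward 1.
P = np.zeros((2, 2, 2))
P[:, 0, 0] = 1.0
P[:, 1, 1] = 1.0
R = np.array([[0.0, 1.0], [0.0, 1.0]])
q_hat = target_q_learning(P, R, gamma=0.5, outer_iters=40, inner_iters=5,
                          rng=np.random.default_rng(0))
```

With the `1/t` step size, each inner loop averages the sampled targets, so every outer iteration approximates one application of the Bellman optimality operator to the frozen target table. Each sweep draws one oracle sample per state-action pair, so the total sample count in this sketch is `outer_iters * inner_iters * S * A`.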
