A Note on Target Q-learning For Solving Finite MDPs with A Generative Oracle

Ziniu Li; Tian Xu; Yang Yu

arXiv:2203.11489·cs.LG·March 23, 2022

Open Access

TL;DR

This paper analyzes the sample complexity of target Q-learning with a generative oracle in finite MDPs, correcting previous claims and showing that target networks do not increase sample complexity.

Contribution

It provides a tight analysis of target Q-learning's sample complexity, correcting prior misconceptions and comparing it to vanilla Q-learning.

Findings

01

The target Q-learning algorithm of [Lee and He, 2020] has sample complexity $\tilde{O}(|S|^{2} |A|^{2} (1-\gamma)^{-5} \varepsilon^{-2})$.

02

Sequentially updating all state-action pairs improves the sample complexity to $\tilde{O}(|S| |A| (1-\gamma)^{-5} \varepsilon^{-2})$.

03

Target networks do not increase sample complexity compared to vanilla Q-learning.
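
As a rough illustration of the gap between the first two findings, the sketch below evaluates only the leading-order terms of the two bounds. It ignores constants and logarithmic factors, and the helper name `sample_bound` and the problem sizes are hypothetical values chosen for illustration, not figures from the paper.

```python
def sample_bound(num_states, num_actions, gamma, eps, state_action_power):
    """Leading-order term (|S||A|)^p * (1 - gamma)^-5 * eps^-2.

    Constants and log factors hidden by the O-tilde notation are ignored,
    so only ratios between bounds are meaningful here.
    """
    state_action_term = (num_states * num_actions) ** state_action_power
    return state_action_term / ((1.0 - gamma) ** 5 * eps ** 2)


# Hypothetical problem sizes, chosen only for illustration.
S, A, gamma, eps = 100, 10, 0.9, 0.1

prior = sample_bound(S, A, gamma, eps, state_action_power=2)
sequential = sample_bound(S, A, gamma, eps, state_action_power=1)

# Both bounds share the (1 - gamma)^-5 * eps^-2 factor, so the ratio
# reduces to |S||A|.
ratio = prior / sequential
print(f"improvement factor: {ratio:.1f}")
```

Under these assumptions the improvement factor equals the number of state-action pairs, so the gap widens as the MDP grows.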

Abstract

Q-learning with function approximation could diverge in the off-policy setting and the target network is a powerful technique to address this issue. In this manuscript, we examine the sample complexity of the associated target Q-learning algorithm in the tabular case with a generative oracle. We point out a misleading claim in [Lee and He, 2020] and establish a tight analysis. In particular, we demonstrate that the sample complexity of the target Q-learning algorithm in [Lee and He, 2020] is $\tilde{O}(|S|^{2} |A|^{2} (1-\gamma)^{-5} \varepsilon^{-2})$. Furthermore, we show that this sample complexity is improved to $\tilde{O}(|S| |A| (1-\gamma)^{-5} \varepsilon^{-2})$ if we can sequentially update all state-action pairs and $\tilde{O}(|S| |A| (1-\gamma)^{-4} \varepsilon^{-2})$ if $\gamma$ is further in…
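
The setting in the abstract can be sketched in code as follows. This is a minimal, generic tabular target Q-learning loop with a generative oracle, not the exact algorithm analyzed in the paper or in [Lee and He, 2020]. The random MDP, the sequential sweep over all state-action pairs, the averaging step size `1 / (k + 1)`, and all function names are assumptions made for illustration.

```python
import numpy as np


def make_random_mdp(num_states, num_actions, rng):
    # Hypothetical test MDP: P[s, a] is a distribution over next states,
    # and rewards lie in [0, 1].
    P = rng.dirichlet(np.ones(num_states), size=(num_states, num_actions))
    R = rng.uniform(0.0, 1.0, size=(num_states, num_actions))
    return P, R


def generative_oracle(P, s, a, rng):
    # A generative oracle returns a next-state sample for any queried (s, a).
    return rng.choice(P.shape[-1], p=P[s, a])


def target_q_learning(P, R, gamma, outer_iters, inner_iters, rng):
    num_states, num_actions = R.shape
    Q = np.zeros((num_states, num_actions))
    for _ in range(outer_iters):
        # Freeze a copy of Q; bootstrapped targets use only this copy.
        Q_target = Q.copy()
        for k in range(inner_iters):
            alpha = 1.0 / (k + 1)  # assumed averaging step size
            # Sequentially update every state-action pair once per sweep.
            for s in range(num_states):
                for a in range(num_actions):
                    s_next = generative_oracle(P, s, a, rng)
                    td_target = R[s, a] + gamma * Q_target[s_next].max()
                    Q[s, a] += alpha * (td_target - Q[s, a])
    return Q


def value_iteration(P, R, gamma, iters=200):
    # Exact Bellman optimality backups, used only as a reference solution.
    Q = np.zeros_like(R)
    for _ in range(iters):
        Q = R + gamma * (P @ Q.max(axis=1))
    return Q


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    P, R = make_random_mdp(5, 2, rng)
    Q_hat = target_q_learning(P, R, gamma=0.5, outer_iters=15,
                              inner_iters=100, rng=rng)
    Q_star = value_iteration(P, R, gamma=0.5)
    print("max-norm error:", np.abs(Q_hat - Q_star).max())
```

In this sketch the total number of oracle calls is `outer_iters * inner_iters * |S| * |A|`, which is the quantity a sample complexity bound controls. The target copy changes only between outer iterations, so each inner loop estimates one Bellman backup of a fixed function.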

Taxonomy

Topics: Ferroelectric and Negative Capacitance Devices · Semiconductor materials and devices · Advancements in Semiconductor Devices and Circuit Design

Methods: Q-Learning