Minimax Optimal Q Learning with Nearest Neighbors

Puning Zhao; Lifeng Lai

arXiv:2308.01490·cs.LG·June 18, 2024


Open Access

TL;DR

This paper introduces two new nearest neighbor Q-learning algorithms for continuous-state MDPs, one offline and one online. Their sample complexity bounds improve on previous results and have minimax optimal dependence on ϵ.

Contribution

The paper proposes offline and online nearest neighbor Q-learning methods with significantly improved sample complexities and better resource utilization, extending applicability to unbounded state spaces.

Findings

01

Sample complexity for offline method: $\tilde{O} (\frac{1}{ϵ ^{d + 2} ( 1 - γ ) ^{d + 2}})$

02

Sample complexity for online method: $\tilde{O} (\frac{1}{ϵ ^{d + 2} ( 1 - γ ) ^{d + 3}})$

03

Both methods have minimax optimal dependence on ϵ and are more computationally efficient than the previous approach.

Abstract

Analyzing Markov decision processes (MDPs) with continuous state spaces is generally challenging. A recent interesting work \cite{shah2018q} solves MDPs with bounded continuous state spaces by a nearest neighbor $Q$ learning approach, which has a sample complexity of $\tilde{O} (\frac{1}{ϵ ^{d + 3} ( 1 - γ ) ^{d + 7}})$ for $ϵ$-accurate $Q$ function estimation with discount factor $γ$. In this paper, we propose two new nearest neighbor $Q$ learning methods, one for the offline setting and the other for the online setting. We show that the sample complexities of these two methods are $\tilde{O} (\frac{1}{ϵ ^{d + 2} ( 1 - γ ) ^{d + 2}})$ and $\tilde{O} (\frac{1}{ϵ ^{d + 2} ( 1 - γ ) ^{d + 3}})$ for the offline and online methods respectively, which significantly improve over existing results and have minimax optimal dependence on $ϵ$. We achieve such improvement by…
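To make the size of the improvement concrete, the three rates quoted in the abstract can be compared numerically. The following is only an illustrative sketch: it drops all constants and the logarithmic factors hidden by $\tilde{O}$, and the function names `rate` and `compare_bounds` are hypothetical helpers, not anything from the paper.

```python
def rate(eps, gamma, d, eps_offset, horizon_offset):
    """Return 1 / (eps^(d + eps_offset) * (1 - gamma)^(d + horizon_offset)).

    Constants and log factors hidden by the tilde-O notation are ignored,
    so only ratios between rates are meaningful.
    """
    return 1.0 / (eps ** (d + eps_offset) * (1.0 - gamma) ** (d + horizon_offset))


def compare_bounds(eps, gamma, d):
    """Compare the prior bound with the offline and online bounds from the abstract."""
    prior = rate(eps, gamma, d, 3, 7)    # exponents d+3 and d+7 (prior work)
    offline = rate(eps, gamma, d, 2, 2)  # exponents d+2 and d+2 (offline method)
    online = rate(eps, gamma, d, 2, 3)   # exponents d+2 and d+3 (online method)
    return {
        "prior": prior,
        "offline": offline,
        "online": online,
        "prior_over_offline": prior / offline,  # equals 1 / (eps * (1 - gamma)^5)
        "prior_over_online": prior / online,    # equals 1 / (eps * (1 - gamma)^4)
    }


if __name__ == "__main__":
    # Example values chosen for illustration only.
    for name, value in compare_bounds(eps=0.1, gamma=0.9, d=2).items():
        print(f"{name}: {value:.3e}")
```

Under these simplifications the ratios do not depend on $d$: the offline rate is smaller than the prior one by a factor of $\frac{1}{ϵ ( 1 - γ ) ^{5}}$, and the online rate by a factor of $\frac{1}{ϵ ( 1 - γ ) ^{4}}$.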


Taxonomy

Topics: Reinforcement Learning in Robotics
