Minimax Optimal Q Learning with Nearest Neighbors
Puning Zhao, Lifeng Lai

TL;DR
This paper introduces two nearest neighbor Q-learning algorithms for continuous-state MDPs, one offline and one online, whose sample complexity bounds improve on prior work and have minimax optimal dependence on ε.
Contribution
The paper proposes offline and online nearest neighbor Q-learning methods with significantly improved sample complexities and better resource utilization, extending applicability to unbounded state spaces.
Findings
Sample complexity for offline method: \(\frac{1}{\epsilon^{d+2}(1-\gamma)^{d+2}}\)
Sample complexity for online method: \(\frac{1}{\epsilon^{d+2}(1-\gamma)^{d+3}}\)
Both methods have minimax optimal dependence on ε and are more computationally efficient than the earlier nearest neighbor approach.
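To make the two bounds concrete, the short Python sketch below evaluates their leading terms. The function names `offline_bound` and `online_bound` are hypothetical, and constants and any logarithmic factors are ignored, so only ratios between values are meaningful, not the absolute numbers.

```python
def offline_bound(eps, gamma, d):
    """Leading term 1 / (eps^(d+2) * (1 - gamma)^(d+2)); constants ignored."""
    return 1.0 / (eps ** (d + 2) * (1.0 - gamma) ** (d + 2))


def online_bound(eps, gamma, d):
    """Leading term 1 / (eps^(d+2) * (1 - gamma)^(d+3)); constants ignored."""
    return 1.0 / (eps ** (d + 2) * (1.0 - gamma) ** (d + 3))


# Illustrative values only (assumptions, not taken from the paper).
eps, gamma, d = 0.1, 0.9, 2

# The online bound carries one extra factor of 1 / (1 - gamma).
print(online_bound(eps, gamma, d) / offline_bound(eps, gamma, d))

# Halving eps multiplies either bound by 2^(d+2).
print(offline_bound(eps / 2, gamma, d) / offline_bound(eps, gamma, d))
```

The two printed ratios show how the bounds scale: the gap between the online and offline settings depends only on the discount factor, while the cost of extra accuracy grows exponentially with the state dimension d.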
Abstract
Analyzing the Markov decision process (MDP) with continuous state spaces is generally challenging. A recent interesting work \cite{shah2018q} solves MDP with bounded continuous state space by a nearest neighbor $Q$ learning approach and gives a sample complexity bound for $\epsilon$-accurate $Q$ function estimation with discount factor $\gamma$. In this paper, we propose two new nearest neighbor $Q$ learning methods, one for the offline setting and the other for the online setting. We show that the sample complexities of these two methods are $\frac{1}{\epsilon^{d+2}(1-\gamma)^{d+2}}$ and $\frac{1}{\epsilon^{d+2}(1-\gamma)^{d+3}}$ for offline and online methods respectively, which significantly improve over existing results and have minimax optimal dependence over $\epsilon$. We achieve such improvement by…
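As a rough illustration of what nearest neighbor Q estimation means in general, the sketch below runs fitted Q iteration with k-nearest-neighbor averaging on a toy one-dimensional MDP. It is not the paper's offline or online algorithm: the helper names (`knn_average`, `fit_q`, `make_toy_data`), the toy dynamics, and all parameter values are assumptions made for this example.

```python
import random


def knn_average(states, values, query, k):
    """Average `values` over the k stored 1-D `states` closest to `query`."""
    nearest = sorted(range(len(states)), key=lambda i: abs(states[i] - query))[:k]
    return sum(values[i] for i in nearest) / len(nearest)


def fit_q(transitions, gamma, k, sweeps):
    """Fitted Q iteration; `transitions` maps action -> [(state, reward, next_state)]."""
    states = {a: [s for s, _, _ in rows] for a, rows in transitions.items()}
    targets = {a: [r for _, r, _ in rows] for a, rows in transitions.items()}
    for _ in range(sweeps):
        # Bellman backup: reward plus the discounted best k-NN estimate at the next state.
        targets = {
            a: [r + gamma * max(knn_average(states[b], targets[b], s_next, k)
                                for b in states)
                for _, r, s_next in rows]
            for a, rows in transitions.items()
        }

    def q(state, action):
        return knn_average(states[action], targets[action], state, k)

    return q


def make_toy_data(n=101, step=0.1, noise=0.01, seed=0):
    """Hypothetical MDP on [0, 1]: actions -1/+1 shift the state; reward = next state."""
    rng = random.Random(seed)
    data = {}
    for action in (-1, +1):
        rows = []
        for i in range(n):
            s = i / (n - 1)
            s_next = min(1.0, max(0.0, s + step * action + rng.uniform(-noise, noise)))
            rows.append((s, s_next, s_next))
        data[action] = rows
    return data


q = fit_q(make_toy_data(), gamma=0.9, k=5, sweeps=30)
print(q(0.5, +1), q(0.5, -1))
```

In this toy problem moving right always earns more reward, so the estimated value of action `+1` at state 0.5 should come out above that of action `-1`. The sort-based neighbor search is chosen for brevity, not efficiency.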
Taxonomy
Topics: Reinforcement Learning in Robotics
