Sample Complexity of Kernel-Based Q-Learning
Sing-Yuan Yeh, Fu-Chieh Chang, Chang-Wei Yueh, Pei-Yuan Wu, Alberto Bernacchia, Sattar Vakili

TL;DR
This paper establishes finite sample complexity bounds for kernel-based Q-learning in large-scale reinforcement learning with general Q-functions, using a nonparametric approach and assuming a generative model.
Contribution
The paper introduces a nonparametric Q-learning algorithm with order-optimal sample complexity bounds for large state-action spaces under general kernel models.
Findings
The sample complexity is order-optimal with respect to ε and the information gain of the kernel.
To the authors' knowledge, this is the first finite sample complexity result for kernel-based Q-learning in such a general setting.
The algorithm finds an ε-optimal policy in arbitrarily large discounted MDPs, assuming a generative model.
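For readers unfamiliar with the two quantities the findings refer to, the block below writes out their standard textbook definitions. The symbols ($\gamma$ for the discount factor, $\Gamma(n)$ for the maximal information gain, $\tau$ for the regularization parameter, $K_n$ for the kernel matrix) are assumptions chosen for illustration; the paper's own notation and normalization may differ.

```latex
% Discounted value of a policy \pi with discount factor \gamma \in (0, 1):
V^{\pi}(s) = \mathbb{E}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t) \,\middle|\, s_0 = s,\ \pi\right],
\qquad V^{\star}(s) = \sup_{\pi} V^{\pi}(s).

% A policy \pi is \epsilon-optimal if, for every state s,
V^{\pi}(s) \ \ge\ V^{\star}(s) - \epsilon .

% Maximal information gain of a kernel k after n observations,
% where K_n = [k(z_i, z_j)]_{i,j=1}^{n} and \tau > 0:
\Gamma(n) = \sup_{z_1, \dots, z_n} \ \tfrac{1}{2} \log\det\!\left(I_n + \tau^{-1} K_n\right).
```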
Abstract
Modern reinforcement learning (RL) often faces an enormous state-action space. Existing analytical results are typically for settings with a small number of state-actions, or simple models such as linearly modeled Q-functions. To derive statistically efficient RL policies handling large state-action spaces, with more general Q-functions, some recent works have considered nonlinear function approximation using kernel ridge regression. In this work, we derive sample complexities for kernel-based Q-learning when a generative model exists. We propose a nonparametric Q-learning algorithm which finds an ε-optimal policy in an arbitrarily large-scale discounted MDP. The sample complexity of the proposed algorithm is order-optimal with respect to ε and the complexity of the kernel (in terms of its information gain). To the best of our knowledge, this is the first result…
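To make the setting in the abstract concrete, here is a minimal sketch of Q-function estimation with kernel ridge regression under a generative model. It is not the paper's algorithm: it is a simplified fitted Q-iteration that reuses a single batch of samples, and the toy simulator, the squared-exponential kernel, and every name and parameter (`generative_model`, `kernel_q_iteration`, `lengthscale`, `ridge`) are hypothetical choices made for illustration.

```python
import numpy as np


def rbf_kernel(A, B, lengthscale=0.3):
    # Squared-exponential kernel between rows of A (n, d) and rows of B (m, d).
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq / lengthscale**2)


def generative_model(rng, s, a):
    # Hypothetical toy simulator: state in [0, 1]; action 1 moves right, 0 moves left.
    step = 0.1 if a == 1 else -0.1
    s_next = float(np.clip(s + step + 0.02 * rng.standard_normal(), 0.0, 1.0))
    reward = float(s_next > 0.9)  # reward only near the right edge
    return reward, s_next


def kernel_q_iteration(n_samples=200, n_rounds=30, discount=0.9, ridge=0.1, seed=0):
    rng = np.random.default_rng(seed)
    actions = np.array([0.0, 1.0])

    # A generative model lets us query any state-action pair we like.
    S = rng.uniform(0.0, 1.0, n_samples)
    A = rng.choice(actions, n_samples)
    Z = np.column_stack([S, A])
    samples = [generative_model(rng, s, int(a)) for s, a in zip(S, A)]
    R = np.array([r for r, _ in samples])
    S_next = np.array([s2 for _, s2 in samples])

    # Regularized kernel matrix for the ridge regression solve.
    K_reg = rbf_kernel(Z, Z) + ridge * np.eye(n_samples)
    # Kernel rows for (next state, action) pairs, one matrix per action.
    K_next = [
        rbf_kernel(np.column_stack([S_next, np.full(n_samples, a)]), Z)
        for a in actions
    ]

    alpha = np.zeros(n_samples)  # regression weights; Q starts at zero
    for _ in range(n_rounds):
        # Bellman targets: r + discount * max_a' Q(s', a').
        q_next = np.max([Kn @ alpha for Kn in K_next], axis=0)
        targets = R + discount * q_next
        # Kernel ridge regression of the targets on the sampled state-action pairs.
        alpha = np.linalg.solve(K_reg, targets)

    def q_fn(s, a):
        z = np.array([[s, a]])
        return float((rbf_kernel(z, Z) @ alpha)[0])

    return q_fn
```

A greedy policy can then be read off by comparing `q_fn(s, 0.0)` and `q_fn(s, 1.0)` at a given state. Reusing one batch across rounds keeps the sketch short; a sample complexity analysis such as the paper's would instead account carefully for how many generative-model queries are drawn.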
Taxonomy
Topics: Face and Expression Recognition
Methods: Q-Learning
