Sample Complexity of Kernel-Based Q-Learning

Sing-Yuan Yeh; Fu-Chieh Chang; Chang-Wei Yueh; Pei-Yuan Wu; Alberto Bernacchia; Sattar Vakili

arXiv:2302.00727 · cs.LG · February 3, 2023

TL;DR

This paper establishes finite sample complexity bounds for kernel-based Q-learning in large-scale reinforcement learning with general Q-functions, using a nonparametric approach and assuming a generative model.

Contribution

The paper introduces a nonparametric Q-learning algorithm whose sample complexity bounds are order-optimal for large state-action spaces under general kernel models.

Findings

1. The sample complexity is order-optimal with respect to $ϵ$ and the information gain of the kernel.

2. To the authors' knowledge, this is the first finite sample complexity result for kernel-based Q-learning in such general settings.

3. The algorithm finds an $ϵ$-optimal policy in arbitrarily large discounted MDPs.
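
For reference, the two quantities named in these findings are commonly defined as below. This is a sketch in standard notation, not a quotation from the paper: the symbols $V^{*}$, $V^{\pi}$, $\Gamma(n)$, $K_n$ and $\tau$ are assumptions introduced here, and the paper's own normalization may differ.

```latex
% epsilon-optimality of a policy \pi (sup-norm convention, assumed here):
% V^{*} is the optimal value function, V^{\pi} the value of \pi.
\| V^{*} - V^{\pi} \|_{\infty} \le \epsilon

% Maximal information gain of a kernel k after n observations,
% with regularization parameter \tau > 0 and kernel matrix K_n:
\Gamma(n) = \sup_{z_1, \dots, z_n} \tfrac{1}{2}
            \log\det\!\left( I_n + \tau^{-1} K_n \right),
\qquad [K_n]_{ij} = k(z_i, z_j)
```

Under these conventions, a kernel whose information gain grows slowly in n is "simpler", which is the sense in which the bound depends on kernel complexity.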

Abstract

Modern reinforcement learning (RL) often faces an enormous state-action space. Existing analytical results are typically for settings with a small number of state-actions, or simple models such as linearly modeled Q-functions. To derive statistically efficient RL policies handling large state-action spaces, with more general Q-functions, some recent works have considered nonlinear function approximation using kernel ridge regression. In this work, we derive sample complexities for kernel-based Q-learning when a generative model exists. We propose a nonparametric Q-learning algorithm which finds an $ϵ$-optimal policy in an arbitrarily large-scale discounted MDP. The sample complexity of the proposed algorithm is order optimal with respect to $ϵ$ and the complexity of the kernel (in terms of its information gain). To the best of our knowledge, this is the first result…
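
To make the setting in the abstract concrete, the following is a minimal sketch of Q-iteration with kernel ridge regression and a generative model. It is not the authors' algorithm: the toy simulator `toy_generative_model`, the RBF kernel, the helper names, and all hyperparameters (`lengthscale`, `ridge`, `n_samples`, `n_rounds`) are hypothetical choices made only for illustration.

```python
import numpy as np


def rbf_kernel(X, Y, lengthscale=0.2):
    """Squared-exponential kernel between the rows of X and the rows of Y."""
    sq_dists = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * lengthscale ** 2))


def toy_generative_model(states, actions, rng):
    """Hypothetical simulator: state in [0, 1], action 1 moves right, 0 left."""
    step = np.where(actions == 1, 0.1, -0.1)
    noise = 0.02 * rng.standard_normal(states.shape)
    next_states = np.clip(states + step + noise, 0.0, 1.0)
    rewards = next_states  # reward grows toward the right edge
    return rewards, next_states


def features(states, actions):
    """Stack state and action into one regression input per row."""
    return np.stack([states, actions.astype(float)], axis=1)


def predict_q(model, states, action_set):
    """Return Q estimates with shape (len(states), len(action_set))."""
    if model is None:  # before the first fit, Q is identically zero
        return np.zeros((len(states), len(action_set)))
    X_train, alpha = model
    columns = []
    for a in action_set:
        Z = features(states, np.full(len(states), a))
        columns.append(rbf_kernel(Z, X_train) @ alpha)
    return np.stack(columns, axis=1)


def kernel_q_iteration(n_samples=200, n_rounds=30, discount=0.9,
                       ridge=1e-2, seed=0):
    """Repeatedly regress Bellman targets onto fresh generative-model samples."""
    rng = np.random.default_rng(seed)
    action_set = np.array([0, 1])
    model = None
    for _ in range(n_rounds):
        # Query the generative model at freshly drawn state-action pairs.
        s = rng.uniform(0.0, 1.0, n_samples)
        a = rng.choice(action_set, n_samples)
        r, s_next = toy_generative_model(s, a, rng)
        # Bellman target built from the previous round's Q estimate.
        targets = r + discount * predict_q(model, s_next, action_set).max(axis=1)
        # Kernel ridge regression: solve (K + ridge * I) alpha = targets.
        X = features(s, a)
        K = rbf_kernel(X, X)
        alpha = np.linalg.solve(K + ridge * np.eye(n_samples), targets)
        model = (X, alpha)
    return model
```

A greedy policy can then be read off with `predict_q(model, states, np.array([0, 1])).argmax(axis=1)`. Drawing fresh samples in every round mirrors the generative-model assumption, under which any state-action pair can be queried on demand.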

Taxonomy

Topics: Face and Expression Recognition

Methods: Q-Learning