Provably Efficient Kernelized Q-Learning

Shuang Liu; Hao Su

arXiv:2204.10349 · cs.LG · April 25, 2022

TL;DR

This paper introduces a kernelized Q-learning algorithm with regret bounds for arbitrary kernels. On classic control tasks it performs well early in training: with a Gaussian RBF kernel it reaches reasonably good performance after only 1000 environment steps, while deep Q-learning still struggles.

Contribution

It develops a kernelized Q-learning method with regret bounds that apply to arbitrary kernels, gives concrete bounds for linear and Gaussian RBF kernels, and validates the method empirically on classic control tasks.
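
This page does not spell out the algorithm itself. As a hedged illustration of what a kernelized Q-value estimate can look like, the sketch below fits kernel ridge regression to Bellman targets and adds an optimism bonus in the style of kernel UCB methods. This is a generic construction assumed for illustration, not the authors' method. The helper names `rbf_kernel` and `kernel_q_estimate`, the ridge parameter `lam`, the bonus scale `beta`, and the `lengthscale` are all hypothetical.

```python
import numpy as np


def rbf_kernel(X, Y, lengthscale=1.0):
    """Gaussian RBF kernel matrix with entries exp(-||x - y||^2 / (2 * lengthscale^2))."""
    sq_dists = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Y**2, axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * lengthscale**2))


def kernel_q_estimate(Z_train, targets, Z_query, lam=1.0, beta=1.0, lengthscale=1.0):
    """Kernel ridge regression Q-estimate plus a kernel-UCB-style exploration bonus.

    Hypothetical helper for illustration, not the paper's algorithm.
    Z_train, Z_query: 2-D arrays whose rows are (state, action) feature vectors.
    targets: Bellman targets r + gamma * max_a' Q(s', a') for each training row.
    Returns (mean, bonus); an optimistic estimate is mean + bonus.
    """
    K = rbf_kernel(Z_train, Z_train, lengthscale)    # (n, n) Gram matrix
    k_q = rbf_kernel(Z_query, Z_train, lengthscale)  # (m, n) cross-kernel
    A = K + lam * np.eye(len(Z_train))               # regularized Gram matrix
    alpha = np.linalg.solve(A, targets)              # ridge coefficients
    mean = k_q @ alpha
    # Posterior-variance-like width: k(z, z) - k_z^T (K + lam I)^{-1} k_z,
    # where k(z, z) = 1 for the RBF kernel.
    v = np.linalg.solve(A, k_q.T)                    # (n, m)
    var = 1.0 - np.sum(k_q * v.T, axis=1)
    bonus = beta * np.sqrt(np.maximum(var, 0.0))
    return mean, bonus


# Usage: three visited 1-D (state, action) points; query one visited and one far-away point.
Z = np.array([[0.0], [1.0], [2.0]])
y = np.array([0.0, 1.0, 2.0])
mean, bonus = kernel_q_estimate(Z, y, np.array([[1.0], [10.0]]), lam=1e-6)
```

At the visited point the bonus nearly vanishes and the mean reproduces the target, while at the far-away point the bonus approaches `beta`; that gap is what drives optimistic exploration toward unvisited state-action pairs.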

Findings

01

Regret bounds derived for linear and Gaussian RBF kernels.

02

The algorithm reaches reasonably good performance on classic control tasks early in training.

03

With the Gaussian RBF kernel, the algorithm outperforms deep Q-learning in the initial training phase.

Abstract

We propose and analyze a kernelized version of Q-learning. Although a kernel space is typically infinite-dimensional, extensive study has shown that generalization is only affected by the effective dimension of the data. We incorporate such ideas into the Q-learning framework and derive regret bounds for arbitrary kernels. In particular, we provide concrete bounds for linear kernels and Gaussian RBF kernels; notably, the latter bound looks almost identical to the former, only that the actual dimension is replaced by a different notion of dimensionality. Finally, we test our algorithm on a suite of classic control tasks; remarkably, under the Gaussian RBF kernel, it achieves reasonably good performance after only 1000 environmental steps, while its neural network counterpart, deep Q-learning, still struggles.
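
To make "effective dimension" concrete, the sketch below computes one standard definition from the kernel ridge regression literature, d_eff(λ) = tr(K (K + λI)^{-1}) = Σ_i μ_i / (μ_i + λ), where μ_i are the eigenvalues of the Gram matrix K. The paper's exact notion of dimensionality may differ, so treat this as an illustrative assumption; the helper names `effective_dimension`, `linear_gram`, and `rbf_gram` are hypothetical. For a linear kernel on data in R^d the Gram matrix has rank at most d, so d_eff never exceeds d, which mirrors the abstract's remark that the RBF bound swaps the actual dimension for a different notion of dimensionality.

```python
import numpy as np


def effective_dimension(K, lam):
    """d_eff(lam) = trace(K (K + lam I)^{-1}) = sum_i mu_i / (mu_i + lam),
    where mu_i are the eigenvalues of the symmetric PSD Gram matrix K."""
    mu = np.linalg.eigvalsh(K)
    mu = np.clip(mu, 0.0, None)  # drop tiny negative eigenvalues from round-off
    return float(np.sum(mu / (mu + lam)))


def linear_gram(X):
    """Gram matrix of the linear kernel k(x, y) = x^T y."""
    return X @ X.T


def rbf_gram(X, lengthscale=1.0):
    """Gram matrix of the Gaussian RBF kernel."""
    sq_norms = np.sum(X**2, axis=1)
    sq_dists = np.maximum(sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-sq_dists / (2.0 * lengthscale**2))


rng = np.random.default_rng(0)
X = rng.standard_normal((200, 3))  # 200 synthetic points in R^3
# At most 3, because the linear Gram matrix has rank <= 3.
d_lin = effective_dimension(linear_gram(X), lam=1.0)
# Bounded by n = 200; its size depends on how quickly the Gram eigenvalues decay.
d_rbf = effective_dimension(rbf_gram(X), lam=1.0)
```

The eigenvalue form avoids an explicit matrix inverse; the regularizer λ controls how small an eigen-direction must be before it stops counting toward the dimension.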

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Gaussian Processes and Bayesian Inference · Advanced Bandit Algorithms Research

Methods: Q-Learning