Provably Efficient Kernelized Q-Learning
Shuang Liu, Hao Su

TL;DR
This paper introduces a kernelized Q-learning algorithm and derives regret bounds for arbitrary kernels, with concrete bounds for linear and Gaussian RBF kernels. On a suite of classic control tasks, the Gaussian RBF variant reaches reasonably good performance after only 1000 environment steps, while deep Q-learning still struggles.
Contribution
The paper develops a theoretically grounded kernelized Q-learning method whose regret bounds apply to arbitrary kernels, including linear and Gaussian RBF kernels, and validates it empirically on classic control tasks.
Findings
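Regret bounds are derived for arbitrary kernels, with concrete bounds for linear and Gaussian RBF kernels; the RBF bound looks almost identical to the linear one, with the actual dimension replaced by a different notion of dimensionality.
On a suite of classic control tasks, the algorithm with the Gaussian RBF kernel reaches reasonably good performance after only 1000 environment steps.
Over the same early training phase, deep Q-learning, its neural network counterpart, still struggles.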
Abstract
We propose and analyze a kernelized version of Q-learning. Although a kernel space is typically infinite-dimensional, extensive study has shown that generalization is only affected by the effective dimension of the data. We incorporate such ideas into the Q-learning framework and derive regret bounds for arbitrary kernels. In particular, we provide concrete bounds for linear kernels and Gaussian RBF kernels; notably, the latter bound looks almost identical to the former, only that the actual dimension is replaced by a different notion of dimensionality. Finally, we test our algorithm on a suite of classic control tasks; remarkably, under the Gaussian RBF kernel, it achieves reasonably good performance after only 1000 environmental steps, while its neural network counterpart, deep Q-learning, still struggles.
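To make the idea of an effective dimension concrete, the sketch below computes one common ridge-type definition, d_eff(λ) = tr(K (K + λI)^{-1}), for a linear and a Gaussian RBF Gram matrix. This is an illustrative assumption: the abstract does not state which notion of dimensionality the paper uses, and the function names, bandwidth, regularizer, and random data here are all hypothetical.

```python
import numpy as np


def linear_gram(X):
    """Gram matrix of the linear kernel k(x, y) = <x, y>."""
    return X @ X.T


def rbf_gram(X, bandwidth=1.0):
    """Gram matrix of the Gaussian RBF kernel exp(-||x - y||^2 / (2 * bandwidth^2))."""
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    sq_dists = np.maximum(sq_dists, 0.0)  # guard against tiny negative round-off
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))


def effective_dimension(K, reg=1.0):
    """Ridge-type effective dimension tr(K (K + reg * I)^{-1}).

    Assumed definition for illustration; not necessarily the paper's.
    """
    eigvals = np.linalg.eigvalsh(K)        # K is symmetric PSD
    eigvals = np.clip(eigvals, 0.0, None)  # remove negative round-off
    return float(np.sum(eigvals / (eigvals + reg)))


# Hypothetical data: 200 points in a 4-dimensional state-action feature space.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))

d_lin = effective_dimension(linear_gram(X), reg=1.0)
d_rbf = effective_dimension(rbf_gram(X, bandwidth=1.0), reg=1.0)
print(f"linear kernel: {d_lin:.2f}, Gaussian RBF kernel: {d_rbf:.2f}")
```

Under this definition the linear-kernel value can never exceed the ambient dimension (4 here), because the Gram matrix has rank at most 4. The RBF value is bounded only by the number of samples and depends on the bandwidth and regularizer, which is one way to see why a kernel bound might swap the actual dimension for a data-dependent quantity.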
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Gaussian Processes and Bayesian Inference · Advanced Bandit Algorithms Research
Methods: Q-Learning
