Kernelized Reinforcement Learning with Order Optimal Regret Bounds
Sattar Vakili, Julia Olkhovskaya

TL;DR
This paper introduces π-KRVI, a kernelized reinforcement learning algorithm with order-optimal regret bounds. It strengthens theoretical guarantees for large state-action spaces when value functions are approximated in a reproducing kernel Hilbert space (RKHS).
Contribution
It presents the first order-optimal regret guarantees for kernel-based RL with general value functions. These improve significantly on previous results, especially for non-smooth kernels.
Findings
Achieves order-optimal regret bounds in a general setting.
Provides sublinear, near-optimal regret bounds for Matérn kernels.
Improves on state-of-the-art regret bounds by a factor polynomial in the number of episodes.
Abstract
Reinforcement learning (RL) has shown empirical success in various real-world settings with complex models and large state-action spaces. Existing analytical results, however, typically focus on settings with a small number of state-action pairs or on simple models such as linearly modeled state-action value functions. To derive RL policies that efficiently handle large state-action spaces with more general value functions, some recent works have considered nonlinear function approximation using kernel ridge regression. We propose π-KRVI, an optimistic modification of least-squares value iteration, for the setting where the state-action value function is represented in a reproducing kernel Hilbert space (RKHS). We prove the first order-optimal regret guarantees under a general setting. Our results show a significant improvement over the state of the art that is polynomial in the number of episodes. In…
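The abstract describes π-KRVI as an optimistic modification of least-squares value iteration (LSVI) built on kernel ridge regression. The sketch below illustrates only that generic building block, not π-KRVI itself. It fits one kernel ridge regression step and adds an upper-confidence bonus to the prediction. The Matérn-3/2 kernel, the regularizer `lam`, the confidence width `beta`, the clipping value `q_max`, and the names `OptimisticKRR` and `optimistic_q` are illustrative assumptions. The paper's specific modification and its choice of `beta` are not reproduced here.

```python
# Illustrative sketch: a generic optimistic kernel-LSVI regression step.
# This is NOT the paper's pi-KRVI; kernel, lam, beta and all names are assumptions.
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]


def matern32(x: Point, y: Point, lengthscale: float = 1.0) -> float:
    """Matérn kernel with smoothness nu = 3/2 (closed form)."""
    s = math.sqrt(3.0) * math.dist(x, y) / lengthscale
    return (1.0 + s) * math.exp(-s)


def cholesky(a: List[List[float]]) -> List[List[float]]:
    """Lower-triangular L with a = L L^T (a must be symmetric positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def cho_solve(low: List[List[float]], b: Sequence[float]) -> List[float]:
    """Solve (L L^T) x = b by forward, then backward, substitution."""
    n = len(low)
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(low[i][k] * z[k] for k in range(i))) / low[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x


class OptimisticKRR:
    """Kernel ridge regression fit for one LSVI step, plus an optimism bonus.

    points:  state-action inputs z_1..z_t (hypothetical feature encoding)
    targets: regression targets, e.g. r_h + V_{h+1}(s') in LSVI
    """

    def __init__(self, points, targets, kernel: Callable = matern32,
                 lam: float = 1.0, beta: float = 1.0):
        self.points, self.kernel, self.beta = list(points), kernel, beta
        gram = [[kernel(p, q) for q in self.points] for p in self.points]
        for i in range(len(gram)):
            gram[i][i] += lam  # regularized Gram matrix K + lam * I
        self.low = cholesky(gram)
        self.alpha = cho_solve(self.low, targets)  # (K + lam I)^{-1} y

    def _kvec(self, z: Point) -> List[float]:
        return [self.kernel(z, p) for p in self.points]

    def mean(self, z: Point) -> float:
        """Prediction k(z)^T (K + lam I)^{-1} y."""
        return sum(k * a for k, a in zip(self._kvec(z), self.alpha))

    def std(self, z: Point) -> float:
        """Uncertainty sqrt(k(z,z) - k(z)^T (K + lam I)^{-1} k(z))."""
        kz = self._kvec(z)
        v = cho_solve(self.low, kz)
        var = self.kernel(z, z) - sum(k * w for k, w in zip(kz, v))
        return math.sqrt(max(var, 0.0))  # clamp tiny negative round-off

    def optimistic_q(self, z: Point, q_max: float) -> float:
        """Upper-confidence Q estimate, clipped at the largest attainable value."""
        return min(self.mean(z) + self.beta * self.std(z), q_max)
```

For example, with observations at `(0.0,)` and `(1.0,)` and a small `lam`, the prediction at a visited point nearly matches its target and the bonus is close to zero. At a distant point such as `(100.0,)`, the bonus approaches `beta`, so optimism drives exploration there until the clip at `q_max` takes over.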
Taxonomy
Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning
