Kernelized Reinforcement Learning with Order Optimal Regret Bounds

Sattar Vakili; Julia Olkhovskaya

arXiv:2306.07745·cs.LG·March 15, 2024·1 cites

TL;DR

This paper introduces $\pi$-KRVI, a kernelized reinforcement learning algorithm with order-optimal regret bounds, improving theoretical guarantees for large state-action spaces with general, nonlinear value function approximation.

Contribution

The paper presents the first order-optimal regret guarantees for kernel-based RL with general value functions, significantly improving over previous results, especially for non-smooth kernels.

Findings

01

Achieves order-optimal regret bounds under a general setting.

02

Provides sublinear, near-minimal regret bounds for Matérn kernels.

03

Improves on state-of-the-art regret bounds by a factor polynomial in the number of episodes.

Abstract

Reinforcement learning (RL) has shown empirical success in various real-world settings with complex models and large state-action spaces. The existing analytical results, however, typically focus on settings with a small number of state-actions or simple models such as linearly modeled state-action value functions. To derive RL policies that efficiently handle large state-action spaces with more general value functions, some recent works have considered nonlinear function approximation using kernel ridge regression. We propose $\pi$-KRVI, an optimistic modification of least-squares value iteration, when the state-action value function is represented by a reproducing kernel Hilbert space (RKHS). We prove the first order-optimal regret guarantees under a general setting. Our results show a significant improvement over the state of the art that is polynomial in the number of episodes. In…
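The abstract's two ingredients, kernel ridge regression and an optimistic modification of the value estimate, can be illustrated with a minimal sketch. This is not the paper's $\pi$-KRVI algorithm, whose details are not given on this page; it is a generic kernel ridge regression fit with an upper-confidence bonus, assuming a squared-exponential kernel as a stand-in. The names `rbf_kernel`, `optimistic_krr`, `lam`, `beta`, and `lengthscale` are hypothetical, and the uncertainty formula follows one common convention among several.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0):
    """Squared-exponential kernel matrix between rows of A and rows of B.
    Assumption: a stand-in kernel chosen for illustration only."""
    sq = (np.sum(A**2, axis=1)[:, None]
          + np.sum(B**2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * lengthscale**2))

def optimistic_krr(Z, y, Z_query, lam=1.0, beta=1.0, lengthscale=1.0):
    """Kernel ridge regression with an optimism bonus (hypothetical helper).

    Z       : (n, d) observed state-action features
    y       : (n,)   regression targets (e.g. reward plus next-step value)
    Z_query : (m, d) points at which to evaluate the estimate
    Returns the ridge mean, an uncertainty width, and mean + beta * width.
    """
    K = rbf_kernel(Z, Z, lengthscale)            # (n, n) Gram matrix
    k_q = rbf_kernel(Z, Z_query, lengthscale)    # (n, m) cross-kernel
    A = K + lam * np.eye(len(Z))                 # regularized Gram matrix
    mean = k_q.T @ np.linalg.solve(A, y)         # ridge prediction
    # Uncertainty: k(z, z) - k^T (K + lam I)^{-1} k, with k(z, z) = 1 here.
    var = 1.0 - np.sum(k_q * np.linalg.solve(A, k_q), axis=0)
    std = np.sqrt(np.maximum(var, 0.0))
    return mean, std, mean + beta * std          # optimistic estimate last

# Usage: one observation at the origin, queried at the origin and far away.
Z = np.array([[0.0, 0.0]])
y = np.array([1.0])
Z_query = np.array([[0.0, 0.0], [10.0, 10.0]])
mean, std, ucb = optimistic_krr(Z, y, Z_query)
```

In this toy usage the uncertainty width is smaller at the observed point than at the distant one, so the bonus favors unexplored state-actions, which is the general idea behind optimistic value iteration.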

Videos

Kernelized Reinforcement Learning with Order Optimal Regret Bounds · SlidesLive

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning