Near-Optimal Sample Complexity in Reward-Free Kernel-Based Reinforcement Learning
Aya Kayal, Sattar Vakili, Laura Toni, Alberto Bernacchia

TL;DR
This paper investigates the sample complexity needed for near-optimal policy design in reward-free kernel-based reinforcement learning, extending the analysis to broader kernel classes and relaxing previous assumptions.
Contribution
It introduces a new sample-complexity analysis for kernel-based RL in the reward-free setting that covers a broader class of kernels and relies on a simpler algorithm.
Findings
Derived new confidence intervals for kernel ridge regression in RL.
Established near-optimal sample complexity bounds under relaxed assumptions.
Validated theoretical results through simulations.
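The confidence intervals listed above are built around kernel ridge regression (KRR). As a minimal sketch, assuming a squared-exponential kernel on scalar inputs (a stand-in for state-action pairs) and a regularization parameter `lam`, the standard KRR prediction is μ(x) = k(x)ᵀ(K + λI)⁻¹y with uncertainty σ²(x) = k(x,x) − k(x)ᵀ(K + λI)⁻¹k(x), and an interval takes the form μ(x) ± β·σ(x). The multiplier `beta` is a hypothetical placeholder, not the width derived in the paper, and all function names here are illustrative.

```python
import math
from typing import Callable, List, Tuple

Kernel = Callable[[float, float], float]


def rbf_kernel(x: float, y: float, lengthscale: float = 1.0) -> float:
    """Squared-exponential kernel on scalars (illustrative choice)."""
    return math.exp(-((x - y) ** 2) / (2.0 * lengthscale ** 2))


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def krr_predict(xs: List[float], ys: List[float], x: float,
                kernel: Kernel = rbf_kernel, lam: float = 0.1) -> Tuple[float, float]:
    """Return the KRR mean mu(x) and uncertainty sigma(x)."""
    n = len(xs)
    # K + lam * I over the training inputs.
    k_reg = [[kernel(xs[i], xs[j]) + (lam if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    kx = [kernel(xi, x) for xi in xs]
    alpha = solve(k_reg, ys)  # (K + lam I)^{-1} y
    mean = sum(k * a for k, a in zip(kx, alpha))
    v = solve(k_reg, kx)  # (K + lam I)^{-1} k(x)
    var = kernel(x, x) - sum(k * vi for k, vi in zip(kx, v))
    return mean, math.sqrt(max(var, 0.0))  # clip tiny negative round-off


def confidence_interval(xs: List[float], ys: List[float], x: float,
                        beta: float = 2.0, **kw) -> Tuple[float, float]:
    """mu(x) +/- beta * sigma(x); beta is a hypothetical width multiplier."""
    mean, std = krr_predict(xs, ys, x, **kw)
    return mean - beta * std, mean + beta * std
```

For example, `confidence_interval([0.0, 1.0, 2.0], [0.0, 1.0, 4.0], 1.5)` returns an interval that is narrow near the observed inputs and widens toward the prior scale far from them.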
Abstract
Reinforcement Learning (RL) problems are being considered under increasingly more complex structures. While tabular and linear models have been thoroughly explored, the analytical study of RL under nonlinear function approximation, especially kernel-based models, has recently gained traction for their strong representational capacity and theoretical tractability. In this context, we examine the question of statistical efficiency in kernel-based RL within the reward-free RL framework, specifically asking: how many samples are required to design a near-optimal policy? Existing work addresses this question under restrictive assumptions about the class of kernel functions. We first explore this question by assuming a generative model, then relax this assumption at the cost of increasing the sample complexity by a factor of H, the length of the episode. We tackle this fundamental problem…
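In the reward-free framework, data are collected before any reward function is revealed. One property that makes this workable with kernel models is that the KRR uncertainty σ(x) depends only on where samples are taken, not on the values observed there. The sketch below assumes access to a generative model that can be queried at any candidate state-action pair (scalars here for simplicity). It greedily queries the candidate with the largest posterior variance and conditions on it with the rank-one update Cov(a,b) ← Cov(a,b) − Cov(a,s)·Cov(s,b) / (Cov(s,s) + λ). This maximum-variance rule is a generic exploration heuristic chosen for illustration, not the authors' algorithm; `max_variance_design` and its parameters are hypothetical.

```python
import math
from typing import List, Tuple


def rbf(x: float, y: float, lengthscale: float = 0.5) -> float:
    """Squared-exponential kernel on scalars (illustrative choice)."""
    return math.exp(-((x - y) ** 2) / (2.0 * lengthscale ** 2))


def max_variance_design(candidates: List[float], budget: int,
                        lam: float = 0.1) -> Tuple[List[int], float]:
    """Greedily pick `budget` query points by largest posterior variance.

    No rewards or observed values are needed: the posterior covariance
    depends only on which inputs have been queried.
    """
    n = len(candidates)
    # Prior covariance over the candidate set.
    cov = [[rbf(a, b) for b in candidates] for a in candidates]
    chosen: List[int] = []
    for _ in range(budget):
        s = max(range(n), key=lambda i: cov[i][i])  # most uncertain candidate
        chosen.append(s)
        col = [cov[i][s] for i in range(n)]  # copy before updating in place
        denom = col[s] + lam
        # Condition on one noisy observation at candidate s (rank-one update).
        for i in range(n):
            for j in range(n):
                cov[i][j] -= col[i] * col[j] / denom
    max_var = max(cov[i][i] for i in range(n))
    return chosen, max_var
```

With `grid = [i / 10 for i in range(11)]`, calling `max_variance_design(grid, 3)` first picks the left endpoint, then the right endpoint, then a point near the middle. The rank-one update gives the same variances as recomputing k(x,x) − k(x)ᵀ(K + λI)⁻¹k(x) after each query, but avoids a linear solve.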
Taxonomy
Topics: Reinforcement Learning in Robotics
