Improved Model-based Reinforcement Learning with Smooth Kernels

Kun Long; Yuqiang Li; Xianyi Wu

arXiv:2605.07218·cs.LG·May 11, 2026

TL;DR

This paper introduces a kernel-smoothing model-based reinforcement learning method for finite-horizon settings that exploits the Lipschitz smoothness of the MDP and a Bernstein-style exploration bonus to improve the horizon dependence of the regret bound.

Contribution

It develops a kernel-smoothing approach with Bernstein-style exploration bonuses whose regret bound improves on the state-of-the-art bound in its dependence on the horizon.

Findings

01

Achieves a regret bound with better dependence on the horizon than the previous state of the art.

02

Introduces a new Bernstein-type concentration inequality for martingales.

03

Supports kernel-smoothing model-based methods as an alternative to low-rank MDP approaches for continuous state-action spaces.

Abstract

For continuous state-action space scenarios, classical reinforcement learning (RL) theory predominantly focuses on low-rank Markov decision processes (MDPs), which provide sample-efficient guarantees at the expense of restrictive structural assumptions. Kernel smoothing model-based approaches offer a promising alternative paradigm that instead leverages the smoothness of the MDP and employs non-parametric kernel smoothing estimates of transition dynamics. This paper proposes a new kernel-smoothing model-based approach for online reinforcement learning in finite-horizon settings under Lipschitz continuity assumptions on the MDP. By incorporating a Bernstein-style exploration bonus into the kernel smoothing framework, our method achieves a regret bound which improves upon the state-of-the-art regret bound in its dependence on the horizon. The theoretical advancement relies on a delicate…
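The two ingredients named in the abstract, a kernel-smoothed estimate of the transition dynamics and a Bernstein-style bonus, can be illustrated with a minimal sketch. This is not the paper's algorithm: the Gaussian kernel, the `bandwidth`, the regulariser `beta`, the bonus scale `c`, and the function name `kernel_backup` are all assumptions chosen for illustration. The sketch only shows the general shape of the idea: weight past transitions by how close their state-action pair is to the query, then add a bonus that grows with the weighted variance of the next-state values and shrinks with the effective number of nearby samples.

```python
import math

def gaussian_kernel(dist, bandwidth):
    """Smooth kernel weight as a function of distance (an assumed choice)."""
    return math.exp(-(dist / bandwidth) ** 2)

def kernel_backup(query, data_sa, next_values,
                  bandwidth=0.2, beta=1.0, c=1.0, v_max=1.0):
    """Kernel-smoothed estimate of E[V(s') | s, a] with a Bernstein-style bonus.

    query:       state-action point to evaluate, as a tuple of floats
    data_sa:     list of previously visited state-action points
    next_values: value estimates V(s'_i) at the observed next states
    beta, c:     hypothetical regulariser and bonus scale, not the paper's constants
    """
    weights = [gaussian_kernel(math.dist(query, x), bandwidth) for x in data_sa]
    weight_sum = sum(weights)
    count = weight_sum + beta            # regularised effective sample count
    norm = max(weight_sum, 1e-12)        # guard against an empty dataset
    probs = [w / norm for w in weights]  # kernel-smoothed transition weights

    mean = sum(p * v for p, v in zip(probs, next_values))
    var = sum(p * (v - mean) ** 2 for p, v in zip(probs, next_values))

    # Bernstein shape: a variance term of order sqrt(var / count)
    # plus a range term of order v_max / count.
    bonus = c * (math.sqrt(var / count) + v_max / count)
    return mean, var, bonus
```

An optimistic value estimate would then be `mean + bonus`, typically clipped at `v_max`. Because the variance term replaces a worst-case range term, the bonus is small wherever the next-state values vary little, which is the usual route by which Bernstein-style bonuses improve horizon dependence over Hoeffding-style ones.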
