Gap-Dependent Bounds for Nearly Minimax Optimal Reinforcement Learning with Linear Function Approximation

Haochen Zhang; Zhong Zheng; Lingzhou Xue

arXiv:2602.20297 · stat.ML · February 25, 2026


TL;DR

This paper provides the first gap-dependent regret bound for a nearly minimax-optimal reinforcement learning algorithm with linear function approximation (LSVI-UCB++), and uses that algorithm's low policy-switching property to build a concurrent variant for efficient multi-agent exploration.

Contribution

It introduces a gap-dependent regret bound for the LSVI-UCB++ algorithm and a concurrent variant for multi-agent RL with linear function approximation.
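
For context, gap-dependent bounds are usually stated in terms of the suboptimality gap. The following uses standard episodic-MDP notation, which is an assumption here (the paper's own symbols may differ): $V_h^*$ and $Q_h^*$ are the optimal value functions at stage $h$, $\pi_k$ is the policy played in episode $k$, and $s_1^k$ is that episode's initial state.

$$\Delta_h(s,a) = V_h^*(s) - Q_h^*(s,a), \qquad \Delta_{\min} = \min_{h,s,a} \{ \Delta_h(s,a) : \Delta_h(s,a) > 0 \}$$

$$\mathrm{Regret}(K) = \sum_{k=1}^{K} \left( V_1^*(s_1^k) - V_1^{\pi_k}(s_1^k) \right)$$

Worst-case bounds on $\mathrm{Regret}(K)$ grow like $\sqrt{K}$, whereas gap-dependent bounds typically grow only logarithmically in $K$ and scale inversely with $\Delta_{\min}$.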

Findings

01 First gap-dependent regret bound for LSVI-UCB++

02 Improved dependencies on feature dimension and horizon length

03 Linear speedup in multi-agent online RL

Abstract

We study gap-dependent performance guarantees for nearly minimax-optimal algorithms in reinforcement learning with linear function approximation. While prior works have established gap-dependent regret bounds in this setting, existing analyses do not apply to algorithms that achieve the nearly minimax-optimal worst-case regret bound $\tilde{O}(d\sqrt{H^{3} K})$, where $d$ is the feature dimension, $H$ is the horizon length, and $K$ is the number of episodes. We bridge this gap by providing the first gap-dependent regret bound for the nearly minimax-optimal algorithm LSVI-UCB++ (He et al., 2023). Our analysis yields improved dependencies on both $d$ and $H$ compared to previous gap-dependent results. Moreover, leveraging the low policy-switching property of LSVI-UCB++, we introduce a concurrent variant that enables efficient parallel exploration across multiple agents and establish the…
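
To illustrate what a low policy-switching property can look like, here is a minimal sketch of a determinant-doubling update rule, a common mechanism in linear bandit and linear MDP algorithms. It is not taken from the paper: the function name `count_policy_switches`, the single-stage simplification, and the random features are all hypothetical, and whether this rule matches the exact criterion in LSVI-UCB++ is an assumption.

```python
import numpy as np

def count_policy_switches(features, lam=1.0):
    """Count updates triggered by a determinant-doubling rule (illustrative).

    features: array of shape (K, d), one feature vector per episode.
    Only a single stage is simulated; a real algorithm tracks one
    covariance matrix per stage h.
    """
    K, d = features.shape
    cov = lam * np.eye(d)                          # regularized covariance matrix
    # Log-determinant at the last update; slogdet avoids overflow for large K.
    last_logdet = np.linalg.slogdet(cov)[1]
    switches = 0
    for phi in features:
        cov += np.outer(phi, phi)                  # rank-one update with the new feature
        logdet = np.linalg.slogdet(cov)[1]
        if logdet >= last_logdet + np.log(2.0):    # determinant has at least doubled
            switches += 1                          # the policy would be recomputed here
            last_logdet = logdet
    return switches

# Hypothetical data: random features rescaled so that ||phi|| <= 1.
rng = np.random.default_rng(0)
K, d = 2000, 5
phis = rng.normal(size=(K, d))
phis /= np.maximum(1.0, np.linalg.norm(phis, axis=1, keepdims=True))
n = count_policy_switches(phis)
print(n, "switches over", K, "episodes")
```

Because each switch requires the determinant to double and the determinant is at most $(\lambda + K/d)^d$ when $\|\phi\| \le 1$, the number of switches under this rule is at most $d \log_2(1 + K/(d\lambda))$, which is logarithmic in $K$. Infrequent updates of this kind are what make it plausible for several agents to explore in parallel between synchronization points.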

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Stochastic Gradient Optimization Techniques