Gap-Dependent Bounds for Nearly Minimax Optimal Reinforcement Learning with Linear Function Approximation
Haochen Zhang, Zhong Zheng, Lingzhou Xue

TL;DR
This paper provides the first gap-dependent regret bound for a nearly minimax-optimal reinforcement learning algorithm with linear function approximation (LSVI-UCB++), and uses the algorithm's low policy-switching property to enable efficient parallel exploration across multiple agents.
Contribution
It introduces a gap-dependent regret bound for the LSVI-UCB++ algorithm and a concurrent variant for multi-agent RL with linear function approximation.
Findings
First gap-dependent regret bound for LSVI-UCB++
Improved dependencies on feature dimension and horizon length compared to previous gap-dependent results
Linear speedup in multi-agent online RL
Abstract
We study gap-dependent performance guarantees for nearly minimax-optimal algorithms in reinforcement learning with linear function approximation. While prior works have established gap-dependent regret bounds in this setting, existing analyses do not apply to algorithms that achieve the nearly minimax-optimal worst-case regret bound $\widetilde{O}(d\sqrt{H^3 K})$, where $d$ is the feature dimension, $H$ is the horizon length, and $K$ is the number of episodes. We bridge this gap by providing the first gap-dependent regret bound for the nearly minimax-optimal algorithm LSVI-UCB++ (He et al., 2023). Our analysis yields improved dependencies on both $d$ and $H$ compared to previous gap-dependent results. Moreover, leveraging the low policy-switching property of LSVI-UCB++, we introduce a concurrent variant that enables efficient parallel exploration across multiple agents and establish the…
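For readers unfamiliar with the term, "gap-dependent" bounds are usually stated in terms of the suboptimality gap, commonly defined as the difference between the optimal value of a state and the optimal Q-value of an action at that state. The sketch below illustrates this standard definition on a tiny finite-horizon tabular MDP. It is not the paper's algorithm and does not use linear function approximation; the function names, the toy MDP, and the tolerance are all assumptions made for illustration, and the paper's own notation may differ.

```python
import numpy as np

def optimal_q(P, R):
    """Backward induction for a finite-horizon tabular MDP (illustrative only).

    P: array of shape (H, S, A, S), transition probabilities per step.
    R: array of shape (H, S, A), expected rewards per step.
    Returns the optimal Q-values with shape (H, S, A).
    """
    H, S, A, _ = P.shape
    Q = np.zeros((H, S, A))
    V_next = np.zeros(S)  # value after the final step is zero
    for h in reversed(range(H)):
        # (S, A, S) @ (S,) -> (S, A): expected next-step value
        Q[h] = R[h] + P[h] @ V_next
        V_next = Q[h].max(axis=1)
    return Q

def suboptimality_gaps(Q):
    """gap_h(s, a) = max_a' Q*_h(s, a') - Q*_h(s, a), always non-negative."""
    return Q.max(axis=2, keepdims=True) - Q

def min_positive_gap(gaps, tol=1e-12):
    """Smallest strictly positive gap, or None if every action is optimal."""
    positive = gaps[gaps > tol]
    return float(positive.min()) if positive.size else None

# Hypothetical toy MDP: 2 steps, 2 states, 2 actions; action a moves to state a.
H, S, A = 2, 2, 2
P = np.zeros((H, S, A, S))
for a in range(A):
    P[:, :, a, a] = 1.0
R = np.array([[[0.0, 0.5], [1.0, 0.0]],
              [[0.2, 0.0], [0.0, 1.0]]])

gaps = suboptimality_gaps(optimal_q(P, R))
print(min_positive_gap(gaps))
```

In gap-dependent analyses the regret bound typically improves as the minimal positive gap grows, since clearly suboptimal actions are easier to rule out; the worst-case bound quoted in the abstract makes no such assumption.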
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Stochastic Gradient Optimization Techniques
