Uniform-PAC Bounds for Reinforcement Learning with Linear Function Approximation
Jiafan He, Dongruo Zhou, Quanquan Gu

TL;DR
This paper introduces FLUTE, a reinforcement learning algorithm with linear function approximation that achieves uniform-PAC convergence to the optimal policy with high probability, a guarantee that implies both PAC sample complexity and high-probability regret bounds.
Contribution
The paper presents FLUTE, a novel RL algorithm with uniform-PAC guarantees, featuring a new minimax value estimator and a multi-level sample partition scheme.
Findings
Achieves uniform-PAC convergence with high probability
Implements a novel minimax value function estimator
Uses a multi-level partition scheme for sample selection
Abstract
We study reinforcement learning (RL) with linear function approximation. Existing algorithms for this problem only have high-probability regret and/or Probably Approximately Correct (PAC) sample complexity guarantees, which cannot guarantee the convergence to the optimal policy. In this paper, in order to overcome the limitation of existing algorithms, we propose a new algorithm called FLUTE, which enjoys uniform-PAC convergence to the optimal policy with high probability. The uniform-PAC guarantee is the strongest possible guarantee for reinforcement learning in the literature, which can directly imply both PAC and high probability regret bounds, making our algorithm superior to all existing algorithms with linear function approximation. At the core of our algorithm is a novel minimax value function estimator and a multi-level partition scheme to select the training samples from…
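As a rough illustration of the quantity a uniform-PAC guarantee controls, the sketch below counts, for several accuracy levels ε, how many episodes have a suboptimality gap larger than ε. Under the usual definition of uniform-PAC, with probability at least 1 − δ this count is bounded for every ε > 0 simultaneously by a function polynomial in 1/ε and log(1/δ). The per-episode gaps and the helper name `count_bad_episodes` are made up for illustration; this is not FLUTE and does not reproduce any bound from the paper.

```python
from typing import Sequence


def count_bad_episodes(gaps: Sequence[float], eps: float) -> int:
    """Return the number of episodes whose suboptimality gap exceeds eps."""
    return sum(1 for g in gaps if g > eps)


# Hypothetical per-episode gaps between the optimal value and the value of
# the policy played in each episode. These numbers are invented.
gaps = [0.9, 0.5, 0.6, 0.3, 0.2, 0.25, 0.1, 0.05, 0.08, 0.02]

# A uniform-PAC guarantee bounds this count for all eps at once,
# rather than for a single eps fixed in advance as in a plain PAC bound.
counts = {eps: count_bad_episodes(gaps, eps) for eps in (0.5, 0.2, 0.1, 0.05)}

for eps, n in counts.items():
    print(f"eps={eps}: {n} episodes with gap > eps")
```

Because the count is controlled at every ε at once, a small count at small ε means the played policies eventually stay close to optimal, which is the sense in which the guarantee covers both PAC and regret-style statements.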
Taxonomy
Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Machine Learning and Algorithms
