Uniform-PAC Bounds for Reinforcement Learning with Linear Function Approximation

Jiafan He; Dongruo Zhou; Quanquan Gu

arXiv:2106.11612·cs.LG·January 3, 2022·1 cites

TL;DR

This paper introduces FLUTE, a reinforcement learning algorithm with linear function approximation that converges to the optimal policy under a uniform-PAC guarantee. The guarantee holds with high probability and implies both PAC sample complexity and high-probability regret bounds.

Contribution

The paper presents FLUTE, an RL algorithm with uniform-PAC guarantees, built on a new minimax value function estimator and a multi-level partition scheme for selecting training samples.

Findings

01 Achieves uniform-PAC convergence with high probability

02 Implements a novel minimax value function estimator

03 Uses a multi-level partition scheme for sample selection

Abstract

We study reinforcement learning (RL) with linear function approximation. Existing algorithms for this problem only have high-probability regret and/or Probably Approximately Correct (PAC) sample complexity guarantees, which cannot guarantee the convergence to the optimal policy. In this paper, in order to overcome the limitation of existing algorithms, we propose a new algorithm called FLUTE, which enjoys uniform-PAC convergence to the optimal policy with high probability. The uniform-PAC guarantee is the strongest possible guarantee for reinforcement learning in the literature, which can directly imply both PAC and high probability regret bounds, making our algorithm superior to all existing algorithms with linear function approximation. At the core of our algorithm is a novel minimax value function estimator and a multi-level partition scheme to select the training samples from…
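To make the uniform-PAC notion in the abstract concrete: in the standard definition, an algorithm is uniform-PAC if, with probability at least 1 − δ, the number of episodes whose policy is more than ε-suboptimal is bounded by some F(ε, δ) for every ε > 0 simultaneously. The short sketch below only illustrates that counting argument. The function names, the synthetic gap sequence, and the 1/ε² bound are all hypothetical choices for illustration; none of them is FLUTE's algorithm or its actual bound, and a check over a finite grid of ε values is only a sanity check, not a proof of the guarantee.

```python
import math
from typing import Callable, Sequence


def count_eps_mistakes(gaps: Sequence[float], eps: float) -> int:
    """Count episodes whose suboptimality gap V* - V^{pi_k} exceeds eps."""
    return sum(1 for g in gaps if g > eps)


def satisfies_uniform_bound(
    gaps: Sequence[float],
    eps_grid: Sequence[float],
    bound: Callable[[float], float],
) -> bool:
    """Check N_eps <= bound(eps) for every eps in a finite grid.

    Uniform-PAC requires this for all eps > 0 at once; a grid is only
    an illustrative stand-in.
    """
    return all(count_eps_mistakes(gaps, e) <= bound(e) for e in eps_grid)


# Hypothetical gap sequence decaying like 1/sqrt(k) over 10,000 episodes.
gaps = [1.0 / math.sqrt(k) for k in range(1, 10001)]

# Hypothetical accuracy levels and an illustrative 1/eps^2 mistake bound.
eps_grid = [0.5, 0.1, 0.05, 0.02]
bound = lambda e: 1.0 / e**2

print(satisfies_uniform_bound(gaps, eps_grid, bound))
```

Because a single run is checked against every accuracy level at once, the same sequence of policies yields a PAC statement for each fixed ε, which is the sense in which a uniform-PAC guarantee subsumes a PAC one.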

Videos

Uniform-PAC Bounds for Reinforcement Learning with Linear Function Approximation · SlidesLive

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Machine Learning and Algorithms