Reinforcement Learning with General Value Function Approximation: Provably Efficient Approach via Bounded Eluder Dimension

Ruosong Wang; Ruslan Salakhutdinov; Lin F. Yang

arXiv:2005.10804 · cs.LG · June 22, 2020 · 30 cites

TL;DR

This paper introduces a provably efficient, model-free reinforcement learning algorithm for general value function approximation, with regret that grows as $\sqrt{T}$ in the number of interactions $T$.

Contribution

It develops a model-free RL algorithm with general function approximation, extending theoretical guarantees beyond linear models using eluder dimension and covering numbers.

Findings

01

Achieves a regret bound of $\widetilde{O}(\mathrm{poly}(dH)\sqrt{T})$

02

Generalizes recent linear approximation results to broader function classes

03

Provides a framework supporting practical RL algorithms

Abstract

Value function approximation has demonstrated phenomenal empirical success in reinforcement learning (RL). Nevertheless, despite recent progress on developing theory for RL with linear function approximation, an understanding of general function approximation schemes is still largely missing. In this paper, we establish a provably efficient RL algorithm with general value function approximation. We show that if the value functions admit an approximation with a function class $\mathcal{F}$, our algorithm achieves a regret bound of $\widetilde{O}(\mathrm{poly}(dH)\sqrt{T})$, where $d$ is a complexity measure of $\mathcal{F}$ that depends on the eluder dimension [Russo and Van Roy, 2013] and log-covering numbers, $H$ is the planning horizon, and $T$ is the number of interactions with the environment. Our theory generalizes recent progress on RL with linear value function…
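
For context, the regret bound in the abstract can be read under the standard episodic formulation. The following is a sketch under that assumption, not a quotation from the paper: the symbols $K$ (number of episodes), $\pi_k$ (policy used in episode $k$), $s_1^k$ (initial state of episode $k$), and $V_1^{*}$, $V_1^{\pi_k}$ (optimal and policy value functions at the first step) are introduced here for illustration, with $T = KH$.

$$
\mathrm{Regret}(K) = \sum_{k=1}^{K} \left( V_1^{*}(s_1^k) - V_1^{\pi_k}(s_1^k) \right),
\qquad
\frac{\mathrm{Regret}(K)}{T} \le \widetilde{O}\!\left( \frac{\mathrm{poly}(dH)}{\sqrt{T}} \right).
$$

Under this reading, a $\sqrt{T}$ regret bound means the average per-step suboptimality shrinks toward zero as the number of interactions $T$ grows, at a rate governed by the complexity measure $d$ and the horizon $H$.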

Videos

Reinforcement Learning with General Value Function Approximation: Provably Efficient Approach via Bounded Eluder Dimension · SlidesLive

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Auction Theory and Applications