Reinforcement Learning with General Value Function Approximation: Provably Efficient Approach via Bounded Eluder Dimension
Ruosong Wang, Ruslan Salakhutdinov, Lin F. Yang

TL;DR
This paper introduces a provably efficient, model-free reinforcement learning algorithm that works with general value function approximation, achieving a regret bound that scales with the square root of the number of interactions without explicitly modeling the environment.
Contribution
It develops a model-free RL algorithm with general function approximation, extending theoretical guarantees beyond linear models using eluder dimension and covering numbers.
Findings
Achieves a regret bound of $\widetilde{O}(\mathrm{poly}(dH)\sqrt{T})$
Generalizes recent linear approximation results to broader function classes
Provides a framework supporting practical RL algorithms
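To make the first finding concrete, here is a brief sketch of what a regret bound of this form implies. The symbols $K$ (number of episodes), $V_1^{*}$, $V_1^{\pi_k}$, and $s_1^{k}$ follow the usual episodic RL convention and are assumptions not spelled out in this summary; the sketch also assumes the bound scales with $\sqrt{T}$, where $T = KH$.

```latex
% Regret over K episodes (assumed standard episodic definition)
\[
\mathrm{Regret}(K) \;=\; \sum_{k=1}^{K} \Bigl( V_1^{*}(s_1^{k}) - V_1^{\pi_k}(s_1^{k}) \Bigr),
\qquad T = KH .
\]
% A sqrt(T) bound makes the average per-episode regret vanish as K grows
\[
\mathrm{Regret}(K) = \widetilde{O}\bigl(\mathrm{poly}(dH)\sqrt{T}\bigr)
\;\Longrightarrow\;
\frac{\mathrm{Regret}(K)}{K}
= \widetilde{O}\!\left(\frac{\mathrm{poly}(dH)\sqrt{H}}{\sqrt{K}}\right)
\;\longrightarrow\; 0 .
\]
```

The second line uses only $\sqrt{T}/K = \sqrt{KH}/K = \sqrt{H}/\sqrt{K}$: sublinear regret means the learned policies approach optimal value on average.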
Abstract
Value function approximation has demonstrated phenomenal empirical success in reinforcement learning (RL). Nevertheless, despite a handful of recent progress on developing theory for RL with linear function approximation, the understanding of general function approximation schemes largely remains missing. In this paper, we establish a provably efficient RL algorithm with general value function approximation. We show that if the value functions admit an approximation with a function class $\mathcal{F}$, our algorithm achieves a regret bound of $\widetilde{O}(\mathrm{poly}(dH)\sqrt{T})$ where $d$ is a complexity measure of $\mathcal{F}$ that depends on the eluder dimension [Russo and Van Roy, 2013] and log-covering numbers, $H$ is the planning horizon, and $T$ is the number of interactions with the environment. Our theory generalizes recent progress on RL with linear value function…
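The eluder dimension mentioned in the abstract can be illustrated on a toy problem. The following is a minimal sketch, not the paper's algorithm: it assumes a finite function class over a finite input set, stored as a hypothetical matrix `values` with `values[i, j] = f_i(x_j)`, and checks the epsilon-dependence condition of Russo and Van Roy by brute force. The helper names `is_eps_dependent` and `greedy_independent_sequence` are made up for illustration.

```python
import itertools
import numpy as np


def is_eps_dependent(values, history, x, eps):
    """Return True if input index x is eps-dependent on the indices in history.

    x is eps-dependent when every pair of functions that is eps-close on the
    history (in Euclidean norm of their differences) is also eps-close at x.
    """
    hist = np.array(list(history), dtype=int)
    for i, j in itertools.combinations(range(values.shape[0]), 2):
        diff = values[i] - values[j]
        close_on_history = np.sqrt(np.sum(diff[hist] ** 2)) <= eps
        if close_on_history and abs(diff[x]) > eps:
            # Witness pair: agrees on the history but disagrees at x.
            return False
    return True


def greedy_independent_sequence(values, eps):
    """Greedily collect inputs that are eps-independent of those already kept.

    The length of the result is only a lower bound on the eluder dimension at
    scale eps, since the greedy order need not yield the longest sequence.
    """
    seq = []
    for x in range(values.shape[1]):
        if not is_eps_dependent(values, seq, x, eps):
            seq.append(x)
    return seq


# Toy class: three indicator functions on three inputs (an assumption).
toy_values = np.eye(3)
toy_sequence = greedy_independent_sequence(toy_values, eps=0.5)
```

In this toy class, once two inputs have been observed, no two distinct functions remain close on the history, so the third input carries no new information and is dependent.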
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Auction Theory and Applications
