Provably Efficient Reinforcement Learning with Linear Function Approximation

Chi Jin; Zhuoran Yang; Zhaoran Wang; Michael I. Jordan

arXiv:1907.05388·cs.LG·August 9, 2019·219 cites

TL;DR

This paper introduces the first provably efficient reinforcement learning algorithm with polynomial runtime and sample complexity for linear function approximation, achieving regret bounds independent of state and action space sizes.

Contribution

It presents an optimistic variant of Least-Squares Value Iteration (LSVI) with provable polynomial runtime and sample complexity in the linear RL setting, addressing both computational efficiency and exploration.
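To make "optimistic LSVI" concrete, here is a minimal sketch of one regression step in that style. This page does not spell out the update, so the specific form used here (ridge regression on feature vectors plus an exploration bonus, clipped at the horizon) is an assumption about the general technique, not a transcription of the authors' algorithm. The names `fit_step` and `optimistic_q` and the parameters `lam` and `beta` are hypothetical.

```python
import math

def mat_vec(M, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def sherman_morrison(inv, phi):
    """Return (Lambda + phi phi^T)^{-1} given the symmetric inv = Lambda^{-1}."""
    u = mat_vec(inv, phi)
    denom = 1.0 + dot(phi, u)
    d = len(phi)
    return [[inv[i][j] - u[i] * u[j] / denom for j in range(d)]
            for i in range(d)]

def fit_step(features, targets, lam=1.0):
    """Ridge regression: w = (lam * I + sum phi phi^T)^{-1} sum phi * target.

    In a value-iteration loop each target would be the observed reward plus
    the estimated value of the next state (assumed, supplied by the caller).
    Returns the weight vector and the inverse Gram matrix.
    """
    d = len(features[0])
    inv = [[(1.0 / lam if i == j else 0.0) for j in range(d)] for i in range(d)]
    b = [0.0] * d
    for phi, y in zip(features, targets):
        inv = sherman_morrison(inv, phi)
        b = [b_i + p_i * y for b_i, p_i in zip(b, phi)]
    return mat_vec(inv, b), inv

def optimistic_q(phi, w, inv, beta, H):
    """Linear estimate plus a bonus that shrinks in well-explored directions,
    clipped at H (the largest possible return, assuming rewards in [0, 1])."""
    bonus = beta * math.sqrt(max(dot(phi, mat_vec(inv, phi)), 0.0))
    return min(dot(w, phi) + bonus, H)
```

With features `[[1, 0], [1, 0], [0, 1]]`, the bonus for `[0, 1]` is larger than for `[1, 0]` because that direction has been observed only once; this is how the optimism drives exploration in this sketch.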

Findings

01

Achieves a $\tilde{O}(\sqrt{d^3H^3T})$ regret bound, where $d$ is the feature dimension, $H$ is the episode length, and $T$ is the total number of steps.

02

Regret is independent of the number of states and actions.

03

First algorithm with polynomial runtime and polynomial sample complexity in the linear setting that requires neither a simulator nor additional assumptions.
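As a rough illustration of how the bound in finding 01 scales, the sketch below evaluates the leading term $\sqrt{d^3H^3T}$ with constants and logarithmic factors dropped. The function names are hypothetical, and the returned numbers show growth rates only; they are not regret values reported by the paper.

```python
import math

def regret_bound_scale(d, H, T):
    """Leading term sqrt(d^3 * H^3 * T); constants and log factors omitted."""
    return math.sqrt(d**3 * H**3 * T)

def average_regret_scale(d, H, T):
    """Per-step version of the same term; decays like 1 / sqrt(T)."""
    return regret_bound_scale(d, H, T) / T

if __name__ == "__main__":
    for T in (1_000, 4_000, 16_000):
        print(T, regret_bound_scale(10, 20, T), average_regret_scale(10, 20, T))
```

Quadrupling `T` doubles the cumulative term and halves the per-step term. Neither function takes the number of states or actions as an input, which mirrors finding 02.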

Abstract

Modern Reinforcement Learning (RL) is commonly applied to practical problems with an enormous number of states, where function approximation must be deployed to approximate either the value function or the policy. The introduction of function approximation raises a fundamental set of challenges involving computational and statistical efficiency, especially given the need to manage the exploration/exploitation tradeoff. As a result, a core RL question remains open: how can we design provably efficient RL algorithms that incorporate function approximation? This question persists even in a basic setting with linear dynamics and linear rewards, for which only linear function approximation is needed. This paper presents the first provable RL algorithm with both polynomial runtime and polynomial sample complexity in this linear setting, without requiring a "simulator" or additional…

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Smart Grid Energy Management