Provably Efficient Infinite-Horizon Average-Reward Reinforcement Learning with Linear Function Approximation

Woojin Chae; Dabeen Lee

arXiv:2409.10772·cs.LG·September 25, 2024

TL;DR

This paper introduces a computationally efficient algorithm for infinite-horizon average-reward linear MDPs and linear mixture MDPs; for linear MDPs it achieves the best-known regret upper bound.

Contribution

It presents the first computationally tractable algorithm with provable regret bounds for linear MDPs under the Bellman optimality condition.

Findings

01

Achieves regret of $\widetilde{\mathcal{O}}(d^{3/2} \mathrm{sp}(v^*) \sqrt{T})$ for linear MDPs.

02

Attains regret of $\widetilde{\mathcal{O}}(d \cdot \mathrm{sp}(v^*) \sqrt{T})$ for linear mixture MDPs.

03

Uses novel techniques to control the covering number of the value function class and the span of optimistic estimators of the value function.

Abstract

This paper proposes a computationally tractable algorithm for learning infinite-horizon average-reward linear Markov decision processes (MDPs) and linear mixture MDPs under the Bellman optimality condition. While guaranteeing computational efficiency, our algorithm for linear MDPs achieves the best-known regret upper bound of $\widetilde{\mathcal{O}}(d^{3/2} \mathrm{sp}(v^*) \sqrt{T})$ over $T$ time steps, where $\mathrm{sp}(v^*)$ is the span of the optimal bias function $v^*$ and $d$ is the dimension of the feature mapping. For linear mixture MDPs, our algorithm attains a regret bound of $\widetilde{\mathcal{O}}(d \cdot \mathrm{sp}(v^*) \sqrt{T})$. The algorithm applies novel techniques to control the covering number of the value function class and the span of optimistic estimators of the value function, which are of independent interest.
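
For readers new to the setting, the quantities named in the abstract are usually defined as below. These are the standard average-reward definitions, given here as an assumption rather than quoted from the paper. The symbols $J^*$ (optimal average reward), $r$ (reward function), $P$ (transition kernel), and $(s_t, a_t)$ (the state-action pair visited at step $t$) do not appear in the abstract and are introduced only for illustration.

$$
\mathrm{sp}(v^*) = \sup_{s} v^*(s) - \inf_{s} v^*(s)
$$

$$
J^* + v^*(s) = \max_{a} \Big\{ r(s,a) + \mathbb{E}_{s' \sim P(\cdot \mid s,a)} \big[ v^*(s') \big] \Big\} \quad \text{for all } s
$$

$$
\mathrm{Regret}(T) = T J^* - \sum_{t=1}^{T} r(s_t, a_t)
$$

The second line is the Bellman optimality condition. Under the third definition, any regret bound that grows sublinearly in $T$ implies that the average per-step gap $\mathrm{Regret}(T)/T$ to the optimal average reward vanishes as $T$ grows.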

Taxonomy

Topics: Traffic control and management · Autonomous Vehicle Technology and Safety · Reinforcement Learning in Robotics