Linear Bellman Completeness Suffices for Efficient Online Reinforcement Learning with Few Actions

Noah Golowich; Ankur Moitra

arXiv:2406.11640 · cs.LG · June 19, 2024

TL;DR

This paper gives a polynomial-time algorithm for online reinforcement learning with linear function approximation under Bellman completeness when the number of actions is constant. Previously known statistically efficient algorithms for this setting rely on global optimism, which requires solving a nonconvex optimization problem.

Contribution

It introduces the first polynomial-time algorithm for RL under linear Bellman completeness when the number of actions is constant, avoiding the nonconvex optimization problem that earlier global-optimism approaches must solve.

Findings

01 The algorithm runs in polynomial time when the number of actions is constant

02 Addresses the open problem of computational efficiency under Bellman completeness

03 Extends theoretical understanding of RL with linear function approximation

Abstract

One of the most natural approaches to reinforcement learning (RL) with function approximation is value iteration, which inductively generates approximations to the optimal value function by solving a sequence of regression problems. To ensure the success of value iteration, it is typically assumed that Bellman completeness holds, which ensures that these regression problems are well-specified. We study the problem of learning an optimal policy under Bellman completeness in the online model of RL with linear function approximation. In the linear setting, while statistically efficient algorithms are known under Bellman completeness (e.g., Jiang et al. (2017); Zanette et al. (2020)), these algorithms all rely on the principle of global optimism which requires solving a nonconvex optimization problem. In particular, it has remained open as to whether computationally efficient algorithms…
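The abstract's description of value iteration as a sequence of regression problems can be made concrete with a small sketch. The code below is not the paper's algorithm: it is plain fitted value iteration on a hypothetical three-state chain MDP, with no exploration, and it assumes one-hot features (under which Bellman completeness holds trivially). The names `phi`, `step`, `value`, and `fitted_value_iteration`, along with the toy dynamics and rewards, are illustrative assumptions.

```python
import numpy as np

# Hypothetical toy problem: 3 states on a chain, 2 actions, horizon 3.
N_STATES, N_ACTIONS, HORIZON = 3, 2, 3
DIM = N_STATES * N_ACTIONS


def phi(s, a):
    """One-hot feature vector for (s, a); an illustrative choice of features."""
    v = np.zeros(DIM)
    v[s * N_ACTIONS + a] = 1.0
    return v


def step(s, a):
    """Deterministic chain: action 1 moves right, action 0 stays put.

    The reward is 1 whenever the agent acts from the last state, else 0.
    """
    reward = 1.0 if s == N_STATES - 1 else 0.0
    next_s = min(s + a, N_STATES - 1)
    return reward, next_s


def value(theta, s):
    """Greedy value max_a <phi(s, a), theta>."""
    return max(float(phi(s, a) @ theta) for a in range(N_ACTIONS))


def fitted_value_iteration(ridge=1e-8):
    """Work backward from the final step, solving one ridge regression per step."""
    thetas = [np.zeros(DIM) for _ in range(HORIZON + 1)]  # thetas[HORIZON] stays 0
    data = [(s, a) for s in range(N_STATES) for a in range(N_ACTIONS)]
    X = np.array([phi(s, a) for s, a in data])
    for h in reversed(range(HORIZON)):
        # Regression target: reward plus the next step's greedy value estimate.
        targets = []
        for s, a in data:
            reward, next_s = step(s, a)
            targets.append(reward + value(thetas[h + 1], next_s))
        y = np.array(targets)
        # Ridge-regularized least squares via the normal equations.
        thetas[h] = np.linalg.solve(X.T @ X + ridge * np.eye(DIM), X.T @ y)
    return thetas


def q0(thetas, s, a):
    """Estimated Q-value at the first step."""
    return float(phi(s, a) @ thetas[0])


thetas = fitted_value_iteration()
```

Each regression here is well-specified because the target is exactly linear in the features, which is the role Bellman completeness plays in the abstract. In this toy, `q0(thetas, 0, 1)` should come out close to 1 and `q0(thetas, 2, 0)` close to 3.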

Taxonomy

Topics: Reinforcement Learning in Robotics · Optimization and Search Problems · Auction Theory and Applications