Linear Bellman Completeness Suffices for Efficient Online Reinforcement Learning with Few Actions
Noah Golowich, Ankur Moitra

TL;DR
This paper proves that a polynomial-time algorithm exists for online reinforcement learning with linear function approximation under Bellman completeness, provided the number of actions is constant, addressing a key computational challenge.
Contribution
It introduces the first polynomial-time algorithm for RL under linear Bellman completeness when the number of actions is constant, avoiding the nonconvex optimization problem that earlier algorithms based on global optimism had to solve.
Findings
The algorithm runs in polynomial time when the number of actions is fixed
Addresses the open problem of computational efficiency under Bellman completeness
Extends theoretical understanding of RL with linear function approximation
Abstract
One of the most natural approaches to reinforcement learning (RL) with function approximation is value iteration, which inductively generates approximations to the optimal value function by solving a sequence of regression problems. To ensure the success of value iteration, it is typically assumed that Bellman completeness holds, which ensures that these regression problems are well-specified. We study the problem of learning an optimal policy under Bellman completeness in the online model of RL with linear function approximation. In the linear setting, while statistically efficient algorithms are known under Bellman completeness (e.g., Jiang et al. (2017); Zanette et al. (2020)), these algorithms all rely on the principle of global optimism which requires solving a nonconvex optimization problem. In particular, it has remained open as to whether computationally efficient algorithms…
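To make the "sequence of regression problems" concrete, here is a minimal sketch of least-squares value iteration on a toy finite-horizon MDP. It is not the paper's algorithm: it assumes a fixed dataset that already covers every state-action pair, so it sidesteps the exploration problem the paper is about. The toy MDP, the one-hot features (for which linear Bellman completeness holds trivially), and all names such as `lsvi` and `one_hot_features` are illustrative assumptions.

```python
import numpy as np

def one_hot_features(num_states, num_actions):
    # phi[s, a] is the one-hot vector for the pair (s, a); hypothetical helper.
    d = num_states * num_actions
    return np.eye(d).reshape(num_states, num_actions, d)

def lsvi(phi, data, horizon):
    """Backward least-squares value iteration.

    data[h] is a list of (s, a, r, s_next) transitions observed at step h.
    Returns one weight vector per step, with Q_h(s, a) ~ phi[s, a] @ theta_h.
    """
    d = phi.shape[-1]
    thetas = [np.zeros(d) for _ in range(horizon + 1)]  # theta_H = 0
    for h in reversed(range(horizon)):
        X = np.array([phi[s, a] for s, a, _, _ in data[h]])
        # Regression target: reward plus greedy value of the next-step estimate.
        y = np.array([r + np.max(phi[s2] @ thetas[h + 1])
                      for _, _, r, s2 in data[h]])
        thetas[h], *_ = np.linalg.lstsq(X, y, rcond=None)
    return thetas[:horizon]

# Toy deterministic MDP (assumed for illustration only).
S, A, H = 4, 2, 3
rng = np.random.default_rng(0)
next_state = rng.integers(S, size=(S, A))
reward = rng.random((S, A))

# One transition per (s, a) at every step, so each regression is well-posed.
data = [[(s, a, reward[s, a], next_state[s, a])
         for s in range(S) for a in range(A)] for _ in range(H)]

phi = one_hot_features(S, A)
thetas = lsvi(phi, data, H)

# Exact dynamic programming on the same MDP, for comparison.
q_star = np.zeros((H + 1, S, A))
for h in reversed(range(H)):
    q_star[h] = reward + q_star[h + 1].max(axis=1)[next_state]

max_error = max(np.abs(thetas[h].reshape(S, A) - q_star[h]).max()
                for h in range(H))
print("max |Q_lsvi - Q_star| =", max_error)
```

Because the regression targets at step h are exactly representable by the features, each least-squares problem is well-specified and the fitted weights match dynamic programming up to numerical error. With richer features and data gathered online, the hard part is choosing which data to collect, which is where optimism and its computational cost enter.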
Taxonomy
Topics: Reinforcement Learning in Robotics · Optimization and Search Problems · Auction Theory and Applications
