End-to-End Efficient RL for Linear Bellman Complete MDPs with Deterministic Transitions

Zakaria Mhammedi; Alexander Rakhlin; Nneka Okolo

arXiv:2603.23461 · cs.LG · March 25, 2026

TL;DR

This paper introduces an efficient reinforcement learning algorithm for linear Bellman complete MDPs with deterministic transitions. The algorithm handles large action spaces through an argmax oracle over actions and comes with polynomial guarantees on both sample and computational complexity.

Contribution

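The paper presents the first end-to-end efficient RL algorithm for linear Bellman complete MDPs with deterministic transitions. The algorithm is fully efficient for finite action spaces and, given a standard argmax oracle over actions, also applies to large or infinite action spaces.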

Findings

1. The algorithm efficiently learns an ε-optimal policy.

2. Sample and computational complexity are both polynomial.

3. Large or infinite action spaces are handled with an argmax oracle over actions.
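The argmax oracle is the only access to the action space that large or infinite action sets require. For illustration only, the sketch below shows what such an oracle computes for a linear action-value estimate ⟨φ(s,a), θ⟩. It also shows how, under deterministic transitions, a greedy policy can be rolled out one step at a time. This is not the paper's algorithm. The names `argmax_oracle`, `greedy_rollout`, and the feature map `phi`, along with the toy dynamics, are hypothetical. For a finite action set, brute-force enumeration stands in for the oracle.

```python
from typing import Callable, Hashable, Sequence

import numpy as np

# Hypothetical types for illustration; the paper does not prescribe these.
State = Hashable
Action = Hashable
FeatureMap = Callable[[State, Action], np.ndarray]  # phi(s, a) in R^d


def argmax_oracle(
    phi: FeatureMap, state: State, actions: Sequence[Action], theta: np.ndarray
) -> Action:
    """Return an action maximizing the linear score <phi(state, a), theta>.

    Brute-force enumeration works for a finite action set; for large or
    infinite action sets, this call is what an argmax oracle would replace.
    Ties go to the first maximizer, since np.argmax returns the first index.
    """
    scores = [float(phi(state, a) @ theta) for a in actions]
    return actions[int(np.argmax(scores))]


def greedy_rollout(
    phi: FeatureMap,
    transition: Callable[[int, State, Action], State],
    s0: State,
    actions: Sequence[Action],
    thetas: Sequence[np.ndarray],
) -> list[tuple[State, Action]]:
    """Roll out the greedy policy for len(thetas) steps (one theta per step h).

    With deterministic transitions, the next state is a fixed function of
    (h, s, a), so the trajectory is fully determined by s0 and the thetas.
    """
    s = s0
    trajectory: list[tuple[State, Action]] = []
    for h, theta in enumerate(thetas):
        a = argmax_oracle(phi, s, actions, theta)
        trajectory.append((s, a))
        s = transition(h, s, a)  # deterministic next state
    return trajectory


if __name__ == "__main__":
    # Toy example: integer states; actions move left (-1) or right (+1).
    def toy_phi(s: int, a: int) -> np.ndarray:
        return np.array([float(a), float(s * a)])

    theta = np.array([1.5, -1.0])  # score = a * (1.5 - s)
    print(greedy_rollout(toy_phi, lambda h, s, a: s + a, 0, [-1, 1], [theta] * 3))
    # prints [(0, 1), (1, 1), (2, -1)]
```

Keeping the oracle as a separate function means the rollout logic does not depend on the size of the action set. Only the oracle's implementation changes when enumeration becomes infeasible.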

Abstract

We study reinforcement learning (RL) with linear function approximation in Markov Decision Processes (MDPs) satisfying linear Bellman completeness, a fundamental setting in which the Bellman backup of any linear value function remains linear. Although this setting is statistically tractable, prior computationally efficient algorithms are either limited to small action spaces or require strong oracle assumptions over the feature space. We provide a computationally efficient algorithm for linear Bellman complete MDPs with deterministic transitions, stochastic initial states, and stochastic rewards. For finite action spaces, our algorithm is end-to-end efficient; for large or infinite action spaces, we require only a standard argmax oracle over actions. Our algorithm learns an ε-optimal policy with sample and computational complexity polynomial in the horizon, feature dimension, and…
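For reference, one common way to state linear Bellman completeness, specialized to deterministic transitions, is sketched below. The condition says that backing up any linear action-value function by one step yields another linear function of the features. The notation is assumed here for illustration: features φ_h(s,a), mean reward r̄_h(s,a), deterministic transition map f_h, and backup operator T_h. The paper's exact formulation, including any norm constraints on θ, may differ.

```latex
% Hypothetical notation: \phi_h(s,a) \in \mathbb{R}^d are features at step h,
% \bar r_h(s,a) is the mean reward, and s' = f_h(s,a) is the deterministic next state.
\forall h,\ \forall \theta \in \mathbb{R}^d,\ \exists\, \mathcal{T}_h\theta \in \mathbb{R}^d
\ \text{such that}\ \forall (s,a):\quad
\langle \phi_h(s,a),\, \mathcal{T}_h\theta \rangle
  = \bar r_h(s,a) + \max_{a'} \langle \phi_{h+1}(f_h(s,a), a'),\, \theta \rangle .
```

With stochastic transitions, the max term would instead be an expectation over s' drawn from P_h(· | s, a). Determinism removes that expectation, leaving a single next state to evaluate.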

Taxonomy

Topics: Reinforcement Learning in Robotics · Game Theory and Applications · Advanced Bandit Algorithms Research