End-to-End Efficient RL for Linear Bellman Complete MDPs with Deterministic Transitions
Zakaria Mhammedi, Alexander Rakhlin, Nneka Okolo

TL;DR
This paper introduces an efficient reinforcement learning algorithm for linear Bellman complete MDPs with deterministic transitions, capable of handling large action spaces and providing polynomial guarantees on sample and computational complexity.
Contribution
It presents the first end-to-end efficient RL algorithm for linear Bellman complete MDPs with deterministic transitions, applicable to large or infinite action spaces.
Findings
Algorithm learns ε-optimal policy efficiently
Polynomial sample and computational complexity
Handles large/infinite action spaces with argmax oracle
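To make the last finding concrete, here is a minimal sketch of what an argmax oracle over actions computes for a linear value function: given a parameter vector and a state, it returns an action maximizing the inner product of the features with the parameters. The names `argmax_oracle`, `phi`, `theta`, and `actions` are illustrative assumptions, not the paper's notation or implementation. The sketch enumerates a finite action list, whereas for large or infinite action spaces the oracle is assumed to be supplied externally.

```python
from typing import Callable, Sequence, TypeVar

S = TypeVar("S")  # state type (hypothetical)
A = TypeVar("A")  # action type (hypothetical)


def argmax_oracle(
    theta: Sequence[float],
    state: S,
    actions: Sequence[A],
    phi: Callable[[S, A], Sequence[float]],
) -> A:
    """Return an action a maximizing <phi(state, a), theta>.

    Assumption: `actions` is a finite, non-empty sequence, so the
    maximization is done by brute-force enumeration.
    """
    if not actions:
        raise ValueError("actions must be non-empty")

    def score(a: A) -> float:
        # Linear value estimate: inner product of features and parameters.
        return sum(f * t for f, t in zip(phi(state, a), theta))

    # Ties are broken in favor of the earliest action in `actions`.
    return max(actions, key=score)
```

With a toy feature map such as `phi = lambda s, a: [1.0, float(a)]` and `theta = [0.0, 1.0]`, the oracle picks the largest action in the list.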
Abstract
We study reinforcement learning (RL) with linear function approximation in Markov Decision Processes (MDPs) satisfying linear Bellman completeness, a fundamental setting where the Bellman backup of any linear value function remains linear. While statistically tractable, prior computationally efficient algorithms are either limited to small action spaces or require strong oracle assumptions over the feature space. We provide a computationally efficient algorithm for linear Bellman complete MDPs with deterministic transitions, stochastic initial states, and stochastic rewards. For finite action spaces, our algorithm is end-to-end efficient; for large or infinite action spaces, we require only a standard argmax oracle over actions. Our algorithm learns an ε-optimal policy with sample and computational complexity polynomial in the horizon, feature dimension, and…
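The completeness condition named in the abstract can be written out as follows. This is the standard textbook form of linear Bellman completeness, given here as a sketch; the symbols (horizon H, feature maps φ_h into R^d, reward r_h, transition map f_h) are assumed notation and are not taken from this page.

```latex
% Assumed notation: layers h = 1, ..., H; features \phi_h : S \times A \to \mathbb{R}^d.
% Linear Bellman completeness: for every h and every \theta \in \mathbb{R}^d
% there exists \theta' \in \mathbb{R}^d such that, for all (s, a),
\[
  \langle \phi_h(s,a), \theta' \rangle
  \;=\;
  \mathbb{E}\!\left[ r_h \,\middle|\, s, a \right]
  \;+\;
  \mathbb{E}_{s' \mid s, a}\!\left[ \max_{a' \in A} \langle \phi_{h+1}(s', a'), \theta \rangle \right].
\]
% With deterministic transitions, s' = f_h(s,a), so the second expectation collapses:
\[
  \langle \phi_h(s,a), \theta' \rangle
  \;=\;
  \mathbb{E}\!\left[ r_h \,\middle|\, s, a \right]
  \;+\;
  \max_{a' \in A} \langle \phi_{h+1}(f_h(s,a), a'), \theta \rangle .
\]
```

In the deterministic case the only remaining randomness on the right-hand side is in the reward, which matches the setting of stochastic rewards and deterministic transitions described above.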
Taxonomy
Topics: Reinforcement Learning in Robotics · Game Theory and Applications · Advanced Bandit Algorithms Research
