Exponential Lower Bounds for Planning in MDPs With Linearly-Realizable Optimal Action-Value Functions
Gellért Weisz, Philip Amortila, Csaba Szepesvári

TL;DR
This paper proves that planning in MDPs whose optimal action-value function is linearly realizable requires, in the worst case, a number of queries that is exponential in the feature dimension or the horizon. No sound planner can therefore succeed with polynomially many queries on every such MDP.
Contribution
It establishes exponential lower bounds on the query complexity of sound planning in MDPs with linearly-realizable optimal action-value functions, and it bounds the number of queries that least-squares value iteration needs in the fixed-horizon setting.
Findings
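Any sound planner must issue a number of queries that is exponential in the feature dimension d or in the horizon H.
Least-squares value iteration can compute a near-optimal policy in the fixed-horizon setting with a number of queries that is polynomial in d for a fixed horizon but grows exponentially with H.
The lower bounds show that linear realizability of the optimal action-value function alone is not enough for query-efficient planning.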
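As a rough illustration of the least-squares value iteration idea mentioned in the findings, the sketch below runs backward induction with a generative model on a tiny deterministic MDP. It is a simplified variant written for this summary, not the algorithm analyzed in the paper: the toy MDP, the one-hot feature map `phi`, the uniform sampling of state-action pairs, and every name (`lsvi`, `sample`, `NEXT`, `REWARD`) are assumptions made for the example, and it makes no attempt to reproduce the paper's query bound.

```python
import numpy as np

def lsvi(phi, sample, n_states, n_actions, horizon, n_queries, rng):
    """Illustrative fixed-horizon least-squares value iteration.

    phi(s, a)          -> feature vector of length d
    sample(s, a, rng)  -> (reward, next_state), one generative-model query
    Returns theta[0..horizon-1] with Q_h(s, a) ~= phi(s, a) @ theta[h].
    """
    d = phi(0, 0).shape[0]
    # theta[horizon] stays zero: no value remains after the last stage.
    theta = [np.zeros(d) for _ in range(horizon + 1)]
    for h in reversed(range(horizon)):
        X, y = [], []
        for _ in range(n_queries):
            # Assumption: query uniformly random state-action pairs.
            s = rng.integers(n_states)
            a = rng.integers(n_actions)
            r, s_next = sample(s, a, rng)
            # Regression target: reward plus greedy value at the next stage.
            v_next = max(phi(s_next, b) @ theta[h + 1] for b in range(n_actions))
            X.append(phi(s, a))
            y.append(r + v_next)
        theta[h], *_ = np.linalg.lstsq(np.array(X), np.array(y), rcond=None)
    return theta[:horizon]

# Hypothetical toy MDP: 2 states, 2 actions, deterministic transitions.
N_S, N_A, H = 2, 2, 3
NEXT = np.array([[0, 1], [1, 0]])            # NEXT[s, a] is the next state
REWARD = np.array([[0.0, 1.0], [0.5, 0.0]])  # REWARD[s, a] is the reward

def phi(s, a):
    # One-hot features, so the optimal Q-function is trivially linear.
    return np.eye(N_S * N_A)[s * N_A + a]

def sample(s, a, rng):
    # Deterministic generative model; rng is unused but kept in the interface.
    return REWARD[s, a], NEXT[s, a]

theta = lsvi(phi, sample, N_S, N_A, H, n_queries=200,
             rng=np.random.default_rng(0))
```

With one-hot features each `theta[h]`, reshaped to `(N_S, N_A)`, is simply a table of stage-h action values, so the fit can be checked against exact dynamic programming on the toy problem. The setting the paper studies is harder: the feature dimension d is much smaller than the number of state-action pairs, which is where the lower bounds apply.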
Abstract
We consider the problem of local planning in fixed-horizon and discounted Markov Decision Processes (MDPs) with linear function approximation and a generative model under the assumption that the optimal action-value function lies in the span of a feature map that is available to the planner. Previous work has left open the question of whether there exist sound planners that need only poly(H,d) queries regardless of the MDP, where H is the horizon and d is the dimensionality of the features. We answer this question in the negative: we show that any sound planner must query at least $\min(\exp(\Omega(d)), \Omega(2^H))$ samples in the fixed-horizon setting and $\exp(\Omega(d))$ samples in the discounted setting. We also show that for any $\delta > 0$, the least-squares value iteration algorithm with $\tilde{\mathcal{O}}(H^5 d^{H+1}/\delta^2)$ queries can compute a $\delta$-optimal policy in the…
Taxonomy
Topics: Machine Learning and Algorithms · Reinforcement Learning in Robotics · AI-based Problem Solving and Planning
