TensorPlan and the Few Actions Lower Bound for Planning in MDPs under Linear Realizability of Optimal Value Functions

Gellért Weisz; Csaba Szepesvári; András György

arXiv:2110.02195 · cs.LG · March 11, 2022

TL;DR

This paper establishes exponential lower bounds on the query complexity of planning in MDPs under linear realizability of the optimal value functions. The bounds hold even when the number of actions is subexponential in the feature dimension and the horizon, which rules out polynomial-query planners in these settings.

Contribution

It proves exponential lower bounds on query complexity under several linear realizability settings, resolving open questions about whether planning with polynomially many queries is feasible.

Findings

01

Exponential lower bounds hold for action set sizes as small as $\min(d^{1/4}, H^{1/2})$

02

TensorPlan's polynomial query complexity upper bound extends to new settings with deterministic transitions and stochastic rewards

03

Surprising exponential separation between lower bounds and previous polynomial upper bounds in certain cases
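
To get a feel for the scale in Finding 01, the quantity $\min(d^{1/4}, H^{1/2})$ can be evaluated for a few values of the feature dimension $d$ and horizon $H$. This is only a rough sketch: the function name and the sample values are made up for illustration, and any constant factors in the actual bound are ignored.

```python
def action_threshold(d: int, H: int) -> float:
    """Scale min(d^(1/4), H^(1/2)) from Finding 01, ignoring constant factors."""
    return min(d ** 0.25, H ** 0.5)


# Hypothetical (d, H) pairs, chosen only to show how slowly the threshold grows.
SAMPLES = [(16, 4), (10_000, 100), (10**8, 100)]

for d, H in SAMPLES:
    print(f"d={d}, H={H}: threshold ~ {action_threshold(d, H):.2f}")
```

Even with a feature dimension of $10^8$, the threshold stays at about 10 actions when $H = 100$, since the horizon term $H^{1/2}$ is then the smaller of the two.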

Abstract

We consider the minimax query complexity of online planning with a generative model in fixed-horizon Markov decision processes (MDPs) with linear function approximation. Following recent works, we consider broad classes of problems where either (i) the optimal value function $v^{\star}$ or (ii) the optimal action-value function $q^{\star}$ lies in the linear span of some features; or (iii) both $v^{\star}$ and $q^{\star}$ lie in the linear span when restricted to the states reachable from the starting state. Recently, Weisz et al. (2021b) showed that under (ii) the minimax query complexity of any planning algorithm is at least exponential in the horizon $H$ or in the feature dimension $d$ when the size $A$ of the action set can be chosen to be exponential in $\min(d, H)$. On the other hand, for setting (i), Weisz et al. (2021a) introduced TensorPlan, a planner whose query cost is polynomial…
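
As a concrete reading of condition (ii), the sketch below builds a toy fixed-horizon MDP, computes $q^{\star}$ by backward induction, and checks that it equals an inner product of features with a weight vector. Everything in it is an assumption made for illustration, not taken from the paper: the two-state MDP, the three-dimensional feature map `phi`, the weights `THETA`, and the choice to allow a separate weight vector per stage.

```python
# Toy fixed-horizon MDP; all numbers are made up for illustration.
H = 2
STATES, ACTIONS = (0, 1), (0, 1)
REWARD = [[1.0, 0.0], [0.0, 2.0]]  # REWARD[s][a]
NEXT = [[0, 1], [0, 1]]            # deterministic transitions NEXT[s][a]


def optimal_q():
    """Backward induction: returns q[h][s][a] for stages h = 0, ..., H-1."""
    q = [[[0.0 for _ in ACTIONS] for _ in STATES] for _ in range(H)]
    v_next = [0.0 for _ in STATES]  # value after the final stage is zero
    for h in reversed(range(H)):
        for s in STATES:
            for a in ACTIONS:
                q[h][s][a] = REWARD[s][a] + v_next[NEXT[s][a]]
        v_next = [max(q[h][s]) for s in STATES]
    return q


def phi(s, a):
    """Hypothetical feature map: bias, reward, indicator of next state 1."""
    return [1.0, REWARD[s][a], float(NEXT[s][a] == 1)]


def realizes(q_h, theta, tol=1e-9):
    """True if q_h(s, a) = <phi(s, a), theta> for every state-action pair."""
    return all(
        abs(q_h[s][a] - sum(f * t for f, t in zip(phi(s, a), theta))) <= tol
        for s in STATES
        for a in ACTIONS
    )


q_star = optimal_q()
THETA = [[1.0, 1.0, 1.0], [0.0, 1.0, 0.0]]  # assumed weights, one per stage
print([realizes(q_star[h], THETA[h]) for h in range(H)])
```

Here four state-action values per stage are reproduced by three features, so $q^{\star}$ lies in the span of `phi` at both stages. A planner in this setting is given the features but not the weights, and the query complexity counts how many calls to the generative model it needs.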

Taxonomy

Topics: Machine Learning and Algorithms · Reinforcement Learning in Robotics · Formal Methods in Verification