On Query-efficient Planning in MDPs under Linear Realizability of the Optimal State-value Function

Gellért Weisz; Philip Amortila; Barnabás Janzer; Yasin Abbasi-Yadkori; Nan Jiang; Csaba Szepesvári

arXiv:2102.02049·cs.LG·July 12, 2021·5 cites

Open Access

TL;DR

This paper introduces the TensorPlan algorithm, which achieves polynomial query complexity for local planning in fixed-horizon MDPs under linear realizability of the optimal state-value function, assuming a small action set.

Contribution

It relaxes previous assumptions by only requiring linear realizability of a single policy's value function and provides the first polynomial-query algorithm under these conditions.
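As a rough sketch of what this assumption says, written in notation assumed here rather than quoted from the paper ($\varphi_h$ for the stage-$h$ feature map, $\theta$ for the parameter vector, $\eta$ for the approximation error, $\pi$ for the single policy in question):

$$
\exists\, \theta \in \mathbb{R}^d \ \text{such that} \quad \bigl| v^{\pi}_h(s) - \langle \varphi_h(s), \theta \rangle \bigr| \le \eta \quad \text{for all states } s \text{ and stages } h \in \{1, \dots, H\}.
$$

Setting $\eta = 0$ gives exact realizability. The stronger assumption used in earlier work would require such a $\theta$ to exist for every policy, not just one.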

Findings

01. TensorPlan finds a δ-optimal policy using a number of queries polynomial in d, H, and 1/δ, provided the action set is small.

02. Linear realizability of a single value function suffices for polynomial query complexity in planning.

03. Exponential query lower bounds are established for infinite-horizon settings with many actions.

Abstract

We consider local planning in fixed-horizon MDPs with a generative model under the assumption that the optimal value function lies close to the span of a feature map. The generative model provides local access to the MDP: the planner can ask for random transitions from previously returned states and arbitrary actions, and features are only accessible for states that are encountered in this process. As opposed to previous work (e.g., Lattimore et al. (2020)) where linear realizability of all policies was assumed, we consider the significantly relaxed assumption of a single linearly realizable (deterministic) policy. A recent lower bound by Weisz et al. (2020) established that the related problem when the action-value function of the optimal policy is linearly realizable requires an exponential number of queries, either in $H$ (the horizon of the MDP) or $d$ (the dimension of the feature…
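The local-access protocol described in the abstract can be illustrated with a minimal Python sketch. It is not TensorPlan and is not taken from the paper: the class `LocalAccessSimulator`, the helpers `toy_step` and `toy_features`, and the toy chain MDP are all hypothetical names introduced here. The sketch omits the stage index of the fixed-horizon setting and only enforces the two rules stated above: queries are allowed only at states the simulator has already returned, and features are revealed only for those states.

```python
import random
from typing import Callable, Hashable, Sequence, Tuple

State = Hashable
Features = Sequence[float]


class LocalAccessSimulator:
    """Hypothetical wrapper enforcing local access to a generative model."""

    def __init__(
        self,
        initial_state: State,
        step: Callable[[State, int, random.Random], Tuple[float, State]],
        featurize: Callable[[State], Features],
        num_actions: int,
        seed: int = 0,
    ) -> None:
        self._step = step
        self._featurize = featurize
        self._num_actions = num_actions
        self._rng = random.Random(seed)
        self._seen = {initial_state}  # states the planner may query
        self.num_queries = 0          # query complexity counter

    def features(self, state: State) -> Features:
        # Features are only accessible for states encountered so far.
        if state not in self._seen:
            raise KeyError(f"state {state!r} has not been encountered")
        return self._featurize(state)

    def query(self, state: State, action: int) -> Tuple[float, State, Features]:
        # Any action is allowed, but only at a previously returned state.
        if state not in self._seen:
            raise KeyError(f"state {state!r} has not been encountered")
        if not 0 <= action < self._num_actions:
            raise ValueError(f"action {action} out of range")
        self.num_queries += 1
        reward, next_state = self._step(state, action, self._rng)
        self._seen.add(next_state)
        return reward, next_state, self._featurize(next_state)


def toy_step(state: int, action: int, rng: random.Random) -> Tuple[float, int]:
    # Toy chain (an assumption for illustration): action 1 moves right
    # with probability 0.9, action 0 stays put; reward 1 on reaching state 3.
    moved = action == 1 and rng.random() < 0.9
    next_state = state + 1 if moved else state
    return (1.0 if next_state == 3 else 0.0), next_state


def toy_features(state: int) -> Features:
    # Two-dimensional features: a bias term and the state index.
    return (1.0, float(state))


sim = LocalAccessSimulator(0, toy_step, toy_features, num_actions=2)
reward, next_state, phi = sim.query(0, 1)
```

A planner built on such an interface pays one unit of query cost per call to `query`, which is the quantity the paper's upper and lower bounds are about.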

Taxonomy

Topics: Machine Learning and Algorithms · Reinforcement Learning in Robotics · Formal Methods in Verification