Confident Approximate Policy Iteration for Efficient Local Planning in $q^\pi$-realizable MDPs

Gellért Weisz; András György; Tadashi Kozuno; Csaba Szepesvári

arXiv:2210.15755·cs.LG·October 31, 2022

TL;DR

This paper introduces Confident Approximate Policy Iteration (CAPI), a variant of approximate policy iteration that tightens both the error bound and the query complexity of local planning in discounted MDPs with linear value-function approximation, while still outputting a stationary policy.

Contribution

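The paper proposes CAPI, a variant of Approximate Policy Iteration (API) whose error scales linearly, rather than quadratically, with the effective horizon $H$ and which outputs a deterministic stationary policy. It then applies CAPI to local planning with simulator access and linear function approximation.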
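For orientation, the baseline that CAPI modifies can be sketched as follows. This is plain API with least-squares fitting of action values, not CAPI itself, and it is only a minimal sketch under stated assumptions: the MDP is small and tabular, policy evaluation is exact (standing in for simulator rollouts), and the names `P`, `r`, `Phi` and `approximate_policy_iteration` are hypothetical.

```python
import numpy as np


def approximate_policy_iteration(P, r, Phi, gamma, iters=20):
    """Plain API with linear q-value fitting (illustrative, not CAPI).

    P:   transition probabilities, shape (S, A, S)
    r:   rewards, shape (S, A)
    Phi: feature map, shape (S, A, d)  (hypothetical)
    """
    S, A, _ = P.shape
    X = Phi.reshape(S * A, -1)        # one feature row per (state, action)
    states = np.arange(S)
    pi = np.zeros(S, dtype=int)       # arbitrary initial policy
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        # Policy evaluation: solve v = r_pi + gamma * P_pi v exactly.
        P_pi = P[states, pi]          # (S, S)
        r_pi = r[states, pi]          # (S,)
        v = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)
        q = r + gamma * (P @ v)       # q^pi, shape (S, A)
        # Approximation step: least-squares fit of q^pi in the features.
        theta, *_ = np.linalg.lstsq(X, q.reshape(-1), rcond=None)
        q_hat = (X @ theta).reshape(S, A)
        # Policy improvement: act greedily with respect to the fitted values.
        pi = q_hat.argmax(axis=1)
    return pi, theta
```

The fitting step is where the approximation error $ϵ$ enters: whenever the features cannot represent $q^\pi$ exactly, the greedy update is taken with respect to a perturbed action-value function.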

Findings

01

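CAPI achieves an error bound that scales linearly with the product of the effective horizon $H$ and the worst-case approximation error $ϵ$, whereas the error of API scales with $H^{2}$.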

02

The planning algorithm outputs an approximately optimal policy with near-optimal query complexity.

03

The query complexity is tight in all parameters except the horizon.

Abstract

We consider approximate dynamic programming in $γ$-discounted Markov decision processes and apply it to approximate planning with linear value-function approximation. Our first contribution is a new variant of Approximate Policy Iteration (API), called Confident Approximate Policy Iteration (CAPI), which computes a deterministic stationary policy with an optimal error bound scaling linearly with the product of the effective horizon $H$ and the worst-case approximation error $ϵ$ of the action-value functions of stationary policies. This improvement over API (whose error scales with $H^{2}$) comes at the price of an $H$-fold increase in memory cost. Unlike Scherrer and Lesner [2012], who recommended computing a non-stationary policy to achieve a similar improvement (with the same memory overhead), we are able to stick to stationary policies. This allows for our second…
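To make the gap between the two error scalings concrete, the sketch below compares $H^{2}ϵ$ with $Hϵ$ for a given discount factor. It assumes the common convention $H = 1/(1-γ)$ for the effective horizon and drops all constants and logarithmic factors, so the numbers indicate growth rates only, not the bounds proved in the paper. The function names are hypothetical.

```python
def effective_horizon(gamma: float) -> float:
    """Effective horizon, assuming the convention H = 1 / (1 - gamma)."""
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must lie in [0, 1)")
    return 1.0 / (1.0 - gamma)


def api_error_scale(gamma: float, eps: float) -> float:
    """Order of the API error: H^2 * eps (constants omitted)."""
    return effective_horizon(gamma) ** 2 * eps


def capi_error_scale(gamma: float, eps: float) -> float:
    """Order of the CAPI error: H * eps (constants omitted)."""
    return effective_horizon(gamma) * eps


if __name__ == "__main__":
    eps = 0.01
    for gamma in (0.9, 0.99, 0.999):
        h = effective_horizon(gamma)
        print(f"gamma={gamma}: H={h:.0f}, "
              f"API ~ {api_error_scale(gamma, eps):.2f}, "
              f"CAPI ~ {capi_error_scale(gamma, eps):.2f}")
```

The ratio of the two quantities is exactly $H$, which is the factor CAPI saves in accuracy and, per the abstract, pays back as an $H$-fold increase in memory.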

Taxonomy

Topics: Machine Learning and Algorithms · Reinforcement Learning in Robotics · Optimization and Search Problems