Confident Approximate Policy Iteration for Efficient Local Planning in $q^\pi$-realizable MDPs
Gellért Weisz, András György, Tadashi Kozuno, Csaba Szepesvári

TL;DR
This paper introduces Confident Approximate Policy Iteration (CAPI), a new algorithm that improves error bounds and query complexity for local planning in MDPs with linear value-function approximation, while maintaining stationary policies.
Contribution
The paper proposes CAPI, a variant of Approximate Policy Iteration (API) whose error scales linearly with the effective horizon and which outputs a stationary policy, and applies it to local planning with simulator access and linear function approximation.
Findings
CAPI achieves an error bound that scales linearly with the product of the effective horizon and the worst-case approximation error.
The planning algorithm outputs an approximately optimal policy with near-optimal query complexity.
The query complexity is tight in all parameters except the horizon.
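As a rough illustration of the first finding, the two error bounds can be written side by side. This is a sketch with constants omitted; the symbols $H$, $\epsilon$, $\pi_{\text{API}}$ and $\pi_{\text{CAPI}}$ are notation assumed here, and $H$ is taken to be of order $1/(1-\gamma)$, which may differ from the paper's exact definition by lower-order factors.

$$
\|v^* - v^{\pi_{\text{API}}}\|_\infty = O(H^2 \epsilon), \qquad \|v^* - v^{\pi_{\text{CAPI}}}\|_\infty = O(H \epsilon), \qquad H \approx \frac{1}{1-\gamma}.
$$

For example, with $\gamma = 0.99$ the effective horizon is about $100$, so the classical API guarantee degrades like $10^4 \epsilon$ while the CAPI guarantee degrades like $100\,\epsilon$.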
Abstract
We consider approximate dynamic programming in $\gamma$-discounted Markov decision processes and apply it to approximate planning with linear value-function approximation. Our first contribution is a new variant of Approximate Policy Iteration (API), called Confident Approximate Policy Iteration (CAPI), which computes a deterministic stationary policy with an optimal error bound scaling linearly with the product of the effective horizon $H$ and the worst-case approximation error $\epsilon$ of the action-value functions of stationary policies. This improvement over API (whose error scales with $H^2$) comes at the price of an $H$-fold increase in memory cost. Unlike Scherrer and Lesner [2012], who recommended computing a non-stationary policy to achieve a similar improvement (with the same memory overhead), we are able to stick to stationary policies. This allows for our second…
Taxonomy
Topics: Machine Learning and Algorithms · Reinforcement Learning in Robotics · Optimization and Search Problems
