Approximate Modified Policy Iteration
Bruno Scherrer (INRIA Lorraine - LORIA), Victor Gabillon (INRIA Lille - Nord Europe), Mohammad Ghavamzadeh (INRIA Lille - Nord Europe), Matthieu Geist (UMI2958)

TL;DR
This paper introduces three implementations of approximate modified policy iteration (AMPI), extending existing algorithms, with error analyses and a finite-sample study showing how to balance estimation and approximation errors.
Contribution
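It proposes three AMPI algorithms that extend fitted-value iteration, fitted-Q iteration, and classification-based policy iteration, and provides error propagation analyses for them along with a finite-sample analysis of the classification-based implementation.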
Findings
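Error propagation analyses unify those for approximate policy iteration and approximate value iteration.
For the classification-based implementation, a finite-sample analysis shows that MPI's main parameter controls the balance between the classifier's estimation error and the overall value function approximation.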
AMPI implementations extend fitted-value, fitted-Q, and classification-based policy iteration.
Abstract
Modified policy iteration (MPI) is a dynamic programming (DP) algorithm that contains the two celebrated policy and value iteration methods. Despite its generality, MPI has not been thoroughly studied, especially its approximate form, which is used when the state and/or action spaces are large or infinite. In this paper, we propose three implementations of approximate MPI (AMPI) that are extensions of well-known approximate DP algorithms: fitted-value iteration, fitted-Q iteration, and classification-based policy iteration. We provide error propagation analyses that unify those for approximate policy and value iteration. For the last, classification-based implementation, we develop a finite-sample analysis showing that MPI's main parameter allows one to control the balance between the estimation error of the classifier and the overall value function approximation.
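To make concrete how MPI contains both value and policy iteration, here is a minimal sketch of exact, tabular MPI. It is not one of the paper's three approximate implementations: it assumes a small finite MDP with a known model, and the names `P`, `R`, `gamma`, `m`, and `n_iters` are illustrative assumptions rather than the paper's notation. Each iteration takes the greedy policy for the current value estimate and then applies that policy's Bellman operator `m` times. With `m = 1` this reduces to value iteration, and as `m` grows it approaches policy iteration.

```python
import numpy as np


def greedy_policy(P, R, v, gamma):
    """Return the greedy action in each state with respect to value estimate v.

    P has shape (A, S, S) with P[a, s, t] = Pr(next state t | state s, action a).
    R has shape (S, A) with R[s, a] the expected immediate reward.
    """
    # q[s, a] = R[s, a] + gamma * sum_t P[a, s, t] * v[t]
    q = R + gamma * np.einsum("ast,t->sa", P, v)
    return np.argmax(q, axis=1)


def modified_policy_iteration(P, R, gamma, m, n_iters=200):
    """Exact tabular MPI (illustrative sketch, assumes a known model).

    m = 1 gives value iteration; large m approaches policy iteration.
    """
    n_states = R.shape[0]
    states = np.arange(n_states)
    v = np.zeros(n_states)
    pi = np.zeros(n_states, dtype=int)
    for _ in range(n_iters):
        # Greedy step: policy that is greedy with respect to the current v.
        pi = greedy_policy(P, R, v, gamma)
        # Transition matrix and reward vector induced by that policy.
        P_pi = P[pi, states, :]  # shape (S, S), row s is P[pi[s], s, :]
        r_pi = R[states, pi]     # shape (S,)
        # Partial evaluation step: apply the policy's Bellman operator m times.
        for _ in range(m):
            v = r_pi + gamma * P_pi @ v
    return v, pi
```

A possible usage on a hypothetical two-state, two-action MDP is to call `modified_policy_iteration(P, R, gamma=0.9, m=1)` and `modified_policy_iteration(P, R, gamma=0.9, m=5)` and compare the returned values; both should approach the same optimal value function, with the larger `m` doing more evaluation work per greedy step.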
Taxonomy
Topics: Reinforcement Learning in Robotics · Fuel Cells and Related Materials · Formal Methods in Verification
