Approximate Modified Policy Iteration

Bruno Scherrer (INRIA Lorraine - LORIA); Victor Gabillon (INRIA Lille - Nord Europe); Mohammad Ghavamzadeh (INRIA Lille - Nord Europe); Matthieu Geist (UMI2958)

arXiv:1205.3054·cs.AI·May 21, 2012·1 cites

TL;DR

This paper introduces three implementations of approximate modified policy iteration (AMPI), extending existing algorithms, with error analyses and a finite-sample study showing how to balance estimation and approximation errors.

Contribution

It proposes three AMPI algorithms that extend well-known approximate DP methods, gives error propagation analyses for them, and provides a finite-sample analysis of the classification-based implementation.

Findings

01

Error propagation analyses unify policy and value iteration.

02

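A finite-sample analysis of the classification-based implementation shows that MPI's main parameter controls the balance between the classifier's estimation error and the overall value function approximation.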

03

AMPI implementations extend fitted-value, fitted-Q, and classification-based policy iteration.

Abstract

Modified policy iteration (MPI) is a dynamic programming (DP) algorithm that contains the two celebrated policy and value iteration methods. Despite its generality, MPI has not been thoroughly studied, especially its approximate form, which is used when the state and/or action spaces are large or infinite. In this paper, we propose three implementations of approximate MPI (AMPI) that are extensions of well-known approximate DP algorithms: fitted-value iteration, fitted-Q iteration, and classification-based policy iteration. We provide error propagation analyses that unify those for approximate policy and value iteration. For the last, classification-based implementation, we develop a finite-sample analysis showing that MPI's main parameter controls the balance between the estimation error of the classifier and the overall value function approximation.
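
To make concrete how MPI contains both value and policy iteration, here is a minimal sketch of exact (tabular) MPI on a toy MDP. It is not one of the paper's three approximate implementations. The names `P`, `R`, `gamma`, `m`, and `n_iters`, and the two-state MDP itself, are assumptions chosen for illustration. Each iteration takes the greedy policy for the current value estimate and then applies that policy's Bellman backup `m` times: `m = 1` gives value iteration, and letting `m` grow recovers policy iteration.

```python
# Illustrative tabular MPI; all names and the toy MDP are hypothetical.
# P[s][a][t] = probability of moving from state s to state t under action a.
# R[s][a]    = expected immediate reward for taking action a in state s.

def greedy_policy(P, R, v, gamma):
    """Return, for each state, the action maximizing the one-step lookahead on v."""
    n_states, n_actions = len(R), len(R[0])
    policy = []
    for s in range(n_states):
        q = [R[s][a] + gamma * sum(P[s][a][t] * v[t] for t in range(n_states))
             for a in range(n_actions)]
        policy.append(max(range(n_actions), key=lambda a: q[a]))
    return policy


def policy_backup(P, R, policy, v, gamma):
    """Apply the Bellman operator of a fixed policy once to v."""
    n_states = len(v)
    return [R[s][policy[s]]
            + gamma * sum(P[s][policy[s]][t] * v[t] for t in range(n_states))
            for s in range(n_states)]


def modified_policy_iteration(P, R, gamma, m, n_iters):
    """Run n_iters rounds of: greedy step, then m backups of the greedy policy."""
    v = [0.0] * len(R)
    policy = greedy_policy(P, R, v, gamma)
    for _ in range(n_iters):
        policy = greedy_policy(P, R, v, gamma)      # greedy step
        for _ in range(m):                          # partial policy evaluation
            v = policy_backup(P, R, policy, v, gamma)
    return policy, v


# Toy MDP (assumed): action 0 stays put, action 1 switches state.
# Staying in state 1 pays reward 1; everything else pays 0.
P = [[[1.0, 0.0], [0.0, 1.0]],
     [[0.0, 1.0], [1.0, 0.0]]]
R = [[0.0, 0.0],
     [1.0, 0.0]]

policy_vi, v_vi = modified_policy_iteration(P, R, gamma=0.9, m=1, n_iters=500)
policy_mpi, v_mpi = modified_policy_iteration(P, R, gamma=0.9, m=5, n_iters=100)
```

In this toy problem both settings of `m` reach the same policy (switch in state 0, stay in state 1). Larger `m` spends more backups per greedy step, which is the trade-off that the parameter exposes.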

Taxonomy

Topics: Reinforcement Learning in Robotics · Fuel Cells and Related Materials · Formal Methods in Verification