A note on the policy iteration algorithm for discounted Markov decision processes for a class of semicontinuous models

Óscar Vega-Amaya; Fernando Luque-Vásquez

arXiv:2307.07038·math.OC·July 17, 2023

Open Access

TL;DR

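This paper introduces a modified policy iteration algorithm for semicontinuous Markov decision processes. A smoothing step avoids the measurability problem in the improvement step, and the policy iteration functions are proved to converge linearly to the optimal value function under a Lyapunov growth condition and standard continuity-compactness properties.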

Contribution

It proposes a version of policy iteration with a smoothing step for semicontinuous models and shows that, under strengthened continuity conditions, some improvement policy attains the best possible improvement.

Findings

01. The modified PI algorithm converges linearly to the optimal value function.

02. A smoothing step resolves measurability issues in semicontinuous models.

03. Existence of an improvement policy with continuous cost function is established.

Abstract

The standard version of the policy iteration (PI) algorithm fails for semicontinuous models, that is, for models with lower semicontinuous one-step costs and a weakly continuous transition law. The failure is due to the lack of continuity properties of the discounted cost for stationary policies, which gives rise to a measurability problem in the improvement step. The present work proposes an alternative version of the PI algorithm that performs a smoothing step to avoid the measurability problem. Assuming that the model satisfies a Lyapunov growth condition and some standard continuity-compactness properties, the policy iteration functions are shown to converge linearly to the optimal value function. Under strengthened continuity conditions, a second result shows that among the improvement policies there is one with the best possible improvement and whose cost function is…
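For orientation, the evaluation and improvement steps that the abstract refers to can be sketched on a finite model, where no measurability problem arises. The code below is the textbook PI scheme for a finite discounted-cost MDP, not the modified algorithm of the paper: the smoothing step is not specified on this page and is therefore omitted. The function name `policy_iteration`, the array layout of `cost` and `trans`, and the discount factor name `alpha` are assumptions made only for this illustration.

```python
import numpy as np


def policy_iteration(cost, trans, alpha, max_iter=1000):
    """Textbook policy iteration for a finite discounted-cost MDP (illustrative only).

    cost  : array of shape (S, A), one-step cost c(s, a)      (assumed layout)
    trans : array of shape (A, S, S), trans[a, s, t] = P(t | s, a)
    alpha : discount factor in (0, 1)
    """
    n_states, _ = cost.shape
    states = np.arange(n_states)
    policy = np.zeros(n_states, dtype=int)  # arbitrary initial stationary policy
    for _ in range(max_iter):
        # Evaluation step: solve (I - alpha * P_f) v = c_f for the current policy f.
        p_f = trans[policy, states, :]
        c_f = cost[states, policy]
        v = np.linalg.solve(np.eye(n_states) - alpha * p_f, c_f)
        # Improvement step: minimize c(s, a) + alpha * sum_t P(t | s, a) v(t) over a.
        q = cost + alpha * np.einsum("ast,t->sa", trans, v)
        new_policy = np.argmin(q, axis=1)
        if np.array_equal(new_policy, policy):
            break  # no state changes its action, so the policy is optimal
        policy = new_policy
    return policy, v
```

In a finite model the minimization in the improvement step is a plain `argmin` over finitely many actions, so a minimizing policy always exists. According to the abstract, it is exactly this step that becomes problematic in semicontinuous models, because the discounted cost of a stationary policy lacks the needed continuity properties.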

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Control Systems Optimization