A note on the policy iteration algorithm for discounted Markov decision processes for a class of semicontinuous models
Óscar Vega-Amaya, Fernando Luque-Vásquez

TL;DR
This paper introduces a modified policy iteration algorithm for semicontinuous Markov decision processes. The algorithm adds a smoothing step that resolves a measurability problem in the improvement step, and the paper proves its convergence under a Lyapunov growth condition and standard continuity-compactness assumptions.
Contribution
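It proposes a policy iteration method with a smoothing step for semicontinuous models, proves that the resulting policy iteration functions converge to the optimal value function, and establishes optimality properties of the improvement policies.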
Findings
The modified PI algorithm converges linearly to the optimal value function.
A smoothing step resolves measurability issues in semicontinuous models.
Under strengthened continuity conditions, there exists an improvement policy that achieves the best possible improvement and has a continuous cost function.
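For reference, the standard PI recursion behind these findings can be written as below. The notation is assumed rather than taken from the paper: state space X, admissible actions A(x), discount factor α ∈ (0,1), one-step cost c, transition law Q, discounted cost V(f, ·) of a stationary policy f, optimal value V*, and a Lyapunov weight W defining the weighted sup-norm. The last line only illustrates what "converges linearly" means; the paper's exact norm, constants and smoothing step are not given in this summary. In a semicontinuous model, v_n = V(f_n, ·) need not be lower semicontinuous, so the minimization in the improvement step may have no measurable minimizer; this is the gap the smoothing step is meant to close.

```latex
% Standard (unsmoothed) policy iteration; notation assumed, not the paper's.
\begin{aligned}
&\text{Evaluation:} && v_n(x) = V(f_n, x) = c\big(x, f_n(x)\big) + \alpha \int_X v_n(y)\, Q\big(dy \mid x, f_n(x)\big),\\
&\text{Improvement:} && f_{n+1}(x) \in \operatorname*{arg\,min}_{a \in A(x)} \Big[\, c(x,a) + \alpha \int_X v_n(y)\, Q(dy \mid x, a) \Big],\\
&\text{Linear convergence:} && \| v_n - V^{*} \|_W \le K \gamma^{n}, \qquad K < \infty,\ \gamma \in (0,1).
\end{aligned}
```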
Abstract
The standard version of the policy iteration (PI) algorithm fails for semicontinuous models, that is, for models with lower semicontinuous one-step costs and a weakly continuous transition law. This is due to the lack of continuity properties of the discounted cost for stationary policies, which creates a measurability problem in the improvement step. The present work proposes an alternative version of the PI algorithm that performs a smoothing step to avoid the measurability problem. Assuming that the model satisfies a Lyapunov growth condition and some standard continuity-compactness properties, it is shown that the policy iteration functions converge linearly to the optimal value function. In a second result, under strengthened continuity conditions, it is shown that among the improvement policies there is one with the best possible improvement and whose cost function is…
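As a point of reference only, the following is a minimal sketch of the standard PI algorithm that the abstract contrasts with, for a finite discounted-cost MDP. It is not the authors' smoothed algorithm, whose smoothing step is not described in this summary. The function names (`solve_linear`, `q_values`, `policy_iteration`) and the array layout `P[a][x][y]` are hypothetical choices made for illustration.

```python
from typing import List, Tuple

# Hypothetical layout (an assumption for this sketch, not from the paper):
#   cost[x][a] = one-step cost c(x, a)
#   P[a][x][y] = probability of moving from state x to state y under action a
Matrix = List[List[float]]


def solve_linear(A: Matrix, b: List[float]) -> List[float]:
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    z = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(M[i][k] * z[k] for k in range(i + 1, n))
        z[i] = (M[i][n] - s) / M[i][i]
    return z


def q_values(cost: Matrix, P: List[Matrix], alpha: float, v: List[float]) -> Matrix:
    """Return q[x][a] = c(x, a) + alpha * sum_y P[a][x][y] * v[y]."""
    S, A = len(cost), len(cost[0])
    return [[cost[x][a] + alpha * sum(P[a][x][y] * v[y] for y in range(S))
             for a in range(A)] for x in range(S)]


def policy_iteration(cost: Matrix, P: List[Matrix], alpha: float,
                     max_iter: int = 1000) -> Tuple[List[float], List[int]]:
    """Standard (unsmoothed) policy iteration for a finite discounted-cost MDP."""
    S = len(cost)
    f = [0] * S  # initial stationary policy: action 0 in every state
    v = [0.0] * S
    for _ in range(max_iter):
        # Policy evaluation: solve (I - alpha * P_f) v = c_f exactly.
        A_f = [[(1.0 if x == y else 0.0) - alpha * P[f[x]][x][y] for y in range(S)]
               for x in range(S)]
        v = solve_linear(A_f, [cost[x][f[x]] for x in range(S)])
        # Policy improvement: pick a minimizer of q(x, .) in each state.
        q = q_values(cost, P, alpha, v)
        f_new = []
        for x in range(S):
            best = min(range(len(q[x])), key=lambda a: q[x][a])
            # Keep the current action unless another is strictly better,
            # so that ties cannot make the algorithm cycle.
            f_new.append(f[x] if q[x][f[x]] <= q[x][best] + 1e-12 else best)
        if f_new == f:  # no strict improvement in any state: f is optimal
            return v, f
        f = f_new
    return v, f
```

On a finite state space every function is measurable, so the improvement step can always be carried out state by state. The measurability problem described in the abstract only arises for general (for example uncountable) state and action spaces with merely lower semicontinuous costs and a weakly continuous transition law.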
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Control Systems Optimization
