On the Use of Non-Stationary Policies for Infinite-Horizon Discounted Markov Decision Processes
Bruno Scherrer (INRIA Lorraine - LORIA)

TL;DR
This paper demonstrates that using non-stationary policies in infinite-horizon discounted Markov Decision Processes significantly improves performance bounds and simplifies the computation of near-optimal policies, especially when the discount factor is close to one.
Contribution
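It introduces performance bounds for non-stationary policies built from the policies generated by Value Iteration, showing that these bounds are tighter than the best known bound for the last stationary policy and that approximately optimal non-stationary policies are easier to compute.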
Findings
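Non-stationary policies built from the last $m$ policies generated by Value Iteration reduce the state-of-the-art performance bound for the last stationary policy by a factor of $\frac{1-\gamma}{1-\gamma^m}$, where $\gamma$ is the discount factor.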
Using non-stationary policies simplifies the problem of computing approximately optimal policies.
The improvement is most significant in the usual situation where the discount factor is close to 1.
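To make the size of the reduction concrete, the short sketch below evaluates the factor $\frac{1-\gamma}{1-\gamma^m}$ for a few values of the discount factor $\gamma$ and of the number $m$ of policies kept. The function name `bound_reduction_factor` is a hypothetical helper introduced here for illustration; it is not part of the paper.

```python
def bound_reduction_factor(gamma: float, m: int) -> float:
    """Return (1 - gamma) / (1 - gamma**m).

    gamma: discount factor, assumed to lie in [0, 1).
    m:     number of last policies used by the non-stationary policy (m >= 1).
    """
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must lie in [0, 1)")
    if m < 1:
        raise ValueError("m must be a positive integer")
    return (1.0 - gamma) / (1.0 - gamma ** m)


if __name__ == "__main__":
    # m = 1 corresponds to the last stationary policy (factor 1, no gain).
    # As m grows, the factor decreases towards 1 - gamma.
    for gamma in (0.9, 0.99):
        for m in (1, 2, 10, 100):
            factor = bound_reduction_factor(gamma, m)
            print(f"gamma={gamma}, m={m}: factor={factor:.4f}")
```

With $m = 1$ the factor equals 1, and as $m$ grows it decreases towards $1-\gamma$, which is why the gain matters most when $\gamma$ is close to 1.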
Abstract
We consider infinite-horizon $\gamma$-discounted Markov Decision Processes, for which it is known that there exists a stationary optimal policy. We consider the algorithm Value Iteration and the sequence of policies it implicitly generates until some iteration $k$. We provide performance bounds for non-stationary policies involving the last $m$ generated policies that reduce the state-of-the-art bound for the last stationary policy $\pi_k$ by a factor $\frac{1-\gamma}{1-\gamma^m}$. In particular, the use of non-stationary policies makes it possible to reduce the usual asymptotic performance bounds of Value Iteration with errors bounded by $\epsilon$ at each iteration from $\frac{\gamma}{(1-\gamma)^2}\epsilon$ to $\frac{\gamma}{1-\gamma}\epsilon$, which is significant in the usual situation when $\gamma$ is close to 1. Given Bellman operators that can only be computed with some…
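The construction described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: it runs exact Value Iteration (no per-iteration error) on a small random tabular MDP, keeps the greedy policies, and evaluates the non-stationary policy that executes the last $m$ policies in reverse order of generation and then loops. The helper names (`random_mdp`, `value_iteration`, `periodic_policy_value`) and the array layout are assumptions made for this example.

```python
import numpy as np


def random_mdp(n_states, n_actions, seed=0):
    """Hypothetical helper: random transitions P[a, s, s'] and rewards R[s, a]."""
    rng = np.random.default_rng(seed)
    P = rng.random((n_actions, n_states, n_states))
    P /= P.sum(axis=2, keepdims=True)  # make each row a probability distribution
    R = rng.random((n_states, n_actions))
    return P, R


def value_iteration(P, R, gamma, k):
    """Run k exact Value Iteration steps; return the greedy policies and v_k."""
    v = np.zeros(R.shape[0])
    policies = []
    for _ in range(k):
        q = R + gamma * np.einsum("ast,t->sa", P, v)  # Q-values w.r.t. current v
        policies.append(q.argmax(axis=1))             # greedy policy for this step
        v = q.max(axis=1)                             # Bellman optimality backup
    return policies, v


def periodic_policy_value(P, R, gamma, executed):
    """Value of the policy that runs `executed` in order, then loops forever.

    The value v is the fixed point of the composed operator
    T_{executed[0]} T_{executed[1]} ... T_{executed[-1]}, an affine map
    v -> c + M v, so v solves (I - M) v = c.
    """
    n_states = R.shape[0]
    states = np.arange(n_states)
    M = np.eye(n_states)
    c = np.zeros(n_states)
    for pi in reversed(executed):       # compose from the innermost operator out
        P_pi = P[pi, states, :]         # row s is P[pi[s], s, :]
        r_pi = R[states, pi]
        c = r_pi + gamma * P_pi @ c
        M = gamma * P_pi @ M
    return np.linalg.solve(np.eye(n_states) - M, c)


if __name__ == "__main__":
    gamma, k, m = 0.9, 50, 3
    P, R = random_mdp(n_states=5, n_actions=3)
    policies, v_k = value_iteration(P, R, gamma, k)
    # Execute pi_k first, then pi_{k-1}, ..., pi_{k-m+1}, then loop.
    last_m = policies[-m:][::-1]
    print("value of non-stationary policy:", periodic_policy_value(P, R, gamma, last_m))
    print("value of last stationary policy:", periodic_policy_value(P, R, gamma, last_m[:1]))
```

With exact backups both policies quickly become optimal, so the two printed values coincide; the bounds discussed in the abstract concern the setting where each backup carries an error of at most $\epsilon$, which this sketch does not model.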
Taxonomy
Topics: Reinforcement Learning in Robotics · Optimization and Search Problems · Machine Learning and Algorithms
