On the Use of Non-Stationary Policies for Infinite-Horizon Discounted Markov Decision Processes

Bruno Scherrer (INRIA Lorraine - LORIA)

arXiv:1203.5532 · cs.AI · April 2, 2012

TL;DR

This paper shows that, for infinite-horizon discounted Markov Decision Processes, non-stationary policies built from the last policies generated by Value Iteration admit tighter performance bounds than the last stationary policy, and that they simplify the computation of near-optimal policies. The gain is largest when the discount factor is close to 1.

Contribution

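The paper derives performance bounds for non-stationary policies built from the last $m$ policies generated by Value Iteration. These bounds are tighter than the state-of-the-art bound for the last stationary policy $π_{k}$ by a factor $\frac{1 - γ}{1 - γ^{m}}$, and they simplify the problem of computing approximately optimal policies.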
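To make the objects in this contribution concrete, the following is a minimal sketch of tabular Value Iteration that records the greedy policy at every iteration, so that the last $m$ of them can be reused. It is not the author's implementation. The two-state MDP (`P`, `R`), the function names, and in particular the rule in `nonstationary_action` (cycling through the last $m$ policies, most recent first) are assumptions made for illustration; the summary only says that the non-stationary policies "involve" the last $m$ generated policies.

```python
def value_iteration(P, R, gamma, k):
    """Run k sweeps of exact Value Iteration on a tabular MDP.

    P[s][a] is a list of (next_state, probability) pairs and R[s][a] is the
    expected immediate reward. Returns the final value estimate and the list
    of greedy policies [pi_1, ..., pi_k], one per sweep.
    """
    n_states = len(P)
    v = [0.0] * n_states
    policies = []
    for _ in range(k):
        new_v, pi = [], []
        for s in range(n_states):
            # One-step lookahead value of each action under the current v.
            q = [R[s][a] + gamma * sum(p * v[s2] for s2, p in P[s][a])
                 for a in range(len(P[s]))]
            best = max(range(len(q)), key=q.__getitem__)
            pi.append(best)
            new_v.append(q[best])
        v = new_v
        policies.append(pi)
    return v, policies


def nonstationary_action(policies, m, t, s):
    """Action at time step t in state s when cycling through the last m policies.

    Assumption (not stated in the summary): the cycle starts with the most
    recent policy pi_k, then pi_{k-1}, ..., pi_{k-m+1}, and repeats.
    """
    if not 1 <= m <= len(policies):
        raise ValueError("m must be between 1 and the number of stored policies")
    last_m = policies[-m:]
    return last_m[-1 - (t % m)][s]


# Hypothetical two-state, two-action MDP used only for illustration:
# action 0 stays in place, action 1 switches state; staying in state 1 pays 1.
P = [
    [[(0, 1.0)], [(1, 1.0)]],
    [[(1, 1.0)], [(0, 1.0)]],
]
R = [[0.0, 0.0], [1.0, 0.0]]

if __name__ == "__main__":
    v, policies = value_iteration(P, R, gamma=0.9, k=200)
    print("value estimate:", v)
    print("last stationary policy:", policies[-1])
    print("actions over 6 steps in state 0 with m=3:",
          [nonstationary_action(policies, 3, t, 0) for t in range(6)])
```

With exact updates, as here, the last few greedy policies typically coincide, so the non-stationary policy reduces to the stationary one. The bounds discussed in the paper concern the case where each iteration carries an error, which this sketch does not model.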

Findings

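01. Non-stationary policies built from the last $m$ policies generated by Value Iteration reduce the performance bound of the last stationary policy $π_{k}$ by a factor $\frac{1 - γ}{1 - γ^{m}}$.

02. Using non-stationary policies simplifies the problem of computing approximately optimal policies.

03. The improvement is most significant when the discount factor $γ$ is close to 1: with errors bounded by $ϵ$ at each iteration, the asymptotic bound for Value Iteration drops from $\frac{γ}{(1 - γ)^{2}} ϵ$ to $\frac{γ}{1 - γ} ϵ$.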

Abstract

We consider infinite-horizon $γ$-discounted Markov Decision Processes, for which it is known that there exists a stationary optimal policy. We consider the Value Iteration algorithm and the sequence of policies $π_{1}, \dots, π_{k}$ it implicitly generates up to some iteration $k$. We provide performance bounds for non-stationary policies involving the last $m$ generated policies that reduce the state-of-the-art bound for the last stationary policy $π_{k}$ by a factor $\frac{1 - γ}{1 - γ^{m}}$. In particular, the use of non-stationary policies makes it possible to reduce the usual asymptotic performance bounds of Value Iteration with errors bounded by $ϵ$ at each iteration from $\frac{γ}{(1 - γ)^{2}} ϵ$ to $\frac{γ}{1 - γ} ϵ$, which is significant in the usual situation when $γ$ is close to 1. Given Bellman operators that can only be computed with some…
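The size of the improvement quoted in the abstract is easy to check numerically. The sketch below evaluates the two asymptotic bounds and the factor $\frac{1 - γ}{1 - γ^{m}}$ exactly as written above. The function names and the sample values of `gamma` and `eps` are illustrative assumptions, and the code assumes the factor multiplies the stationary asymptotic bound directly.

```python
import math


def stationary_bound(gamma: float, eps: float) -> float:
    """Asymptotic bound quoted in the abstract for the last stationary policy."""
    return gamma / (1.0 - gamma) ** 2 * eps


def improvement_factor(gamma: float, m: int) -> float:
    """Factor (1 - gamma) / (1 - gamma^m) quoted in the abstract."""
    return (1.0 - gamma) / (1.0 - gamma ** m)


def nonstationary_bound(gamma: float, eps: float, m: int) -> float:
    """Stationary bound scaled by the factor for the last m policies."""
    return improvement_factor(gamma, m) * stationary_bound(gamma, eps)


def limiting_bound(gamma: float, eps: float) -> float:
    """Bound gamma / (1 - gamma) * eps, the limit of the above as m grows."""
    return gamma / (1.0 - gamma) * eps


if __name__ == "__main__":
    gamma, eps = 0.99, 0.01  # illustrative values, not taken from the paper
    print("stationary bound:", stationary_bound(gamma, eps))
    for m in (1, 2, 10, 100, 1000):
        print("m =", m, "bound:", nonstationary_bound(gamma, eps, m))
    print("limiting bound:", limiting_bound(gamma, eps))
```

For $m = 1$ the factor equals 1 and the stationary bound is recovered. As $m$ grows, $γ^{m}$ vanishes and the factor tends to $1 - γ$, which is what turns $\frac{γ}{(1 - γ)^{2}} ϵ$ into $\frac{γ}{1 - γ} ϵ$.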

Taxonomy

Topics: Reinforcement Learning in Robotics · Optimization and Search Problems · Machine Learning and Algorithms