Optimistic Value Iteration

Arnd Hartmanns; Benjamin Lucien Kaminski

arXiv:1910.01100·cs.LO·October 21, 2019

TL;DR

Optimistic value iteration is a sound method for analysing Markov decision processes: it computes both lower and upper bounds on reachability probabilities and expected rewards, without extra precomputations or a priori state space transformations.

Contribution

The paper introduces a simple, sound approach that obtains a lower bound via standard value iteration, uses that result to guess an upper bound, and then proves the guess correct.

Findings

01 Provides tight bounds on reachability probabilities and rewards

02 Easy to implement without extra precomputations

03 Demonstrates efficiency through extensive experiments

Abstract

Markov decision processes are widely used for planning and verification in settings that combine controllable or adversarial choices with probabilistic behaviour. The standard analysis algorithm, value iteration, only provides a lower bound on unbounded probabilities or reward values. Two "sound" variations, which also deliver an upper bound, have recently appeared. In this paper, we present optimistic value iteration, a new sound approach that leverages value iteration's ability to usually deliver tight lower bounds: we obtain a lower bound via standard value iteration, use the result to "guess" an upper bound, and prove the latter's correctness. Optimistic value iteration is easy to implement, does not require extra precomputations or a priori state space transformations, and works for computing reachability probabilities as well as expected rewards. It is also fast, as we show via an…
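
The three steps named in the abstract (iterate from below, guess an upper bound, verify the guess) can be illustrated with a minimal Python sketch. This is a simplified illustration under stated assumptions, not the authors' algorithm: the dictionary encoding of the MDP, the function names `bellman` and `optimistic_value_iteration`, the relative guess `lower * (1 + eps)`, the single-step verification, the halving of the threshold after a failed check, and the toy model are all hypothetical choices made here. The sketch covers maximal reachability probabilities only.

```python
def bellman(mdp, goal, v):
    """Apply the Bellman operator for maximal reachability probabilities once."""
    new = {}
    for s, actions in mdp.items():
        if s in goal:
            new[s] = 1.0
        elif not actions:
            new[s] = 0.0  # absorbing non-goal state
        else:
            # Best action: maximise the expected value of the successors.
            new[s] = max(sum(p * v[t] for p, t in a) for a in actions)
    return new


def optimistic_value_iteration(mdp, goal, eps=1e-6, max_rounds=60):
    """Return (lower, upper) value maps, with upper verified inductively."""
    lower = {s: 1.0 if s in goal else 0.0 for s in mdp}
    threshold = eps  # convergence threshold for the iteration from below
    for _ in range(max_rounds):
        # Step 1: standard value iteration from below.
        while True:
            nxt = bellman(mdp, goal, lower)
            change = max(abs(nxt[s] - lower[s]) for s in mdp)
            lower = nxt
            if change < threshold:
                break
        # Step 2: optimistically guess an upper bound (assumed relative guess).
        upper = {s: min(1.0, lower[s] * (1.0 + eps)) for s in mdp}
        # Step 3: verify the guess. If one Bellman step does not increase
        # any entry, the guess bounds the least fixed point from above.
        image = bellman(mdp, goal, upper)
        if all(image[s] <= upper[s] for s in mdp):
            return lower, upper
        # Guess rejected: tighten the threshold and keep iterating.
        threshold /= 2.0
    raise RuntimeError("no verified upper bound found within max_rounds")


# Hypothetical toy MDP: each action is a list of (probability, successor).
# The exact maximal probability from s0 is 0.6 (solve x = 0.3 + 0.5 * x).
mdp = {
    "s0": [
        [(0.5, "goal"), (0.5, "sink")],
        [(0.3, "goal"), (0.2, "sink"), (0.5, "s0")],
    ],
    "goal": [],
    "sink": [],
}
goal = {"goal"}
lower, upper = optimistic_value_iteration(mdp, goal)
print(lower["s0"], upper["s0"])
```

The verification step relies on a standard fact: the maximal reachability probability is the least fixed point of the Bellman operator, so any vector that the operator does not increase is an upper bound on it. The sketch uses floating-point arithmetic, so the check is only as reliable as the rounding allows.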
