Optimistic Value Iteration
Arnd Hartmanns, Benjamin Lucien Kaminski

TL;DR
Optimistic value iteration is a new, efficient method for Markov decision process analysis that provides tight bounds on probabilities and rewards without complex precomputations.
Contribution
The paper introduces a simple, sound approach that runs standard value iteration to obtain a lower bound, guesses a candidate upper bound from that result, and then verifies the guess.
Findings
Provides tight bounds on reachability probabilities and rewards
Easy to implement without extra precomputations
Demonstrates efficiency through extensive experiments
Abstract
Markov decision processes are widely used for planning and verification in settings that combine controllable or adversarial choices with probabilistic behaviour. The standard analysis algorithm, value iteration, only provides a lower bound on unbounded probabilities or reward values. Two "sound" variations, which also deliver an upper bound, have recently appeared. In this paper, we present optimistic value iteration, a new sound approach that leverages value iteration's ability to usually deliver tight lower bounds: we obtain a lower bound via standard value iteration, use the result to "guess" an upper bound, and prove the latter's correctness. Optimistic value iteration is easy to implement, does not require extra precomputations or a priori state space transformations, and works for computing reachability probabilities as well as expected rewards. It is also fast, as we show via an…
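The three steps named in the abstract (iterate from below, guess an upper bound, prove it) can be illustrated with a minimal sketch. This is not the authors' implementation: the dictionary-based MDP encoding, the function names `bellman` and `optimistic_vi`, the absolute-error guess `lower + eps`, and the retry-with-halved-tolerance loop are all assumptions made for illustration. The verification step relies on a standard fact: maximum reachability probabilities are the least fixed point of the monotone Bellman operator, so any vector that does not increase under one Bellman update lies above that fixed point.

```python
from typing import Dict, List, Set, Tuple

Action = List[Tuple[float, str]]   # (probability, successor) pairs
MDP = Dict[str, List[Action]]      # state -> available actions (hypothetical encoding)


def bellman(mdp: MDP, targets: Set[str], v: Dict[str, float]) -> Dict[str, float]:
    """One Bellman update for the maximum probability of reaching `targets`."""
    return {
        s: 1.0 if s in targets
        else max((sum(p * v[t] for p, t in act) for act in acts), default=0.0)
        for s, acts in mdp.items()
    }


def optimistic_vi(mdp: MDP, targets: Set[str], eps: float = 1e-6,
                  max_rounds: int = 60):
    """Return (lower, upper) bounds; a simplified sketch, not the paper's code."""
    lower = {s: 1.0 if s in targets else 0.0 for s in mdp}
    tol = eps
    for _ in range(max_rounds):
        # Step 1: standard value iteration from below until changes are small.
        while True:
            nxt = bellman(mdp, targets, lower)
            change = max(nxt[s] - lower[s] for s in mdp)
            lower = nxt
            if change <= tol:
                break
        # Step 2: optimistically guess an upper bound from the lower bound.
        upper = {s: min(1.0, lower[s] + eps) for s in mdp}
        # Step 3: verify the guess. If one Bellman update does not increase
        # any entry, `upper` is a pre-fixed point and hence a true upper bound.
        stepped = bellman(mdp, targets, upper)
        if all(stepped[s] <= upper[s] for s in mdp):
            return lower, upper
        tol /= 2  # guess rejected: tighten the lower bound and try again
    raise RuntimeError("no verified upper bound within max_rounds")


# Toy model (assumed for illustration): from s0, action 1 reaches the goal
# with probability 0.5; action 2 reaches it with 0.4 and retries with 0.5.
mdp: MDP = {
    "s0": [[(0.5, "goal"), (0.5, "fail")],
           [(0.4, "goal"), (0.5, "s0"), (0.1, "fail")]],
    "goal": [],
    "fail": [],
}
lower, upper = optimistic_vi(mdp, {"goal"})
```

In this toy model the optimal strategy always picks the second action, giving a true value of 0.8 for `s0`, which the returned bounds enclose. The sketch checks the guess with a single Bellman update and simply retries with a tighter tolerance on failure; it does not address models where such a loop might fail to terminate.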
