The variance-penalized stochastic shortest path problem
Jakob Piribauer, Ocan Sankur, Christel Baier

TL;DR
This paper studies how to optimize the variance-penalized expectation (VPE) of the accumulated weight in stochastic shortest path problems on Markov decision processes. It gives complexity bounds for the general problem and a polynomial-time algorithm for finding a variance-minimal scheduler among the expectation-optimal ones.
Contribution
It studies the variance-penalized expectation as an optimization criterion for the stochastic shortest path problem (SSPP), analyzes its computational complexity, and presents an algorithm that computes a variance-minimal scheduler among the expectation-optimal ones.
Findings
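For MDPs with non-negative weights, the optimal VPE and an optimal deterministic finite-memory scheduler can be computed in exponential space.
The threshold problem, which asks whether the maximal VPE exceeds a given rational, is EXPTIME-hard and lies in NEXPTIME.
A variance-minimal scheduler among all expectation-optimal schedulers can be computed in polynomial time.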
Abstract
The stochastic shortest path problem (SSPP) asks to resolve the non-deterministic choices in a Markov decision process (MDP) such that the expected accumulated weight before reaching a target state is maximized. This paper addresses the optimization of the variance-penalized expectation (VPE) of the accumulated weight, which is a variant of the SSPP in which a multiple of the variance of accumulated weights is incurred as a penalty. It is shown that the optimal VPE in MDPs with non-negative weights as well as an optimal deterministic finite-memory scheduler can be computed in exponential space. The threshold problem whether the maximal VPE exceeds a given rational is shown to be EXPTIME-hard and to lie in NEXPTIME. Furthermore, a result of interest in its own right obtained on the way is that a variance-minimal scheduler among all expectation-optimal schedulers can be computed in…
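To make the criterion concrete, the sketch below assumes that the VPE of a scheduler is the expected accumulated weight minus λ times its variance. The abstract says only that "a multiple of the variance" is the penalty, so the symbol λ (`lam` in the code) is an assumed name. The two weight distributions `safe` and `risky` are invented toy data, not examples from the paper. The code evaluates the VPE of a finite distribution over accumulated weights and shows how the penalty can change which choice is preferable.

```python
from fractions import Fraction


def vpe(dist, lam):
    """Variance-penalized expectation E[X] - lam * Var[X].

    dist maps each possible accumulated weight to its probability.
    lam is the (assumed) non-negative penalty factor on the variance.
    """
    mean = sum(w * p for w, p in dist.items())
    second_moment = sum(w * w * p for w, p in dist.items())
    variance = second_moment - mean * mean
    return mean - lam * variance


# Hypothetical toy choices in a one-step MDP (illustrative only):
# "safe" reaches the target with weight 2 for sure,
# "risky" reaches it with weight 0 or 6, each with probability 1/2.
safe = {2: Fraction(1)}
risky = {0: Fraction(1, 2), 6: Fraction(1, 2)}

for lam in (Fraction(0), Fraction(1, 2)):
    best = max(("safe", "risky"),
               key=lambda name: vpe({"safe": safe, "risky": risky}[name], lam))
    print(f"lam={lam}: safe={vpe(safe, lam)}, risky={vpe(risky, lam)}, best={best}")
```

With no penalty (λ = 0) the risky choice has the higher value, since it has the larger expectation. With λ = 1/2 its variance of 9 outweighs that advantage and the safe choice becomes preferable. Exact rationals are used so that the comparison is not affected by floating-point rounding.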
