Positivity-hardness results on Markov decision processes

Jakob Piribauer; Christel Baier

arXiv:2302.13675·cs.LO·August 7, 2024


TL;DR

This paper proves that many optimization problems on Markov decision processes are at least as hard as the Positivity problem for linear recurrence sequences, a longstanding open problem in number theory. An algorithmic solution to any of them is therefore not possible without a major breakthrough in analytic number theory.

Contribution

It establishes Positivity-hardness for various MDP optimization problems, linking their decidability to a major open problem in number theory.

Findings

01 A range of optimization problems on one-counter MDPs and integer-weighted MDPs are Positivity-hard.

02 Decidability of these problems is linked to the open Positivity problem.

03 No algorithmic solution is possible without a major breakthrough in analytic number theory.




Full text

Positivity-hardness results on Markov decision processes

Jakob Piribauer¹

Christel Baier¹

(¹Technische Universität Dresden)

Abstract

This paper investigates a series of optimization problems for one-counter Markov decision processes (MDPs) and integer-weighted MDPs with finite state space. Specifically, it considers problems addressing termination probabilities and expected termination times for one-counter MDPs, as well as satisfaction probabilities of energy objectives, conditional and partial expectations, satisfaction probabilities of constraints on the total accumulated weight, the computation of quantiles for the accumulated weight, and the conditional value-at-risk for accumulated weights for integer-weighted MDPs. Although algorithmic results are available for some special instances, the decidability status of the decision versions of these problems is unknown in general.

The paper demonstrates that these optimization problems are inherently mathematically difficult by providing polynomial-time reductions from the Positivity problem for linear recurrence sequences. This problem is a well-known number-theoretic problem whose decidability status has been open for decades and it is known that decidability of the Positivity problem would have far-reaching consequences in analytic number theory. So, the reductions presented in the paper show that an algorithmic solution to any of the investigated problems is not possible without a major breakthrough in analytic number theory.

The reductions rely on the construction of MDP-gadgets that encode the initial values and linear recurrence relations of linear recurrence sequences. These gadgets can flexibly be adjusted to prove the various Positivity-hardness results. Interestingly, the reductions can also be extended to demonstrate the Positivity-hardness of two problems that address the long-run behavior of a system, namely the model-checking problem of frequency-LTL and the optimization of the long-run average probability to satisfy a path property (long-run probability). Both of these problems have been studied before on special instances, but are open in general.

1 Introduction

When modelling and analyzing computer systems and their interactions with their environment, two qualitatively different kinds of uncertainty about the evolution of the system execution play a central role: non-determinism and probabilism. If a system is, for example, employed in an unknown environment or depends on user inputs or concurrent processes, modelling the system as non-deterministic accounts for all possible external influences, sequences of user inputs, or possible orders in which concurrent events take place. If transition probabilities between the states of a system, such as the failure probability of components or the probabilities in a probabilistic choice employed in a randomized algorithm, are known or can be estimated, it is appropriate to model this behavior as probabilistic. A pure worst- or best-case analysis is not very informative in such cases and the additional probabilistic information available should be put to use. Markov decision processes (MDPs) are a standard operational model combining non-deterministic and probabilistic behavior and are widely used in operations research, artificial intelligence, and verification among others.

In each state of an MDP, there is a non-deterministic choice from a set of actions. Each action specifies a probability distribution over the possible successor states according to which a transition is chosen randomly. Typical optimization problems on MDPs require resolving the non-deterministic choices by specifying a scheduler such that a quantitative objective function is optimized. For example, the standard model-checking problem asks for the minimal or maximal probability that an execution satisfies a given linear-time property. Here, minimum and maximum range over all resolutions of the non-deterministic choices, i.e., over all schedulers. This model-checking problem is known to be 2EXPTIME-complete if the property is given in linear temporal logic (LTL) [CY95] and solvable in polynomial time if the property is given by a deterministic automaton [dA99, BK08]. Many quantitative aspects of a system can be modeled by equipping an MDP with weights that are collected in each step. These weights might represent time, energy consumption, utilities, or generally speaking any sort of costs or rewards incurred. Classical optimization problems in this context that are known to be solvable in polynomial time include the optimization of the expected value of the total accumulated weight before a target state is reached, the so-called stochastic shortest path problem (SSPP) [BT91, dA99, BBD+18], the expected value of the reward earned on average per step, the so-called expected mean payoff or long-run average, or the expected discounted accumulated weight where after each step a discount factor is applied to all future weights (for the latter two, see, e.g., [HK79, Put94]).

Of course, there is a vast landscape of further optimization problems on finite-state MDPs that have been analyzed. We are, nevertheless, not aware of natural decision problems for standard (finite-state) MDPs with a single weight function and single objective that are known to be undecidable. Undecidability results have been established for more expressive models. This applies, e.g., to recursive MDPs [EY05], MDPs with two or more weight functions [BKKW14, RRS17], or partially observable MDPs [MHC99, BGB12].

In this paper, we will investigate a series of optimization problems that have been studied in the literature, but are open in general. We will show that these problems possess an inherent mathematical difficulty that makes algorithmic solutions without a major breakthrough in analytic number theory impossible. Formally, this result is obtained by reductions from the Positivity problem for linear recurrence sequences, a number theoretic problem whose decidability status has been open for many decades.

1.1 Positivity problem

Definition 1.1 (Positivity problem).

The Positivity problem for linear recurrence sequences asks whether such a sequence stays non-negative. More formally, given a natural number $k\geq 2$, and rationals $\alpha_{i}$ and $\beta_{j}$ with $1\leq i\leq k$ and $0\leq j\leq k-1$, let $(u_{n})_{n\geq 0}$ be defined by the initial values $u_{0}=\beta_{0}$, …, $u_{k-1}=\beta_{k-1}$ and the linear recurrence relation

$$u_{n+k}=\alpha_{1}u_{n+k-1}+\dots+\alpha_{k}u_{n}$$

for all $n\geq 0$. The Positivity problem asks to decide whether $u_{n}\geq 0$ for all $n$. (Footnote 1: We do not distinguish between the Positivity problem and its complement in the sequel. So, we also refer to the problem whether there is an $n$ such that $u_{n}<0$ as the Positivity problem.) The number $k$ is called the order of the linear recurrence sequence.
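To make the definition concrete, the following minimal sketch enumerates the terms of a linear recurrence sequence with exact rational arithmetic and searches for a negative term up to a given bound. The function names `lrs_terms` and `first_negative_index` and the parameter `bound` are illustrative assumptions, not notation from the paper. Such a bounded search can only confirm that some $u_{n}<0$ exists; it cannot establish that all terms are non-negative, which is exactly why it does not decide the Positivity problem.

```python
from fractions import Fraction
from itertools import islice


def lrs_terms(alphas, betas):
    """Yield u_0, u_1, ... where u_{n+k} = alphas[0]*u_{n+k-1} + ... + alphas[k-1]*u_n.

    alphas corresponds to (alpha_1, ..., alpha_k), betas to (beta_0, ..., beta_{k-1}).
    """
    k = len(alphas)
    if len(betas) != k:
        raise ValueError("need exactly k initial values")
    coeffs = [Fraction(a) for a in alphas]
    window = [Fraction(b) for b in betas]  # holds u_n, ..., u_{n+k-1}
    while True:
        yield window[0]
        # alpha_{i+1} multiplies u_{n+k-1-i}, which is window[k-1-i]
        nxt = sum(coeffs[i] * window[k - 1 - i] for i in range(k))
        window = window[1:] + [nxt]


def first_negative_index(alphas, betas, bound):
    """Return the least n < bound with u_n < 0, or None if there is none below bound."""
    for n, u in enumerate(islice(lrs_terms(alphas, betas), bound)):
        if u < 0:
            return n
    return None


# Illustrative instances (not taken from the paper):
fib_prefix = list(islice(lrs_terms([1, 1], [0, 1]), 7))
witness = first_negative_index([1, -1], [1, 1], 10)  # u_{n+2} = u_{n+1} - u_n
```

For the second instance the sequence starts $1, 1, 0, -1$, so the search reports index $3$ as a witness for the complement of the Positivity problem.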

The Positivity problem is closely related to the famous Skolem problem. The Skolem problem asks whether there is an $n$ such that $u_{n}=0$ for a given linear recurrence sequence $(u_{n})_{n\geq 0}$ . It is well-known that the Skolem problem is polynomial-time reducible to the Positivity problem (see, e.g., [EvdPSW03]). The Positivity problem and the Skolem problem are outstanding problems in the fields of number theory and theoretical computer science (see, e.g., [HHHK05, OW12, OW15]) and their decidability has been open for many decades. Deep results establish decidability for both problems for linear recurrence sequences of low order or for restricted classes of sequences [STM84, Ver85, OW14a, OW14b, OW14c]. A proof of decidability or undecidability of the Positivity problem for arbitrary sequences, however, withstands all known number-theoretic techniques. In [OW14b], it is shown that decidability of the Positivity problem (already for linear recurrence sequences of order $6$ ) would entail a major breakthrough in the field of Diophantine approximation of transcendental numbers, an area of analytic number theory.

We call a problem to which the Positivity problem is reducible Positivity-hard. From a complexity theoretic point of view, the Positivity problem is known to be at least as hard as the decision problem for the universal theory of the reals [OW14c], a problem known to be coNP-hard and to lie in PSPACE [Can88]. As most of the problems we will address are PSPACE-hard, the reductions in this paper do not provide new lower bounds on the computational complexity. The hardness results in this paper hence refer to the far-reaching consequences on major open problems that a decidability result would imply. Furthermore, of course the undecidability of the Positivity problem would entail the undecidability of any Positivity-hard problem.

1.2 Problems under investigation and related work on these problems

In the sequel, we briefly describe the problems studied in this paper and describe related work on these problems. In general, the decidability status of all of these problems is open and we will prove them to be Positivity-hard.

Energy objectives, one-counter MDPs, and quantiles.

If weights model a resource like energy that can be consumed and gained during a system execution, a natural problem is to determine the worst- or best-case probability that the system never runs out of the resource. This is known as the energy objective. There has been work on combinations of the energy objective with further objectives such as parity objectives [CD11, MSTW17] and expected mean payoffs [BKN16]. Previous work on this objective focused on the possibility to satisfy the objective (or the combination of objectives) almost surely. The quantitative problem whether it is possible to satisfy an energy objective with probability greater than some threshold $p$ is open.

The complement of the energy objective can be found in the context of one-counter MDPs (see [BBE+10, BBEK11, BKNW12]): An MDP equipped with a counter that can be increased and decreased can be used to model a simple form of recursion and can be seen as a special case of a pushdown MDP. The process is said to terminate as soon as the counter value drops below $0$ and the standard task is to compute maximal or minimal termination probabilities. In one-counter MDPs that terminate almost surely, one furthermore can ask for the extremal expected termination times, i.e., the expected number of steps until termination. While it is decidable in polynomial time whether the maximal termination probability of a one-counter MDP is $1$, and in exponential time if termination is required to occur inside a specified set of states [BBE+10], the computation of the optimal value and the quantitative decision problem whether the optimal value exceeds a threshold $p$ are left open in the literature. Also the problem to compute the minimal or maximal expected termination time of a one-counter MDP that terminates almost surely under any scheduler is open. There are, however, approximation algorithms for the optimal termination probability [BBEK11] and for the expected termination time of almost surely terminating one-counter MDPs [BKNW12]. One-counter MDPs can be seen as a special case of recursive MDPs [EY15]. For general recursive MDPs, the qualitative decision problem whether the maximal termination probability is $1$ is undecidable while for restricted forms, so-called 1-exit recursive MDPs, the qualitative and also the quantitative problem is decidable in polynomial space [EY15]. One-counter MDPs can be seen as a special case of 1-box recursive MDPs in the terminology of [EY15], a restriction orthogonal to 1-exit recursive MDPs.

The termination probability of one-counter MDPs and the satisfaction probability of the energy objective are closely related to the computation of quantiles (see [UB13, BDD+14, RRS17]). Given a probability value $p$, here the task is to compute the best bound $b$ such that the maximal or minimal probability that the accumulated weight exceeds the bound is at most or at least $p$. The decision version whether the maximal or minimal probability that the accumulated weight before reaching a target state exceeds $b$ is at least or at most $p$ is also known as the cost problem (see [HK15, HKL17, BBD+18]). The computation of quantiles and the cost problem have been addressed for MDPs with non-negative weights and are solvable in exponential time in this setting [UB13, HK15]. The decision version of the cost problem with non-negative weights is furthermore PSPACE-hard for a single inequality on the accumulated weight and EXPTIME-complete if a Boolean combination of inequality constraints on the accumulated weight is considered [HK15]. For the setting with arbitrary weights, [BBD+18] provides solutions to the qualitative question whether a constraint on the accumulated weight is satisfied with probability $1$ (or $>0$). Further, it is known that the quantitative problem is undecidable if multiple objectives with multiple weight functions have to be satisfied simultaneously [RRS17].

Non-classical stochastic shortest path problems (SSPPs).

The classical SSPP described above requires that a goal state is reached almost surely. In many situations, however, there might be no schedulers reaching the target with probability $1$ or schedulers that miss the target with positive probability are of interest, too. Two non-classical variants that drop this requirement are the conditional SSPP (see [BKKW17, PB19]) and the partial SSPP (see [CFK+13a, CFK+13b]). In the conditional SSPP, the goal is to optimize the conditional expected accumulated weight before reaching the target under the condition that the target is reached. In other words, the average weight of all paths reaching the target has to be optimized. In the partial SSPP, paths not reaching the target are not ignored, but assigned weight $0$. Possible applications for these non-classical SSPPs include the analysis of probabilistic programs where no guarantees on almost sure termination can be given (see, e.g., [GKM14, KGJ+15, BEFH16, CFG16, OGJ+18]), the analysis of fault-tolerant systems where error scenarios might occur with small, but positive probability, or the trade-off analysis with conjunctions of utility and cost constraints that are achievable with positive probability, but not almost surely (see, e.g., [BDK+14]). In [CFK+13a] and [BKKW17], partial and conditional expectations, respectively, have been addressed in the setting of non-negative weights. In both cases, the optimal value can be computed in exponential time [CFK+13a, BKKW17] while the threshold problem is PSPACE-hard [PB19, BKKW17]. In MDPs with positive and negative weights, it is known that the optimal values might be irrational and that optimal schedulers might require infinite memory [PB19].

Conditional expectations also play an important role for some risk measures. The conditional value-at-risk (CVaR) is an established risk measure (see, e.g., [Ury00, AT02]) defined as the conditional expected outcome under the condition that the outcome belongs to the $p$ worst outcomes for a given probability value $p$ . In the context of optimization problems on weighted MDPs, the CVaR has been studied for mean-payoffs and weighted reachability where only one terminal weight is collected per run (see [KM18]), and for the accumulated weight before reaching a target state in MDPs with non-negative weights (see [ADBA21]). The CVaR for accumulated weights can be optimized in MDPs with non-negative weights in exponential time [PB20, Meg22].
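As an illustration of the informal description above, the following sketch computes the conditional value-at-risk of a finite discrete distribution by averaging over the worst probability mass $p$. It rests on assumptions that are not fixed by the text: lower outcomes count as worse, the distribution is given explicitly as value-probability pairs, and an outcome that straddles the $p$ boundary contributes only the part of its mass that fits. The function name `cvar` is hypothetical, and the formal definition used later in the paper may be phrased differently.

```python
from fractions import Fraction


def cvar(outcomes, p):
    """Expected outcome over the worst probability mass p (lower values are worse).

    outcomes: list of (value, probability) pairs whose probabilities sum to 1.
    p: probability level with 0 < p <= 1.
    """
    p = Fraction(p)
    if not 0 < p <= 1:
        raise ValueError("p must lie in (0, 1]")
    if sum(Fraction(prob) for _, prob in outcomes) != 1:
        raise ValueError("probabilities must sum to 1")
    remaining = p
    total = Fraction(0)
    for value, prob in sorted(outcomes):  # ascending: worst outcomes first
        take = min(Fraction(prob), remaining)
        total += take * value
        remaining -= take
        if remaining == 0:
            break
    return total / p


# Illustrative distribution (assumption): -10 w.p. 1/10, 0 w.p. 2/5, 5 w.p. 1/2.
dist = [(-10, Fraction(1, 10)), (0, Fraction(2, 5)), (5, Fraction(1, 2))]
worst_fifth = cvar(dist, Fraction(1, 5))
plain_expectation = cvar(dist, 1)
```

For $p=1$ the value coincides with the ordinary expectation, which is a quick sanity check for the sketch.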

Long-run properties and frequency-LTL over MDPs.

Besides encoding quantitative features of a model into a weight-structure, a further branch of research addresses ways to quantify the degree to which a specification is satisfied by a model (see [Hen13] for an overview of the field). One line of research in this direction attempts to measure the degree to which a specification is satisfied when evolving over time. This includes the work on frequency-LTL [BDL12, FK15, FKK15]. In frequency-LTL, temporal operators are relaxed by frequency constraints. Under a relaxed ‘globally’-operator with a rational lower frequency bound $q$, a formula does not have to hold on all suffixes, but the frequency of suffixes that satisfy a formula has to be at least $q$. Similarly, long-run probabilities (see [BBPS19]) capture the average probability that a system will satisfy a property when we start to observe it after many steps. Frequency-LTL and long-run probabilities can be useful for the analysis of the properties of systems in the long-run equilibrium after some initialization phase. This is helpful, e.g., to quantify the availability of system components. For a case study in this direction employing probabilistic model checking to analyze system availability, see [LPM+15].

We address the model-checking problem of frequency-LTL in MDPs and the optimization of long-run probabilities of simple co-safety properties in this paper. In [BBPS19], it is shown that for several types of properties including Streett and Rabin conditions, the long-run probability in MDPs can be optimized in polynomial time. For constrained reachability ($a{\mathrm{U}}b$), however, the threshold problem is shown to be NP-hard while the optimal value can be computed in exponential time. The model-checking problem for the full logic frequency-LTL on MDPs is open, but fragments for which the model-checking problem on MDPs is decidable have been identified [FK15, FKK15]. In particular, the model-checking problem is decidable if the until operator is not allowed in the scope of (frequency-)globally-operators [FKK15].

1.3 Contribution

We develop a technique to provide reductions from the Positivity problem to threshold problems on MDPs, asking whether the optimal value of a quantity exceeds a given rational threshold. The resulting reductions are based on the construction of MDP-gadgets that allow us to encode the linear recurrence relation of a linear recurrence sequence and the initial values, respectively. The approach turns out to be quite flexible. By adjusting the gadgets encoding initial values, we can provide reductions of the same overall structure for several of the optimization problems we discussed. Through further chains of reductions depicted in Figure 1, we establish Positivity-hardness for the full series of optimization problems under investigation. The main result of this paper consequently is the following:

Main result.

The Positivity problem is polynomial-time reducible to the threshold problems for the optimal values of the following quantities:

• termination probabilities of one-counter MDPs,

• expected termination times of almost surely terminating one-counter MDPs,

• the satisfaction probabilities of energy objectives in MDPs with weights in $\mathbb{Z}$,

• the probability to satisfy an inequality on the accumulated weight (cost problem) in MDPs with weights in $\mathbb{Z}$,

• conditional expectations (conditional SSPP) in MDPs with weights in $\mathbb{Z}$,

• partial expectations (partial SSPP) in MDPs with weights in $\mathbb{Z}$,

• conditional values-at-risk for accumulated weights (before reaching a goal) in MDPs with weights in $\mathbb{Z}$,

• a two-sided version of partial expectations in MDPs with two non-negative weight functions with values in $\mathbb{N}$, and

• long-run probabilities of regular co-safety properties in MDPs.

Furthermore, an algorithm for

• the computation of quantiles for accumulated weights in MDPs with weights in $\mathbb{Z}$, or

• the model-checking problem of frequency-LTL (as defined in [FK15, FKK15]) on MDPs

would imply the decidability of the Positivity problem.

1.4 Related work on Skolem- and Positivity-hardness in verification

In [AAOW15], the Skolem-hardness of decision problems for Markov chains has been established. The problems studied in [AAOW15] are (1) to decide whether for given states $s$, $t$ and rational number $p$, there is a positive integer $n$ such that the probability to reach $t$ from $s$ in $n$ steps equals $p$ and (2) the model checking problem for a probabilistic variant of monadic logic and a variant of LTL that treats Markov chains as linear transformers of probability distributions. A connection between similar problems and the Skolem problem has also been conjectured in [BRS06, AAGT15]. These decision problems are of quite different nature than the problems studied here, and so are the reductions from the Skolem problem, because the behavior of a Markov chain in $n$ steps can directly be expressed by $P^{n}$ where $P$ is the transition probability matrix due to the lack of non-determinism.
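The remark that the $n$-step behavior of a Markov chain is given by $P^{n}$ can be illustrated with a small sketch. It computes matrix powers with exact rationals and reads off the probability of being in state $t$ after exactly $n$ steps when starting in $s$. The helper names `mat_mul` and `mat_pow` and the two-state chain are assumptions made for illustration only.

```python
from fractions import Fraction


def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(P, n):
    """Compute P^n by repeated squaring (n >= 0)."""
    size = len(P)
    result = [[Fraction(int(i == j)) for j in range(size)] for i in range(size)]
    base = [row[:] for row in P]
    while n > 0:
        if n & 1:
            result = mat_mul(result, base)
        base = mat_mul(base, base)
        n >>= 1
    return result


# Illustrative chain (assumption): state 0 moves to state 1 with probability 1/2
# and otherwise stays; state 1 loops forever.
half = Fraction(1, 2)
chain = [[half, half],
         [Fraction(0), Fraction(1)]]
prob_in_1_after_3 = mat_pow(chain, 3)[0][1]
```

In an MDP no single matrix plays this role, because the scheduler's choices may differ from step to step, which is one way to see why the reductions for MDPs need a different construction.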

In this context also the results of [COW16] and [MSS20] are remarkable as they show the decidability, subject to Schanuel’s conjecture, of reachability problems in continuous linear dynamical systems and continuous-time MDPs, respectively, as instances of the continuous Skolem problem. In other areas of formal verification, the Skolem problem and the Positivity problem play an important role in the context of the termination of linear programs [BAGM12, Tiw04, Bra06, OW15].

The Positivity-hardness results leave the possibility open that the problems under consideration are undecidable. Remarkable undecidability results in this context are presented in [KK15]: The hardness of deciding almost sure termination and almost sure termination with finite expected termination time for purely probabilistic programs formulated in the probabilistic fragment of probabilistic guarded command language (pGCL) [MMM05] is pinpointed to levels of the arithmetical hierarchy (for details on the arithmetical hierarchy, see, e.g., [Odi92]). The results reach up to $\Pi^{0}_{3}$ -completeness for deciding universal almost sure termination with finite expected termination time ( $\Pi_{1}^{0}$ -complete problems are already undecidable while still co-recursively enumerable). Undecidability is not surprising as the programs subsume ordinary programs. But the universal halting problem for ordinary programs is only $\Pi_{2}^{0}$ -complete showing that deciding universal termination with finite expected termination time of probabilistic programs is strictly harder. Similarly deciding termination from a given initial configuration is $\Sigma_{1}^{0}$ -complete for ordinary programs (halting problem) while deciding almost sure termination with finite expected termination time for probabilistic programs from a given initial configuration is $\Sigma_{2}^{0}$ -complete. Operational semantics of pGCL-programs can be given as infinite-state MDPs [GKM14]. Applied to the purely probabilistic fragment, this leads to infinite-state Markov chains.

1.5 Outline

In Section 2, we provide necessary definitions and present our notation. In Section 3, we outline the general structure of the gadget-based reductions from the Positivity problem and construct an MDP-gadget in which a linear recurrence relation can be encoded in terms of the optimal values for a variety of optimization problems (Section 3.2). Afterwards, we construct gadgets encoding also the initial values of a linear recurrence sequence and provide the reductions from the Positivity problem and all subsequent reductions as depicted in Figure 1 (Section 4). We conclude with final remarks and an outlook on future work (Section 5).

1.6 Note on the publication status of the results

This paper is an extension of work published at ICALP 2020 [PB20]. It extends the conference version by the results for one-counter MDPs, energy objectives, quantiles, and cost problems. These additional results are also presented in the PhD thesis [Pir21]. Furthermore, full proofs omitted in the conference version and a detailed and improved description of all constructions are provided.

2 Preliminaries

We assume some familiarity with Markov decision processes and briefly introduce our notation in the sequel. More details can be found in text books such as [Put94].

Markov decision process.

A Markov decision process (MDP) is a tuple $\mathcal{M}=(S,\mathit{Act},P,s_{\mathit{\scriptscriptstyle init}})$ where $S$ is a finite set of states, $\mathit{Act}$ is a finite set of actions, $P\colon S\times\mathit{Act}\times S\to[0,1]\cap\mathbb{Q}$ is the transition probability function for which we require that $\sum_{t\in S}P(s,\alpha,t)\in\{0,1\}$ for all $(s,\alpha)\in S\times\mathit{Act}$ , and $s_{\mathit{\scriptscriptstyle init}}\in S$ is the initial state. Depending on the context, we enrich MDPs with a weight function $\mathit{wgt}\colon S\times\mathit{Act}\to\mathbb{Z}$ , a finite set of atomic propositions $\mathsf{AP}$ and a labeling function $L\colon S\to 2^{\mathsf{AP}}$ , or a designated set of goal states $\mathit{Goal}$ . The size of an MDP $\mathcal{M}$ , denoted by $\mathit{size}(\mathcal{M})$ , is the sum of the number of states plus the total sum of the logarithmic lengths of the non-zero probability values $P(s,\alpha,s^{\prime})$ as fractions of co-prime integers and, if present, the logarithmic lengths of the weight values $\mathit{wgt}(s,\alpha)$ .

We write $\mathit{Act}(s)$ for the set of actions that are enabled in a state $s$, i.e., $\alpha\in\mathit{Act}(s)$ iff $\sum_{t\in S}P(s,\alpha,t)=1$. Whenever the process is in a state $s$, a non-deterministic choice between the enabled actions $\mathit{Act}(s)$ has to be made. We call a state absorbing if the only enabled actions lead to the state itself with probability $1$ and weight $0$. If there are no enabled actions, we call a state a trap. The paths of $\mathcal{M}$ are finite or infinite sequences $s_{0}\,\alpha_{0}\,s_{1}\,\alpha_{1}\,s_{2}\,\alpha_{2}\ldots$ where states and actions alternate such that $P(s_{i},\alpha_{i},s_{i+1})>0$ for all $i\geq 0$. Throughout this section, we assume that all states are reachable from the initial state in any MDP, i.e., that there is a finite path from $s_{\mathit{\scriptscriptstyle init}}$ to each state $s$. We extend the weight function to finite paths. For a finite path $\pi=s_{0}\,\alpha_{0}\,s_{1}\,\alpha_{1}\,\ldots\alpha_{k-1}\,s_{k}$, we denote its accumulated weight by

$$\mathit{wgt}(\pi)=\mathit{wgt}(s_{0},\alpha_{0})+\dots+\mathit{wgt}(s_{k-1},\alpha_{k-1}).$$

Similarly, we extend the transition probability function to finite paths and write

$$P(\pi)=P(s_{0},\alpha_{0},s_{1})\cdot\ldots\cdot P(s_{k-1},\alpha_{k-1},s_{k}).$$
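The accumulated weight and the probability of a finite path can be computed in a single pass over the path. The following is a minimal sketch on a hypothetical toy MDP; the dictionary encoding `P`, `wgt` and the state and action names are assumptions made for illustration only.

```python
from fractions import Fraction

# Hypothetical toy MDP: P[(s, a)] maps successor states to probabilities,
# wgt[(s, a)] is the integer weight of the state-action pair.
P = {
    ("s0", "a"): {"s0": Fraction(1, 2), "s1": Fraction(1, 2)},
    ("s1", "b"): {"s1": Fraction(1)},
}
wgt = {("s0", "a"): 1, ("s1", "b"): -2}

def path_weight(path):
    """Accumulated weight of a finite path given as [s0, a0, s1, ..., sk]."""
    return sum(wgt[(path[i], path[i + 1])] for i in range(0, len(path) - 1, 2))

def path_probability(path):
    """Product of the transition probabilities along the finite path."""
    prob = Fraction(1)
    for i in range(0, len(path) - 1, 2):
        s, a, t = path[i], path[i + 1], path[i + 2]
        prob *= P[(s, a)].get(t, Fraction(0))
    return prob

pi = ["s0", "a", "s0", "a", "s1", "b", "s1"]
print(path_weight(pi), path_probability(pi))
```

Exact rationals are used because the transition probabilities of an MDP are rational by definition.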

A one-counter MDP is an MDP equipped with a counter. Each state-action pair increases or decreases the counter or leaves the counter unchanged. A one-counter MDP is said to terminate if the counter value drops below zero. We view one-counter MDPs as MDPs with a weight-function $\mathit{wgt}\colon S\times\mathit{Act}\to\{-1,0,+1\}$ . In this formulation a one-counter MDP terminates when a prefix $\pi$ of a path satisfies $\mathit{wgt}(\pi)<0$ .

A Markov chain is an MDP in which the set of actions is a singleton. There are no non-deterministic choices in a Markov chain and hence we drop the set of actions. Consequently, a Markov chain is a tuple $\mathcal{M}=(S,P,s_{\mathit{\scriptscriptstyle init}})$ , possibly extended with a weight function, a labeling, or a designated set of goal states. The transition probability function $P$ is a function from $S\times S$ to $[0,1]\cap\mathbb{Q}$ such that $\sum_{t\in S}P(s,t)\in\{0,1\}$ for all $s\in S$ .

Scheduler.

A scheduler for an MDP $\mathcal{M}=(S,\mathit{Act},P,s_{\mathit{\scriptscriptstyle init}})$ is a function $\mathfrak{S}$ that assigns to each finite path $\pi$ not ending in a trap state a probability distribution over $\mathit{Act}(\mathit{last}(\pi))$ where $\mathit{last}(\pi)$ denotes the last state of $\pi$. This probability distribution indicates which of the enabled actions is chosen with which probability under $\mathfrak{S}$ after the process has followed the finite path $\pi$.

We allow schedulers to be randomized and history-dependent. By restricting the possibility to randomize over actions or by restricting the amount of information from the history of a run that can affect the choice of a scheduler, we obtain the following types of schedulers: A scheduler $\mathfrak{S}$ is called deterministic if it does not make use of the possibility to randomize over actions, i.e., if $\mathfrak{S}(\pi)$ is a Dirac distribution for each path $\pi$. Such a scheduler $\mathfrak{S}$ can be viewed as a function that assigns an action to each finite path $\pi$. A scheduler $\mathfrak{S}$ is called memoryless if $\mathfrak{S}(\pi)=\mathfrak{S}(\pi^{\prime})$ for all finite paths $\pi$, $\pi^{\prime}$ with $\mathit{last}(\pi)=\mathit{last}(\pi^{\prime})$. In this case, $\mathfrak{S}$ can be viewed as a function that assigns to each state $s$ a distribution over $\mathit{Act}(s)$. A memoryless deterministic scheduler hence can be seen as a function from states to actions. In an MDP with a weight function, a scheduler $\mathfrak{S}$ is said to be weight-based if $\mathfrak{S}(\pi)=\mathfrak{S}(\pi^{\prime})$ for all finite paths $\pi$, $\pi^{\prime}$ with $\mathit{wgt}(\pi)=\mathit{wgt}(\pi^{\prime})$ and $\mathit{last}(\pi)=\mathit{last}(\pi^{\prime})$. Such a scheduler assigns distributions over actions to state-weight pairs from $S\times\mathbb{Z}$. Finally, let $X$ be a finite set of memory modes with initial mode $x_{\mathit{init}}$ and $U\colon X\times S\times\mathit{Act}\times S\to X$ a memory update function. From a finite path $\pi=s_{0}\,\alpha_{0}\,s_{1}\,\alpha_{1}\,\ldots\alpha_{k-1}\,s_{k}$ we can extract a sequence of memory modes $x_{0}\,\dots\,x_{k}$. We let $x_{0}=x_{\mathit{init}}$ and $x_{i+1}=U(x_{i},s_{i},\alpha_{i},s_{i+1})$ for all $i<k$. Let us denote the last memory mode $x_{k}$ after the finite path $\pi$ by $U(x_{\mathit{init}},\pi)$.
A scheduler $\mathfrak{S}$ is a finite-memory scheduler if there is such a finite set of memory modes $X$ with an initial mode $x_{\mathit{init}}$ and an update function $U$ such that $\mathfrak{S}(\pi)=\mathfrak{S}(\pi^{\prime})$ for all finite paths $\pi$ , $\pi^{\prime}$ with $U(x_{\mathit{init}},\pi)=U(x_{\mathit{init}},\pi^{\prime})$ and $\mathit{last}(\pi)=\mathit{last}(\pi^{\prime})$ .
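To illustrate the finite-memory case, the following sketch computes the last memory mode $U(x_{\mathit{init}},\pi)$ by folding the update function over a path and lets a deterministic scheduler depend only on the last state and that mode. The two memory modes, the update rule `parity_update`, and the choice table are hypothetical and serve only as an example.

```python
def last_memory_mode(path, update, x_init):
    """Compute U(x_init, pi) for a finite path given as [s0, a0, s1, ..., sk]."""
    x = x_init
    for i in range(0, len(path) - 1, 2):
        x = update(x, path[i], path[i + 1], path[i + 2])
    return x

def parity_update(x, s, action, t):
    """Hypothetical update: track the parity of the number of 'a' actions."""
    return 1 - x if action == "a" else x

# Hypothetical choice table indexed by (last state, memory mode).
CHOICE = {("s0", 0): "a", ("s0", 1): "b", ("s1", 0): "b", ("s1", 1): "b"}

def scheduler(path):
    """Deterministic finite-memory scheduler: the chosen action depends
    only on the last state and the last memory mode of the path."""
    x = last_memory_mode(path, parity_update, 0)
    return CHOICE[(path[-1], x)]

print(scheduler(["s0", "a", "s0"]))             # one 'a' so far, mode 1
print(scheduler(["s0", "a", "s0", "a", "s0"]))  # two 'a's so far, mode 0
```

Two paths that end in the same state with the same memory mode receive the same decision, which is exactly the defining condition of a finite-memory scheduler.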

Probability measure.

Given an MDP $\mathcal{M}=(S,\mathit{Act},P,s_{\mathit{\scriptscriptstyle init}})$ and a scheduler $\mathfrak{S}$, we obtain a probability measure $\mathrm{Pr}^{\mathfrak{S}}_{\mathcal{M},s}$ on the set of maximal paths of $\mathcal{M}$ that start in $s$: For each finite path $\pi=s_{0}\,\alpha_{0}\,s_{1}\,\alpha_{1}\,\ldots\alpha_{k-1}\,s_{k}$ with $s_{0}=s$, we denote the cylinder set of all its maximal extensions by $\mathit{Cyl}(\pi)$. The probability mass of this cylinder set is then given by

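$$\mathrm{Pr}^{\mathfrak{S}}_{\mathcal{M},s}(\mathit{Cyl}(\pi))=\prod_{i=0}^{k-1}\mathfrak{S}(s_{0}\,\dots\,s_{i})(\alpha_{i})\cdot P(s_{i},\alpha_{i},s_{i+1}).$$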

Recall that $\mathfrak{S}(s_{0}\,\dots\,s_{i})$ is a probability distribution over actions and that $\mathfrak{S}(s_{0}\,\dots\,s_{i})(\alpha_{i})$ denotes the probability that the scheduler $\mathfrak{S}$ chooses action $\alpha_{i}$ after the prefix $s_{0}\,\dots\,s_{i}$ of $\pi$. The set of cylinder sets forms the basis of the standard tree topology on the set of maximal paths. By Carathéodory’s extension theorem, we can extend the pre-measure $\mathrm{Pr}^{\mathfrak{S}}_{\mathcal{M},s}(\mathit{Cyl}(\pi))$ defined on the cylinder sets to a probability measure on the Borel $\sigma$-algebra of the space of maximal paths with the standard tree topology. We sometimes drop the subscript $s$ if $s$ is the initial state $s_{\mathit{\scriptscriptstyle init}}$ of $\mathcal{M}$. In a Markov chain $\mathcal{N}$, we drop the reference to a scheduler and write $\mathrm{Pr}_{\mathcal{N},s}$.
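The probability mass of a cylinder set is a product over the steps of the finite path, alternating between the scheduler's choice and the transition probability. A minimal sketch, assuming a hypothetical toy MDP and a hypothetical scheduler that randomizes uniformly over the enabled actions:

```python
from fractions import Fraction

# Hypothetical toy MDP in dictionary form.
P = {
    ("s0", "a"): {"s0": Fraction(1, 2), "s1": Fraction(1, 2)},
    ("s0", "b"): {"s1": Fraction(1)},
    ("s1", "c"): {"s1": Fraction(1)},
}
ENABLED = {"s0": ["a", "b"], "s1": ["c"]}

def uniform_scheduler(prefix):
    """Memoryless randomized scheduler: uniform over the enabled actions."""
    actions = ENABLED[prefix[-1]]
    return {a: Fraction(1, len(actions)) for a in actions}

def cylinder_probability(path, scheduler):
    """Probability of the cylinder set of the path [s0, a0, s1, ..., sk]."""
    prob = Fraction(1)
    for i in range(0, len(path) - 1, 2):
        prefix, a, t = path[: i + 1], path[i + 1], path[i + 2]
        prob *= scheduler(prefix).get(a, 0) * P[(path[i], a)].get(t, 0)
    return prob

print(cylinder_probability(["s0", "a", "s0", "b", "s1"], uniform_scheduler))
```

The cylinders of all one-step extensions of `["s0"]` partition the maximal paths from `s0`, so their probabilities add up to 1.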

Let $X$ be a random variable on the set of maximal paths of $\mathcal{M}$ starting in $s$ , i.e., $X$ is a function assigning values from $\mathbb{R}\cup\{-\infty,+\infty\}$ to maximal paths. We denote the expected value of $X$ under the probability measure $\mathrm{Pr}^{\mathfrak{S}}_{\mathcal{M},s}$ by $\mathbb{E}_{\mathcal{M},s}^{\mathfrak{S}}(X)$ .

The values we are typically interested in are the worst- or best-case probabilities of an event or the worst- or best-case expected values of a random variable. Worst or best case refers to the possible ways to resolve the non-deterministic choices. Hence, these values are formally expressed by taking the supremum or infimum over all schedulers. Given an MDP $\mathcal{M}$ , a state $s$ , and an event, i.e., a set of maximal paths, $E$ , or a random variable $X$ on the maximal paths of $\mathcal{M}$ , we define

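$$\mathrm{Pr}^{\max}_{\mathcal{M},s}(E)=\sup_{\mathfrak{S}}\mathrm{Pr}^{\mathfrak{S}}_{\mathcal{M},s}(E),\qquad\mathrm{Pr}^{\min}_{\mathcal{M},s}(E)=\inf_{\mathfrak{S}}\mathrm{Pr}^{\mathfrak{S}}_{\mathcal{M},s}(E),$$

$$\mathbb{E}^{\max}_{\mathcal{M},s}(X)=\sup_{\mathfrak{S}}\mathbb{E}^{\mathfrak{S}}_{\mathcal{M},s}(X),\qquad\mathbb{E}^{\min}_{\mathcal{M},s}(X)=\inf_{\mathfrak{S}}\mathbb{E}^{\mathfrak{S}}_{\mathcal{M},s}(X),$$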

where $\inf$ and $\sup$ range over all schedulers $\mathfrak{S}$ for $\mathcal{M}$ .

We use LTL-like notation such as “ $\lozenge$ (accumulated weight $<0$ )” to denote the event that a prefix of a path has a negative accumulated weight. Note that this event expresses the termination of a one-counter MDP in our view of one-counter MDPs as MDPs with a weight-function taking only values in $\{-1,0,+1\}$ .

Classical stochastic shortest path problem.

Let $\mathcal{M}$ be an MDP with a weight function $\mathit{wgt}\colon S\times\mathit{Act}\to\mathbb{Z}$ and a designated set of terminal goal states $\mathit{Goal}$. We define the random variable $\diamondplus\mathit{Goal}$ on maximal paths $\zeta$ of $\mathcal{M}$ as follows:

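$$\diamondplus\mathit{Goal}(\zeta)=\begin{cases}\mathit{wgt}(s_{0}\,\alpha_{0}\,\dots\,\alpha_{n-1}\,s_{n})&\text{if }\zeta=s_{0}\,\alpha_{0}\,s_{1}\,\alpha_{1}\ldots\text{ with }s_{n}\in\mathit{Goal}\text{ and }s_{i}\notin\mathit{Goal}\text{ for all }i<n,\\\text{undefined}&\text{if }\zeta\text{ does not reach }\mathit{Goal}.\end{cases}$$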

The expected accumulated weight before reaching $\mathit{Goal}$ under a scheduler $\mathfrak{S}$ is given by the expected value $\mathbb{E}^{\mathfrak{S}}_{\mathcal{M},s_{\mathit{\scriptscriptstyle init}}}(\diamondplus\mathit{Goal})$. It is evident that this expected value is only defined if $\mathrm{Pr}^{\mathfrak{S}}_{\mathcal{M},s_{\mathit{\scriptscriptstyle init}}}(\diamondplus\mathit{Goal})=1$. The classical stochastic shortest path problem asks for the optimal value

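$$\mathbb{E}^{\max}_{\mathcal{M},s_{\mathit{\scriptscriptstyle init}}}(\diamondplus\mathit{Goal})=\sup_{\mathfrak{S}}\mathbb{E}^{\mathfrak{S}}_{\mathcal{M},s_{\mathit{\scriptscriptstyle init}}}(\diamondplus\mathit{Goal})$$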

where the supremum ranges over all schedulers $\mathfrak{S}$ with $\mathrm{Pr}^{\mathfrak{S}}_{\mathcal{M},s_{\mathit{\scriptscriptstyle init}}}(\diamondplus\mathit{Goal})=1$. The classical stochastic shortest path problem can be solved in polynomial time [BT91, dA99, BBD+18].

3 Outline of the Positivity-hardness proofs

The Positivity-hardness results in this paper are obtained by sequences of reductions depicted in Figure 1. The key steps for these sequences are the three direct reductions from the Positivity problem to the threshold problems for the maximal termination probability of one-counter MDPs, the maximal partial expectation, and the maximal conditional value-at-risk, respectively.

3.1 Structure of the MDP constructed for the direct reductions from the Positivity problem

The three direct reductions from the Positivity problem (at the top of Figure 1) follow a modular approach: the MDPs constructed for the reductions are obtained by putting together three gadgets as sketched in Figure 2. One gadget encodes a linear recurrence relation. It exploits the fact that the optimal values from different starting states, after different amounts of weight have been accumulated in the history of a run, depend on each other. A second gadget encodes the initial values of a linear recurrence sequence. Together, these two gadgets allow us to encode linear recurrence sequences. Finally, an initial gadget is added in which each positive amount of weight $w$ is accumulated with positive probability. Afterwards, a scheduler has to decide how to leave the initial gadget. The optimal decision if weight $w$ has been accumulated directly corresponds to whether the $w$th member of the given linear recurrence sequence is non-negative.

More precisely, let a rational linear recurrence sequence be given in terms of the initial values $u_{0},\dots,u_{k-1}$ and the coefficients $\alpha_{1},\dots,\alpha_{k}$ of the linear recurrence relation. The three gadgets are connected via two states $s$ and $t$ as depicted in Figure 2. In state $t$, the actions $\gamma_{0},\dots,\gamma_{k-1}$ leading to the gadget encoding the initial values and the action $\gamma$ leading to the gadget encoding the linear recurrence relation are enabled. Likewise, in state $s$, the actions $\delta_{0},\dots,\delta_{k-1}$ and the action $\delta$ are enabled. The gadgets will be constructed such that an optimal scheduler has to choose action $\gamma_{i}$ in state $t$ and action $\delta_{i}$ in state $s$ if the accumulated weight is a value $i$ with $0\leq i<k$, and that it has to choose action $\gamma$ or $\delta$, respectively, if the accumulated weight is at least $k$. After $\gamma$ or $\delta$ is chosen, the accumulated weight is decreased within the gadget encoding the linear recurrence relation before the MDP moves back to the states $s$ and $t$ with positive probability.

Let us now denote the maximal possible value for the quantity of interest when starting in one of the states $t$ and $s$ with accumulated weight $w$ by $V(t,w)$ and $V(s,w)$ . The linear recurrence relation will be found in the difference $d(w)\stackrel{{\scriptstyle\text{\tiny def}}}{{=}}V(t,w)-V(s,w)$ . If the accumulated weight is $0\leq i<k$ , the gadget encoding the initial values will make sure that $d(i)=V(t,i)-V(s,i)=u_{i}$ . For each of the three direct reductions from the Positivity problem, we construct one such gadget tailored to the three respective quantities.

For accumulated weights $w$ of at least $k$ , the gadget encoding the recurrence will exploit the dependency of the optimal values $V(t,w)$ and $V(s,w)$ on the optimal values when starting with lower accumulated weight. This gadget can be used in all reductions and will be described in the next subsection.

Put together, these two gadgets ensure that $d(w)=u_{w}$ for all $w\geq 0$ . To complete the reductions, we add an initial gadget $\mathcal{I}$ depicted in Figure 3 in which each positive amount of weight $w$ is accumulated with positive probability. Afterwards, a scheduler has to choose whether to move to state $t$ or state $s$ via the actions $\tau$ and $\sigma$ , respectively. It is optimal to move to $t$ if and only if $u_{w}\geq 0$ . Let now $\mathfrak{S}$ be the scheduler always choosing $\tau$ in the initial gadget and afterwards behaving optimally when choosing from $\gamma_{0},\dots,\gamma_{k-1}$ and $\gamma$ or $\delta_{0},\dots,\delta_{k-1}$ and $\delta$ as described above. This scheduler is optimal if and only if the given linear recurrence sequence is non-negative. The final step to complete the reduction is to compute the value $V^{\mathfrak{S}}(s_{\mathit{\scriptscriptstyle init}},0)$ that is achieved by $\mathfrak{S}$ starting from the initial state. In all three reductions, we can compute this rational value via converging matrix series. The optimal value $V^{\max}(s_{\mathit{\scriptscriptstyle init}},0)$ that can be achieved from the initial state now satisfies

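$$V^{\max}(s_{\mathit{\scriptscriptstyle init}},0)\leq V^{\mathfrak{S}}(s_{\mathit{\scriptscriptstyle init}},0)$$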

if and only if the given linear recurrence sequence is non-negative.
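For orientation, the property that the reduction encodes can be explored on small inputs by enumerating members of a linear recurrence sequence. The sketch below is not part of the reduction; it only searches for a negative member up to a user-chosen bound, so it can refute non-negativity but cannot confirm it. The function names and the bound parameter are assumptions made for illustration.

```python
from fractions import Fraction

def lrs_terms(alphas, betas, n_terms):
    """First n_terms members of the sequence with initial values betas and
    u_{n+k} = alphas[0]*u_{n+k-1} + ... + alphas[k-1]*u_n."""
    u = [Fraction(b) for b in betas]
    k = len(alphas)
    while len(u) < n_terms:
        u.append(sum(Fraction(alphas[i]) * u[-1 - i] for i in range(k)))
    return u[:n_terms]

def first_negative_index(alphas, betas, bound):
    """Index of the first negative member among the first `bound` members,
    or None if none of them is negative (which proves nothing beyond bound)."""
    for n, value in enumerate(lrs_terms(alphas, betas, bound)):
        if value < 0:
            return n
    return None

print(first_negative_index([1, 1], [0, 1], 50))   # Fibonacci-like sequence
print(first_negative_index([1, -1], [1, 1], 50))  # periodic sequence 1, 1, 0, -1, ...
```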

3.2 MDP-gadget for linear recurrence relations

In this section, we demonstrate how to construct the gadget ensuring that the difference of optimal values $V(s,w)-V(t,w)$ follows a given linear recurrence relation with respect to different weight levels $w$ . In the next section, the initial values of a linear recurrence sequence will be encoded in MDP-gadgets tailored to the different quantities we address.

Optimality equations.

Let us start with the following observations on the well-known relation between the optimal values at different states in the classical stochastic shortest path problem, i.e., the maximal expected accumulated weights before reaching a goal state (defined in Section 2). Let $\mathcal{M}=(S,\mathit{Act},P,s_{\mathit{\scriptscriptstyle init}},\mathit{wgt},\mathit{Goal})$ be an MDP. The solution to the classical stochastic shortest path problem satisfies the so-called Bellman equation. If $V(s)$ denotes the value when starting in state $s$, i.e., the maximal expected accumulated weight before reaching $\mathit{Goal}$ from state $s$, then

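$$V(s)=\max_{\alpha\in\mathit{Act}(s)}\Big(\mathit{wgt}(s,\alpha)+\sum_{t\in S}P(s,\alpha,t)\cdot V(t)\Big)$$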

for $s\not\in\mathit{Goal}$ and $V(s)=0$ for $s\in\mathit{Goal}$ . This simple form of optimality equation implies the existence of optimal memoryless deterministic schedulers for the classical stochastic shortest path problem (in case optimal schedulers exist, i.e., if the optimal values are finite).
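The Bellman equation can be illustrated by iterating its right-hand side on a small instance. The following sketch uses a hypothetical two-state MDP in which the goal is reached almost surely under every scheduler; it is not the polynomial-time algorithm referred to in Section 2, and the iteration is only claimed to converge on this toy instance.

```python
# Hypothetical toy MDP: from s0, action "a" has weight 1 and reaches the goal
# with probability 1/2 (otherwise it stays in s0); action "b" has weight 1 and
# reaches the goal directly.
P = {
    ("s0", "a"): {"s0": 0.5, "goal": 0.5},
    ("s0", "b"): {"goal": 1.0},
}
wgt = {("s0", "a"): 1, ("s0", "b"): 1}
GOAL = {"goal"}
STATES = ["s0", "goal"]

def bellman_update(V):
    """One application of the right-hand side of the Bellman equation."""
    new_V = {}
    for s in STATES:
        if s in GOAL:
            new_V[s] = 0.0
        else:
            new_V[s] = max(
                wgt[(s, a)] + sum(p * V[t] for t, p in P[(s, a)].items())
                for (s2, a) in P if s2 == s
            )
    return new_V

def value_iteration(steps=100):
    """Iterate the Bellman update starting from the all-zero vector."""
    V = {s: 0.0 for s in STATES}
    for _ in range(steps):
        V = bellman_update(V)
    return V

V = value_iteration()
print(V["s0"])  # approaches 2, the fixed point of V = 1 + V/2 for action "a"
```

In this instance the memoryless deterministic scheduler that always picks `a` in `s0` attains the maximum, in line with the remark on optimal memoryless deterministic schedulers.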

For problems like the optimization of the termination probability of one-counter MDPs, it is, however, clearly not sufficient to consider the optimal values only in dependency of the starting state. The counter-value, i.e. the weight that has been accumulated so far, is essential. So, let $V(s,w)$ denote the maximal termination probability of a one-counter MDP when starting in state $s$ with counter value $w$ . Letting $V(s,w)=1$ if $w<0$ , we obtain the following equation for all states $s$ and all values $w\geq 0$ :

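$$V(s,w)=\max_{\alpha\in\mathit{Act}(s)}\sum_{t\in S}P(s,\alpha,t)\cdot V(t,w+\mathit{wgt}(s,\alpha)).\tag{$\ast$}$$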

Already in this equation, the value $V(s,w)$ hence possibly depends on values of the form $V(s,w-i)$ for some $i$ . We want to exploit this interrelation to encode linear recurrence relations

$$u_{n+k}=\alpha_{1}u_{n+k-1}+\dots+\alpha_{k}u_{n}$$

into the optimal values $V(s,w)$ . Of course, the values $P(s,\alpha,t)$ are all non-negative. So, we cannot directly encode a linear recurrence into the optimal values for different weight levels at one state as the coefficients might be negative. To overcome this problem, we instead consider the difference $V(s,w)-V(t,w)$ for two different states $s$ and $t$ .
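The dependency of the termination values on both the state and the accumulated weight can be made concrete with a bounded-horizon approximation: let $V_{n}(s,w)$ be the maximal probability to reach a negative accumulated weight within $n$ steps. The sketch below is an illustration under assumptions, using a hypothetical one-counter MDP with a single non-trap state; it computes lower bounds on $V(s,w)$ and is not a decision procedure.

```python
from fractions import Fraction
from functools import lru_cache

# Hypothetical one-counter MDP: "up" increments the counter, "down" decrements
# it and moves to the trap state "z" with probability 1/2.
P = {
    ("s", "up"): {"s": Fraction(1)},
    ("s", "down"): {"s": Fraction(1, 2), "z": Fraction(1, 2)},
}
wgt = {("s", "up"): 1, ("s", "down"): -1}

def enabled(state):
    """Enabled actions of a state; the trap state "z" has none."""
    return [a for (s, a) in P if s == state]

@lru_cache(maxsize=None)
def V(n, state, w):
    """Maximal probability to terminate within n steps from (state, w)."""
    if w < 0:
        return Fraction(1)   # already terminated
    if n == 0:
        return Fraction(0)   # no steps left
    return max(
        (sum(p * V(n - 1, t, w + wgt[(state, a)]) for t, p in P[(state, a)].items())
         for a in enabled(state)),
        default=Fraction(0),  # trap state: no enabled actions
    )

print(V(5, "s", 2))
```

In this toy instance the value from counter value $w$ is obtained by choosing `down` repeatedly, so it depends on the values at the lower counter values $w-1,\dots,0$.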

Scaling down coefficients of a linear recurrence sequence.

Given the coefficients $\alpha_{1},\dots,\alpha_{k}$ , and initial values $u_{0}=\beta_{0}$ , …, $u_{k-1}=\beta_{k-1}$ of a linear recurrence sequence, we have to assume that these are all sufficiently small for the following constructions. So, let us clarify why we can assume this without loss of generality and let us provide precise bounds. Let $(u_{n})_{n\geq 0}$ be a linear recurrence sequence specified by the initial values $u_{0}=\beta_{0}$ , …, $u_{k-1}=\beta_{k-1}$ and the linear recurrence relation $u_{n+k}=\alpha_{1}u_{n+k-1}+\dots+\alpha_{k}u_{n}$ for all $n\geq 0$ . For any $\mu>0$ and $\lambda>0$ , the sequence $(v_{n})_{n\geq 0}$ defined by $v_{n}=\mu\cdot\lambda^{n}\cdot u_{n}$ for all $n$ is non-negative if and only if $(u_{n})_{n\geq 0}$ is non-negative. Furthermore, it satisfies $v_{i}=\mu\cdot\lambda^{i}\cdot\beta_{i}$ for $i<k$ and

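$$v_{n+k}=\lambda\alpha_{1}v_{n+k-1}+\lambda^{2}\alpha_{2}v_{n+k-2}+\dots+\lambda^{k}\alpha_{k}v_{n}\quad\text{for all }n\geq 0.$$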

By choosing $\lambda$ and $\mu$ appropriately, we can scale down the initial values and coefficients of the recurrence relation for any given input.

To obtain precise bounds that will be used throughout the following sections, let $\alpha\stackrel{{\scriptstyle\text{\tiny def}}}{{=}}\sum_{i=1}^{k}|\alpha_{i}|$ and let $\lambda\stackrel{{\scriptstyle\text{\tiny def}}}{{=}}\frac{1}{\alpha\cdot(5k+5)}$. The value $\lambda$ can be computed in polynomial time. As the numerical value of $k$ is linear in the size of the given original input, the coefficients $\alpha_{1}^{\prime}\stackrel{{\scriptstyle\text{\tiny def}}}{{=}}\lambda\cdot\alpha_{1},\alpha_{2}^{\prime}\stackrel{{\scriptstyle\text{\tiny def}}}{{=}}\lambda^{2}\cdot\alpha_{2},\dots,\alpha_{k}^{\prime}\stackrel{{\scriptstyle\text{\tiny def}}}{{=}}\lambda^{k}\cdot\alpha_{k}$ of the linear recurrence of the sequence $(v_{n})_{n\geq 0}$ can be computed in polynomial time as well. The choice of $\lambda$ ensures that $\sum_{i=1}^{k}|\alpha_{i}^{\prime}|<\frac{1}{5k+5}$.

Let now $\alpha^{\prime}\stackrel{{\scriptstyle\text{\tiny def}}}{{=}}\sum_{i=1}^{k}|\alpha_{i}^{\prime}|$ and $\beta\stackrel{{\scriptstyle\text{\tiny def}}}{{=}}\max_{0\leq j<k}|\beta_{j}|$ . We can choose $\mu\stackrel{{\scriptstyle\text{\tiny def}}}{{=}}\frac{\min(\alpha^{\prime},1)}{4k^{2k+2}\cdot\beta}$ . Again, since the value $k$ is linear in the size of the original input, $\mu$ can be computed in polynomial time. The initial values of the new sequence $(v_{n})_{n\geq 0}$ are now $\beta_{i}^{\prime}\stackrel{{\scriptstyle\text{\tiny def}}}{{=}}v_{i}=\mu\cdot\lambda^{i}\cdot\beta_{i}$ for $i<k$ , computable in polynomial time. The choice of $\mu$ guarantees that $\max_{0\leq j<k}\beta_{j}^{\prime}<\min(\frac{1}{4k^{2k+2}},\frac{\alpha^{\prime}}{4})$ .
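The scaling step can be replayed with exact rational arithmetic. The following sketch applies the definitions of $\lambda$ and $\mu$ given above to one hypothetical input with integer coefficients and checks that the scaled sequence satisfies $v_{n}=\mu\cdot\lambda^{n}\cdot u_{n}$; the function names are assumptions, and the code presupposes that not all coefficients and not all initial values are zero.

```python
from fractions import Fraction

def scale_lrs(alphas, betas):
    """Scaled coefficients and initial values together with lambda and mu.
    Assumes some coefficient and some initial value is non-zero."""
    k = len(alphas)
    alpha = sum(abs(Fraction(a)) for a in alphas)
    lam = 1 / (alpha * (5 * k + 5))
    new_alphas = [lam ** (i + 1) * Fraction(a) for i, a in enumerate(alphas)]
    alpha_prime = sum(abs(a) for a in new_alphas)
    beta = max(abs(Fraction(b)) for b in betas)
    mu = min(alpha_prime, 1) / (4 * k ** (2 * k + 2) * beta)
    new_betas = [mu * lam ** i * Fraction(b) for i, b in enumerate(betas)]
    return new_alphas, new_betas, lam, mu

def terms(alphas, betas, n):
    """First n members of the sequence defined by alphas and betas."""
    u = list(betas)
    while len(u) < n:
        u.append(sum(a * u[-1 - i] for i, a in enumerate(alphas)))
    return u[:n]

# Hypothetical input: u_{n+2} = u_{n+1} - u_n with u_0 = u_1 = 1.
alphas, betas = [1, -1], [1, 1]
new_alphas, new_betas, lam, mu = scale_lrs(alphas, betas)
u = terms(alphas, betas, 12)
v = terms(new_alphas, new_betas, 12)
print(lam, mu)
```

Since $\mu$ and $\lambda$ are positive, each $v_{n}$ has the same sign as $u_{n}$, which is why the transformation preserves the answer to the Positivity problem.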

Since this transformation can be carried out in polynomial time, we can w.l.o.g. from now on work under the following assumption:

Assumption 3.1.

Given the coefficients $\alpha_{1},\dots,\alpha_{k}$ , and initial values $u_{0}=\beta_{0}$ , …, $u_{k-1}=\beta_{k-1}$ of a linear recurrence sequence, we assume that

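$$\sum_{i=1}^{k}|\alpha_{i}|<\frac{1}{5k+5}\qquad\text{and}\qquad\max_{0\leq j<k}|\beta_{j}|<\min\Big(\frac{1}{4k^{2k+2}},\frac{1}{4}\sum_{i=1}^{k}|\alpha_{i}|\Big).$$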

MDP-gadget for linear recurrence relations.

Given the coefficients $\alpha_{1},\dots,\alpha_{k}$ of a linear recurrence relation satisfying Assumption 3.1, we construct the MDP-gadget depicted in Figure 4. The gadget contains states $s$, $t$, and $\mathit{trap}$ as well as $s_{1},\dots,s_{k}$ and $t_{1},\dots,t_{k}$. In state $t$, an action $\gamma$ is enabled which has weight $0$ and leads to state $t_{i}$ with probability $\alpha_{i}$ if $\alpha_{i}>0$ and to state $s_{i}$ with probability $|\alpha_{i}|$ if $\alpha_{i}<0$ for all $i$. The remaining probability leads to $\mathit{trap}$. From each state $t_{i}$, there is an action leading to $t$ with weight $-i$. The action $\delta$ enabled in $s$ as well as the actions leading from states $s_{i}$ to $s$ are constructed analogously. If $\alpha_{i}$ is negative, action $\delta$ reaches state $t_{i}$ with probability $|\alpha_{i}|$. Otherwise it reaches $s_{i}$ with probability $\alpha_{i}$. The state $\mathit{trap}$ is absorbing. As the gadget depends on the inputs $\bar{\alpha}=(\alpha_{1},\dots,\alpha_{k})$, we call it $\mathcal{G}_{\bar{\alpha}}$.
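The construction of $\mathcal{G}_{\bar{\alpha}}$ can be sketched as a function that builds the transition probabilities and weights from the coefficients. The dictionary encoding and the action names `back` and `loop` are hypothetical. The sketch further assumes that $\gamma$ and $\delta$ have weight $0$ and that the actions from $t_{i}$ and $s_{i}$ return to $t$ and $s$ with probability $1$; Figure 4 remains the authoritative description.

```python
from fractions import Fraction

def build_gadget(alphas):
    """Return (P, wgt) for the gadget built from the coefficients alphas.
    P[(state, action)] maps successors to probabilities."""
    k = len(alphas)
    P, wgt = {}, {}
    for src, other, action in (("t", "s", "gamma"), ("s", "t", "delta")):
        dist = {}
        for i, a in enumerate(alphas, start=1):
            a = Fraction(a)
            if a > 0:
                dist[f"{src}_{i}"] = a        # positive coefficient: same side
            elif a < 0:
                dist[f"{other}_{i}"] = -a     # negative coefficient: other side
        dist["trap"] = 1 - sum(dist.values())  # remaining probability mass
        P[(src, action)] = dist
        wgt[(src, action)] = 0                 # assumed weight of gamma / delta
        for i in range(1, k + 1):
            P[(f"{src}_{i}", "back")] = {src: Fraction(1)}
            wgt[(f"{src}_{i}", "back")] = -i   # decrease the weight by i
    P[("trap", "loop")] = {"trap": Fraction(1)}  # absorbing trap state
    wgt[("trap", "loop")] = 0
    return P, wgt

# Hypothetical coefficients with sum of absolute values below 1/(5k+5).
P, wgt = build_gadget([Fraction(1, 40), Fraction(-1, 80)])
print(P[("t", "gamma")])
```

For negative coefficients the probability mass of $\gamma$ is routed to the $s$-side and that of $\delta$ to the $t$-side, which is what makes negative coefficients expressible with non-negative probabilities.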

This gadget $\mathcal{G}_{\bar{\alpha}}$ will be integrated into MDPs without further outgoing edges from states $s_{1},\dots,s_{k},t_{1},\dots,t_{k}$ . For any optimization problem for which the optimal values $V$ depend on the state and the weight accumulated so far and satisfy equation ( $\ast$ ), we can encode a linear recurrence in an MDP containing this gadget (and possibly further actions for state $t$ and $s$ ): If we know that an optimal scheduler chooses action $\gamma$ in state $t$ and action $\delta$ in state $s$ if the accumulated weight is $w$ , then

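$$V(t,w)-V(s,w)=\alpha_{1}\cdot\big(V(t,w-1)-V(s,w-1)\big)+\dots+\alpha_{k}\cdot\big(V(t,w-k)-V(s,w-k)\big).$$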

Note that this linear recurrence relation also holds for the optimal values in the classical stochastic shortest path problem for example. So, the gadget alone is not yet enough for a hardness proof. The missing ingredient is the encoding of the initial values of a linear recurrence sequence. In order to include the encoding of the initial values in our approach, it is necessary that optimal schedulers cannot be chosen to be memoryless. The optimal decisions have to depend on the weight that has been accumulated in the history of a run. If this is the case, we aim to encode the initial values by adding further outgoing actions to the states $t$ and $s$ . By fine-tuning the weights and probabilities of these actions, we can achieve that for small weights $w$ some of the new actions are optimal while for large weights the actions $\gamma$ and $\delta$ of the gadget are optimal. If we manage to design the other actions such that the differences $V(t,w+i)-V(s,w+i)$ are equal to given starting values $\beta_{i}$ for a sequence of weights $w,w+1,\dots,w+k-1$ while actions $\gamma$ and $\delta$ are optimal for weights of at least $w+k$ , we can encode arbitrary linear recurrence sequences. This is the goal of the subsequent section.
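The recurrence for the difference of values induced by the gadget can be checked numerically. The sketch below assumes that $\gamma$ and $\delta$ are optimal for all weights $w\geq k$ and that values satisfy equation ($\ast$); the values for $w<k$ and the value `trap_value(w)` obtained in the trap state are arbitrary hypothetical numbers. The contribution of the trap state is identical for $t$ and $s$ and therefore cancels in the difference.

```python
from fractions import Fraction

def extend_values(alphas, Vt, Vs, trap_value, upto):
    """Extend the lists Vt[w], Vs[w] (given for w < k) to all w < upto,
    using the transitions of gamma in t and of delta in s."""
    alphas = [Fraction(a) for a in alphas]
    k = len(alphas)
    p_trap = 1 - sum(abs(a) for a in alphas)
    Vt, Vs = list(Vt), list(Vs)
    for w in range(k, upto):
        vt = vs = p_trap * trap_value(w)
        for i, a in enumerate(alphas, start=1):
            if a > 0:   # gamma reaches t_i, delta reaches s_i
                vt += a * Vt[w - i]
                vs += a * Vs[w - i]
            else:       # gamma reaches s_i, delta reaches t_i
                vt += -a * Vs[w - i]
                vs += -a * Vt[w - i]
        Vt.append(vt)
        Vs.append(vs)
    return Vt, Vs

alphas = [Fraction(1, 40), Fraction(-1, 80)]
Vt, Vs = extend_values(
    alphas,
    [Fraction(1, 2), Fraction(1, 3)],   # hypothetical values V(t,0), V(t,1)
    [Fraction(1, 4), Fraction(1, 2)],   # hypothetical values V(s,0), V(s,1)
    trap_value=lambda w: Fraction(1, w + 1),
    upto=10,
)
d = [vt - vs for vt, vs in zip(Vt, Vs)]
print(d[:4])
```

Only the differences for $w<k$ enter the sequence $d$, which is why the remaining work consists of encoding the initial values.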

4 Reductions from the Positivity problem

To encode initial values of a linear recurrence sequence, we construct further MDP gadgets. For the termination probability and expected termination time of one-counter MDPs and for partial expectations, we can construct these gadgets directly. For the conditional value-at-risk, we use an intermediate auxiliary random variable. Putting together these gadgets with the gadget $\mathcal{G}_{\bar{\alpha}}$ from the previous section, we obtain the basis for the Positivity-hardness results of the respective threshold problems. The Positivity-hardness of the remaining problems is obtained as a consequence of these results via further reductions. An overview of the chains of reductions used is presented in Figure 1.

4.1 One-counter MDPs, energy objectives, cost problems, and quantiles

The first problem we will show to be Positivity-hard is the threshold problem for the optimal termination probability of one-counter MDPs. From this result, Positivity-hardness results for energy objectives, cost problems, and the computation of quantiles follow easily. Afterwards, we adjust the reduction to show Positivity-hardness of the threshold problem for the optimal expected termination time of almost-surely terminating one-counter MDPs.

Termination probability of one-counter MDPs.

We formulated the termination of a one-counter MDP in terms of weighted MDPs $\mathcal{M}$ . Recall that a one-counter MDP terminates if the counter value drops below zero. If we consider the weight that has been accumulated instead of the counter value, the quantities we are interested in are $\mathrm{Pr}^{\mathrm{opt}}_{\mathcal{M}}(\lozenge\text{ accumulated weight}<0)$ for $\mathrm{opt}=\max$ and $\mathrm{opt}=\min$ . The main result we prove in this section is the following:

Theorem 4.1.

The Positivity problem is reducible in polynomial time to the following problems: Given an MDP $\mathcal{M}$ and a rational $\vartheta\in(0,1)$ ,

1. decide whether $\mathrm{Pr}^{\max}_{\mathcal{M},s_{\mathit{\scriptscriptstyle init}}}(\lozenge(\text{accumulated weight}<0))>\vartheta$ .

2. decide whether $\mathrm{Pr}^{\min}_{\mathcal{M},s_{\mathit{\scriptscriptstyle init}}}(\lozenge(\text{accumulated weight}<0))<\vartheta$ .

Note that if weights are encoded in unary, we can transform a weighted MDP to a one-counter MDP that can only increase or decrease the counter value by $1$ in each step in polynomial time. The MDPs that are constructed from a linear recurrence sequence of depth $k$ in the proof of Theorem 4.1 will contain only weights with an absolute value of at most $k$ . So, they can be transformed to one-counter MDPs in time linear in the size of the original input and we conclude that the following two threshold problems for the optimal termination probability of one-counter MDPs are Positivity-hard:

Corollary 4.2.

The Positivity problem is reducible in polynomial time to the following problems: Given a one-counter MDP $\mathcal{M}$ viewed as an MDP with weights in $\{-1,0,+1\}$ and a rational $\vartheta\in(0,1)$ ,

1. decide whether $\mathrm{Pr}^{\max}_{\mathcal{M},s_{\mathit{\scriptscriptstyle init}}}(\lozenge(\text{accumulated weight}<0))>\vartheta$ .

2. decide whether $\mathrm{Pr}^{\min}_{\mathcal{M},s_{\mathit{\scriptscriptstyle init}}}(\lozenge(\text{accumulated weight}<0))<\vartheta$ .

Among the direct reductions from the Positivity problem we present, the construction of the gadget encoding the initial values of a linear recurrence sequence is arguably the simplest for these optimal termination probabilities. In the formulation with weighted MDPs, the termination of a one-counter MDP is moreover the complement of the energy objective “ $\square\text{ accumulated weight}\geq 0$ ”. We will first prove Positivity-hardness for the threshold problem for maximal termination probabilities and afterwards outline the adjustments necessary to show Positivity-hardness also for the threshold problem for minimal termination probabilities.

We split the proof of Theorem 4.1 into four parts. First, we provide the construction of an MDP from a linear recurrence sequence. Then, we show that the linear recurrence sequence is correctly encoded in this MDP in terms of the maximal termination probabilities. To complete the proof of item 1, we then show how to compute the threshold $\vartheta$ for the threshold problem. Finally, we show how to adapt the construction to prove hardness of the threshold problem for minimal termination probabilities.

Proof of Theorem 4.1(1): construction of the MDP.

Given a linear recurrence sequence in terms of the rational coefficients $\alpha_{1},\dots,\alpha_{k}$ of the linear recurrence relation as well as the rational initial values $\beta_{0},\dots,\beta_{k-1}$ for $k\geq 2$ , our first goal is to construct an MDP $\mathcal{M}$ and a rational $\vartheta\in(0,1)$ such that

[TABLE]

By Assumption 3.1, we can assume that the input values are sufficiently small. More precisely, we assume that $\sum_{i=1}^{k}|\alpha_{i}|<1/(k+1)$ and that $0\leq\beta_{j}<1/(k+1)$ for all $0\leq j\leq k-1$ , which is ensured by the bounds in Assumption 3.1.
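As a small illustration of the input to the reduction, the following Python sketch checks the two smallness conditions stated above and searches a finite prefix of the sequence for a negative member. The function names and the `horizon` parameter are assumptions made for this example. The search is bounded, so it illustrates what the Positivity problem asks without deciding it.

```python
from fractions import Fraction

def satisfies_bounds(alpha, beta):
    """Check sum |alpha_i| < 1/(k+1) and 0 <= beta_j < 1/(k+1) for k >= 2."""
    k = len(alpha)
    bound = Fraction(1, k + 1)
    return (k >= 2 and len(beta) == k
            and sum(abs(a) for a in alpha) < bound
            and all(0 <= b < bound for b in beta))

def first_negative_index(alpha, beta, horizon):
    """Return the least n < horizon with u_n < 0, or None if there is none.

    A result of None says nothing about indices >= horizon.
    """
    k = len(alpha)
    u = list(beta)
    while len(u) < horizon:
        # u_m = alpha_1 * u_{m-1} + ... + alpha_k * u_{m-k}
        u.append(sum(alpha[i] * u[-1 - i] for i in range(k)))
    for n, value in enumerate(u[:horizon]):
        if value < 0:
            return n
    return None

alpha = [Fraction(1, 10), Fraction(-1, 10)]
beta = [Fraction(1, 10), Fraction(1, 5)]
ok = satisfies_bounds(alpha, beta)
witness = first_negative_index(alpha, beta, 10)
```

For this hypothetical input the sequence starts $1/10,\ 1/5,\ 1/100,\ -19/1000$, so the first negative member occurs at index $3$.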

We denote the maximal termination probabilities in terms of the current state $s$ and counter value (accumulated weight) $w$ by $p(s,w)$ . More precisely, in an MDP $\mathcal{M}$ for $w\geq 0$ , we define

[TABLE]

The values $p(s,w)$ in an MDP with state space $S$ now satisfy the optimality equation ( $\ast$ ) from Section 3.2 (where $p(s,w)$ takes the role of $V(s,w)$ in ( $\ast$ )), which we restate here for convenience. We have $p(s,w)=1$ for all states $s$ and all $w<0$ and

[TABLE]

So, to capture the linear recurrence relation, we will be able to make use of the gadget $\mathcal{G}_{\bar{\alpha}}$ from Section 3.2. The missing ingredient is a gadget to encode the initial values of a linear recurrence sequence.

The new gadget $\mathcal{O}_{\bar{\beta}}$ encoding the initial values $\bar{\beta}$ is depicted in Figure 5 and works as follows: For $0\leq j\leq k-1$ , the action $\gamma_{j}$ enabled in $t$ leads to state $x_{j}$ with probability $\frac{k-j}{k+1}+\beta_{j}$ . By assumption on $\beta_{j}$ , this probability is less than $\frac{k-j+1}{k+1}$ . The remaining probability leads to $\mathit{trap}$ . In state $s$ , the action $\delta_{j}$ leads to $y_{j}$ with probability $\frac{k-j}{k+1}$ and to $\mathit{trap}$ with the remaining probability. For $0\leq j\leq k-1$ , one reaches $\mathit{trap}$ from $x_{j}$ and $y_{j}$ with probability $1$ and a counter change of $-(j+1)$ .

Now, we glue together the initial gadget $\mathcal{I}$ defined in Section 3.1, the gadget encoding the linear recurrence relation $\mathcal{G}_{\bar{\alpha}}$ from Section 3.2, and the new gadget $\mathcal{O}_{\bar{\beta}}$ at states $t$ , $s$ , and $\mathit{trap}$ . The resulting MDP $\mathcal{M}$ is depicted in Figure 6 – for better readability, it is depicted for $k=2$ and assuming that $\alpha_{1}\geq 0$ while $\alpha_{2}<0$ .

Proof of Theorem 4.1(1): correctness of the encoding of the linear recurrence sequence.

In this paragraph, we show that the given linear recurrence sequence is indeed encoded in the maximal termination probabilities when starting from states $t$ and $s$ with different counter values, i.e., values of accumulated weight as described in Section 3.1. More precisely, let $(u_{n})_{n\geq 0}$ be the linear recurrence sequence given by the initial values $\beta_{0},\dots,\beta_{k-1}$ and the coefficients $\alpha_{1},\dots,\alpha_{k}$ of the linear recurrence relation. We prove the following:

Lemma 4.3.

For each $w\geq 0$ , we have

$$p(t,w)-p(s,w)=u_{w}$$

where $p(r,w)$ denotes the maximal termination probability from state $r\in\{s,t\}$ when starting with accumulated weight $w$ as defined above.

Proof.

For the correct interplay of the gadgets $\mathcal{G}_{\bar{\alpha}}$ and $\mathcal{O}_{\bar{\beta}}$ , the optimal decisions in states $t$ and $s$ for different values of accumulated weights, i.e., different counter-values, are crucial. In order to terminate, the accumulated weight has to drop below $0$ before reaching $\mathit{trap}$ . As soon as the trap state is reached with non-negative accumulated weight, the process cannot terminate anymore. The optimal decision in order to maximize the termination probability in state $t$ is now easy to determine. Let $\ell$ be the current weight. If $0\leq\ell\leq k-1$ , choosing action $\gamma$ leads to termination with probability less than $1/(k+1)$ as $\mathit{trap}$ is reached immediately with probability at least $k/(k+1)$ due to our assumption that $\sum_{i\leq k}|\alpha_{i}|<1/(k+1)$ . Choosing action $\gamma_{j}$ makes it impossible to terminate if $\ell>j$ . If $\ell\leq j$ , then choosing $\gamma_{j}$ lets the process terminate if $x_{j}$ is reached. This happens with probability $\frac{k-j}{k+1}+\beta_{j}$ . As $0\leq\beta_{j}<1/(k+1)$ for all $j$ , these probabilities strictly decrease in $j$ and are at least $1/(k+1)$ , so the maximal termination probability is reached when choosing $\gamma_{\ell}$ . If $\ell\geq k$ , then $\gamma_{j}$ leads to termination with probability $0$ for all $j$ . Hence, action $\gamma$ is optimal. Analogously, we see that the optimal choice in state $s$ with weight $\ell$ is $\delta_{\ell}$ if $\ell\leq k-1$ and $\delta$ otherwise.

The linear recurrence sequence $(u_{n})_{n\geq 0}$ now can be found in terms of the difference

$$d(w)=p(t,w)-p(s,w).$$

For counter value $w\leq k-1$ , we have seen that $\gamma_{w}$ and $\delta_{w}$ , respectively, are the optimal actions. Hence, $d(w)=u_{w}$ in this case as we have just seen that the optimal termination probability when starting with weight $w\leq k-1$ is $\frac{k-w}{k+1}+\beta_{w}$ in $t$ and $\frac{k-w}{k+1}$ in $s$ . Furthermore, for $w>k-1$ , actions $\gamma$ and $\delta$ are optimal. So, by the discussion in Section 3.2, the sequence of differences satisfies the linear recurrence relation given by $\alpha_{1},\dots,\alpha_{k}$ . Therefore, $d(w)=u_{w}$ for all $w\geq 0$ . ∎
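The statement of Lemma 4.3 can be checked numerically on small instances. The following Python sketch, with hypothetical function names and a hypothetical input, computes the maximal termination probabilities from $t$ and $s$ by induction on the accumulated weight, taking the maximum over the gadget action and the actions $\gamma_{j}$ or $\delta_{j}$ as described in the proof, and compares their difference with the linear recurrence sequence. It assumes the bounds on $\bar{\alpha}$ and $\bar{\beta}$ stated earlier and treats a negative weight as immediate termination.

```python
from fractions import Fraction

def lrs_prefix(alpha, beta, n):
    """First n members of u with u_j = beta_j for j < k and
    u_m = alpha_1 u_{m-1} + ... + alpha_k u_{m-k} afterwards."""
    u = list(beta)
    while len(u) < n:
        u.append(sum(a * u[-i] for i, a in enumerate(alpha, start=1)))
    return u[:n]

def max_termination(alpha, beta, n):
    """Tables p_t[w], p_s[w] for 0 <= w < n, by induction on the weight w."""
    k = len(alpha)
    p = {"t": [], "s": []}

    def value(side, w):
        # A negative weight means the process has already terminated.
        return Fraction(1) if w < 0 else p[side][w]

    for w in range(n):
        new = {}
        for side, other in (("t", "s"), ("s", "t")):
            # gamma (in t) or delta (in s); reaching trap contributes 0.
            gadget = sum(abs(a) * value(side if a > 0 else other, w - i)
                         for i, a in enumerate(alpha, start=1))
            # gamma_j / delta_j lead to termination exactly when w <= j.
            bonus = beta if side == "t" else [0] * k
            initial = [Fraction(k - j, k + 1) + bonus[j] for j in range(w, k)]
            new[side] = max([gadget] + initial)
        for side in "ts":
            p[side].append(new[side])
    return p["t"], p["s"]

alpha = [Fraction(1, 10), Fraction(-1, 10)]
beta = [Fraction(1, 10), Fraction(1, 5)]
p_t, p_s = max_termination(alpha, beta, 12)
differences = [a - b for a, b in zip(p_t, p_s)]
u = lrs_prefix(alpha, beta, 12)
```

On this input, `differences` and `u` agree entry by entry, including the negative member $u_{3}=-19/1000$.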

Proof of Theorem 4.1(1): computation of the threshold $\vartheta$ .

The state $\mathit{choice}$ is reached with any positive accumulated weight with positive probability. For the optimal choices in the state $\mathit{choice}$ with accumulated weight $w$ , we observe that choosing $\tau$ is optimal iff $d(w)\geq 0$ . By Lemma 4.3, this holds if and only if $u_{w}\geq 0$ .

Consider now the scheduler $\mathfrak{S}$ which always chooses $\tau$ in state $\mathit{choice}$ and afterwards behaves according to the optimal choices as described in the proof of Lemma 4.3. This scheduler $\mathfrak{S}$ is optimal if and only if the sequence $(u_{n})_{n\geq 0}$ is non-negative. To complete the reduction, we will compute the value

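$$\vartheta=\mathrm{Pr}^{\mathfrak{S}}_{\mathcal{M},s_{\mathit{\scriptscriptstyle init}}}(\lozenge(\text{accumulated weight}<0)).$$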

We will see that $\vartheta$ is a rational computable in polynomial time and we know that

[TABLE]

if and only if the scheduler $\mathfrak{S}$ is optimal which is the case iff $(u_{n})_{n\geq 0}$ is non-negative.

Lemma 4.4.

In the constructed MDP $\mathcal{M}$ , the value $\vartheta=\mathrm{Pr}_{\mathcal{M},s_{\mathit{\scriptscriptstyle init}}}^{\mathfrak{S}}(\lozenge(\text{accumulated weight }<0))$ can be computed in polynomial time.

Proof.

In order to compute the value $\vartheta$ , we first provide a recursive expression of the maximal termination probabilities $p(t,w)$ and $p(s,w)$ . By the definition of $\mathfrak{S}$ , these are precisely the termination probabilities under $\mathfrak{S}$ when starting from $t$ or $s$ with some positive accumulated weight $w\in\mathbb{N}$ because $\mathfrak{S}$ behaves optimally as soon as state $t$ or $s$ has been reached.

For this recursive expression, we consider the following Markov chain $\mathcal{C}$ for $n\in\mathbb{N}$ that is also depicted in Figure 7 – for better readability, it is depicted for the case $k=2$ there: The Markov chain $\mathcal{C}$ has $5k$ states named $t_{-k+1}$ , …, $t_{+k}$ , $s_{-k+1}$ , …, $s_{+k}$ , and $\mathit{goal}_{+1}$ , …, $\mathit{goal}_{+k}$ . States $t_{-k+1}$ , …, $t_{0}$ , $s_{-k+1}$ , …, $s_{0}$ , and $\mathit{goal}_{+1}$ , …, $\mathit{goal}_{+k}$ are terminal. For $0<i,j\leq k$ , there are transitions from $t_{+i}$ to $t_{+i-j}$ with probability $\alpha_{j}$ if $\alpha_{j}>0$ , to $s_{+i-j}$ with probability $|\alpha_{j}|$ if $\alpha_{j}<0$ , and to $\mathit{goal}_{+i}$ with probability $1-|\alpha_{1}|-\ldots-|\alpha_{k}|$ . Transitions from $s_{+i}$ are defined analogously.

The idea behind this Markov chain is that the reachability probabilities describe how, for arbitrary $n\in\mathbb{N}$ and $1\leq i\leq k$ , the values $p(t,nk+i)$ and $p(s,nk+i)$ depend on the values $p(t,(n-1)k+j)$ and $p(s,(n-1)k+j)$ for $1\leq j\leq k$ . The transitions in $\mathcal{C}$ behave as $\gamma$ and $\delta$ in $\mathcal{M}$ , but the decrease in the accumulated weight is explicitly encoded into the state space. Namely, for $n\in\mathbb{N}$ and $0<i\leq k$ , we have

[TABLE]

and analogously for $p(s,nk+i)$ . We now group the optimal values together in the following column vectors

[TABLE]

for $n\in\mathbb{N}$ . In other words, this vector contains the maximal termination probabilities when starting in $t$ or $s$ with an accumulated weight from $\{nk,\dots,nk+k-1\}$ . The vector $v_{0}$ is the column vector

[TABLE]

and these values occur as transition probabilities in $\mathcal{M}$ under the actions $\gamma_{k-1},\dots,\gamma_{0}$ and $\delta_{k-1},\dots,\delta_{0}$ .

As the reachability probabilities in $\mathcal{C}$ are rational and computable in polynomial time, we conclude from equation ( $\ast$ ) of Section 4.1 that there is a matrix $A\in\mathbb{Q}^{2k\times 2k}$ computable in polynomial time such that $v_{n+1}=Av_{n}$ for all $n\in\mathbb{N}$ . So, $v_{n}=A^{n}v_{0}$ for all $n\in\mathbb{N}$ .
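The reachability probabilities in $\mathcal{C}$ can be computed by a simple induction because every transition from $t_{+i}$ or $s_{+i}$ leads to a state with a strictly smaller index or to a goal state. The following Python sketch is a minimal illustration of this. The function name and the ordering of rows and columns are assumptions of the example and need not coincide with the ordering of the entries of $v_{n}$ used in the text.

```python
from fractions import Fraction

def reach_matrix(alpha):
    """Probabilities to reach the terminal states of the chain C.

    Rows: start states t_{+1..+k}, then s_{+1..+k}.
    Columns: terminals t_0..t_{-k+1}, then s_0..s_{-k+1}.
    """
    k = len(alpha)
    terminals = [(side, -m) for side in "ts" for m in range(k)]
    # A terminal state reaches itself with probability 1.
    reach = {term: {term: Fraction(1)} for term in terminals}
    for i in range(1, k + 1):
        for side in "ts":
            other = "s" if side == "t" else "t"
            row = {}
            for j, a in enumerate(alpha, start=1):
                if a == 0:
                    continue
                # Positive alpha_j keeps the side, negative alpha_j switches it.
                succ = (side if a > 0 else other, i - j)
                for term, prob in reach[succ].items():
                    row[term] = row.get(term, Fraction(0)) + abs(a) * prob
            reach[(side, i)] = row
    starts = [(side, i) for side in "ts" for i in range(1, k + 1)]
    return [[reach[st].get(term, Fraction(0)) for term in terminals]
            for st in starts]

alpha = [Fraction(1, 10), Fraction(-1, 10)]
A = reach_matrix(alpha)
max_row_sum = max(sum(row) for row in A)
```

For this hypothetical input, no row sum exceeds $|\alpha_{1}|+|\alpha_{2}|=1/5$, since the remaining probability mass goes to a goal state in the first step.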

As state $\mathit{choice}$ is reached with weight $w$ with probability $(1/2)^{w}$ for all $w\geq 1$ , we have $\vartheta=\sum_{w=1}^{\infty}(1/2)^{w}p(t,w)$ . Let $c=(\frac{1}{2^{k}},\frac{1}{2^{k-1}},\dots,\frac{1}{2^{1}},0,\dots,0)$ . Observe that for all $n\in\mathbb{N}$ ,

[TABLE]

Hence, we can write

[TABLE]

We have to subtract $p(t,0)$ as the state $\mathit{choice}$ cannot be reached with weight $0$ , but the summand $1\cdot p(t,0)$ occurs in the sum. As $p(t,0)=\frac{k}{k+1}+\beta_{0}$ , this does not cause a problem.

We claim that the matrix series involved converges to a rational matrix. We observe that the maximal row sum in $A$ is at most $|\alpha_{1}|{+}\ldots{+}|\alpha_{k}|<1$ because the rows of the matrix contain exactly the probabilities to reach $t_{0},\dots,t_{-k+1}$ and $s_{0},\dots,s_{-k+1}$ from a state $t_{+i}$ or $s_{+i}$ in $\mathcal{C}$ for $1\leq i\leq k$ . But the probability to reach $\mathit{goal}_{+i}$ from these states is already $1{-}|\alpha_{1}|{-}\ldots{-}|\alpha_{k}|$ . Hence, $\|A\|_{\infty}$ , the operator norm induced by the maximum norm $\|\cdot\|_{\infty}$ , which equals $\max_{i}\sum_{j=1}^{2k}|A_{ij}|$ , is less than $1$ . So, in particular, also $\|\frac{1}{2^{k}}A\|_{\infty}<1$ and hence the Neumann series $\sum_{n=0}^{\infty}\left(\frac{1}{2^{k}}A\right)^{n}$ converges to $\left(I_{2k}-\frac{1}{2^{k}}A\right)^{-1}$ where $I_{2k}$ is the identity matrix of size $2k{\times}2k$ . So,

[TABLE]

is computable in polynomial time. ∎
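The closed form of the Neumann series used in the proof can be illustrated with exact rational arithmetic. In the following Python sketch, the matrix `B` is a made-up stand-in for $\frac{1}{2^{k}}A$ with $\|B\|_{\infty}<1$, and all function names are assumptions of the example. The limit is obtained by inverting $I-B$ with Gauss-Jordan elimination and is then compared with a partial sum of the series.

```python
from fractions import Fraction

def mat_mul(X, Y):
    return [[sum(X[i][l] * Y[l][j] for l in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def identity(n):
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]

def inverse(M):
    """Exact Gauss-Jordan inverse of a square rational matrix (assumed invertible)."""
    n = len(M)
    aug = [[Fraction(x) for x in row] + unit for row, unit in zip(M, identity(n))]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        lead = aug[col][col]
        aug[col] = [x / lead for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def neumann_limit(B):
    """Limit of sum_{n>=0} B^n, valid when the series converges."""
    n = len(B)
    I = identity(n)
    return inverse([[I[i][j] - B[i][j] for j in range(n)] for i in range(n)])

# Hypothetical matrix with maximal row sum 3/8 < 1.
B = [[Fraction(1, 4), Fraction(1, 8)], [Fraction(0), Fraction(1, 4)]]
limit = neumann_limit(B)

# Partial sum I + B + ... + B^30 for comparison.
partial, power = identity(2), identity(2)
for _ in range(30):
    power = mat_mul(power, B)
    partial = [[partial[i][j] + power[i][j] for j in range(2)] for i in range(2)]
```

Since only one matrix inversion over the rationals is needed, the value of the series is obtained without summing infinitely many terms, which is the point of the argument above.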

All in all, this finishes the proof of point (1) of Theorem 4.1: We have seen that the MDP $\mathcal{M}$ and the threshold $\vartheta$ can be constructed in time polynomial in the size of the representations of $\alpha_{1},\dots,\alpha_{k}$ and $\beta_{0},\dots,\beta_{k-1}$ . As $\vartheta=\mathrm{Pr}_{\mathcal{M},s_{\mathit{\scriptscriptstyle init}}}^{\mathfrak{S}}(\lozenge(\text{accumulated weight }<0))$ , we furthermore know that

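$$\mathrm{Pr}^{\max}_{\mathcal{M},s_{\mathit{\scriptscriptstyle init}}}(\lozenge(\text{accumulated weight}<0))>\vartheta$$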

if and only if the scheduler $\mathfrak{S}$ is not optimal. By Lemma 4.3, this is the case if and only if the given linear recurrence sequence $(u_{n})_{n\geq 0}$ has a negative member.

Finally, we want to emphasize again that the absolute values of the weights in the constructed MDP are at most $k$ . Hence, if we want to view $\mathcal{M}$ as a one-counter MDP in which the counter value can only be increased or decreased by $1$ in each step, we replace each transition with a weight $+w$ or $-w$ for some $1\leq w\leq k$ by a sequence of $w$ states increasing or decreasing the counter value, respectively. The constructed MDP becomes only polynomially larger through this replacement, which allowed us to conclude Corollary 4.2.
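The replacement of weighted transitions by unit steps can be sketched as follows in Python. The function name and the naming scheme for the intermediate states are invented for this example, and the sketch only covers transitions that are taken with probability $1$, such as the edges from $t_{i}$ back to $t$.

```python
def expand_to_unit_steps(source, target, weight):
    """Split one transition of integer weight into |weight| unit steps.

    Returns (from_state, to_state, step_weight) triples. Every step weight
    is +1 or -1, unless the original weight is 0, in which case the edge
    is returned unchanged. Intermediate state names are made up here.
    """
    n = abs(weight)
    if n <= 1:
        return [(source, target, weight)]
    step = 1 if weight > 0 else -1
    chain = ([source]
             + [f"{source}->{target}#{i}" for i in range(1, n)]
             + [target])
    return [(chain[i], chain[i + 1], step) for i in range(n)]

steps = expand_to_unit_steps("t3", "t", -3)
```

A transition of weight $\pm w$ yields $w$ unit steps, so with weights bounded by $k$ the number of added states per transition is below $k$.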

Proof of Theorem 4.1(2).

The construction we provided so far shows that the threshold problem for the maximal termination probability of one-counter MDPs is Positivity-hard. Using exactly the same ideas, we can show that the threshold problem for the minimal termination probability is Positivity-hard as well. Let us describe the necessary changes in the construction that are also depicted in Figure 8. We rename the state $\mathit{trap}$ to $\mathit{trap}^{\prime}$ and add a transition with weight $-k$ to a new absorbing state $\mathit{trap}$ . For all $0\leq j\leq k-1$ , now state $\mathit{trap}$ is reached directly with probability $1$ and weight $-j$ from the states $x_{j}$ and $y_{j}$ . Furthermore, the probability to reach $x_{j}$ when choosing $\gamma_{j}$ in $t$ is changed to $\frac{j+1}{k+1}+\beta_{j}$ and the probability to reach $\mathit{trap}^{\prime}$ is adjusted accordingly. The analogous change is performed for $\delta_{j}$ . Now, it is easy to check that the optimal choice to minimize the termination probability in state $t$ is to choose $\gamma$ if the accumulated weight is $\geq k$ . In this case the probability of termination is less than $\frac{1}{k+1}$ . If the accumulated weight is $0\leq\ell<k$ , the optimal choice is $\gamma_{\ell}$ . The analogous result holds in state $s$ . From then on the proof is analogous to the proof for the maximal termination probability with the change that we have to consider the scheduler $\mathfrak{S}$ always choosing $\sigma$ in the state $\mathit{choice}$ this time. This scheduler is optimal to minimize the termination probability if and only if the given linear recurrence sequence is non-negative. With these adjustments, we conclude:

Corollary 4.5.

The Positivity problem is reducible in polynomial time to the following problem: Given an MDP $\mathcal{M}$ and a rational $\vartheta\in(0,1)$ , decide whether

[TABLE]

Remark 4.6.

There is no obvious way to adjust the construction such that the Positivity-hardness of the question whether $\mathrm{Pr}^{\max}_{\mathcal{M},s_{\mathit{\scriptscriptstyle init}}}(\lozenge(\text{accumulated weight}<0))\geq\vartheta$ would follow. One attempt would be to provide an $\varepsilon$ such that $\mathrm{Pr}^{\max}_{\mathcal{M},s_{\mathit{\scriptscriptstyle init}}}(\lozenge(\text{accumulated weight}<0))>\vartheta$ if and only if $\mathrm{Pr}^{\max}_{\mathcal{M},s_{\mathit{\scriptscriptstyle init}}}(\lozenge(\text{accumulated weight}<0))\geq\vartheta+\varepsilon$ . This, however, probably requires a bound on the position at which the given linear recurrence sequence first becomes negative. But this question lies at the core of the Positivity problem. The analogous observation applies to the question whether $\mathrm{Pr}^{\min}_{\mathcal{M},s_{\mathit{\scriptscriptstyle init}}}(\lozenge(\text{accumulated weight}<0))\leq\vartheta$ and all Positivity-hardness results in the sequel.

Energy objectives.

As the energy objective $\Box(\text{accumulated weight$ \geq 0 $})$ is satisfied if and only if $\lozenge(\text{accumulated weight$ <0 $})$ does not hold, the Positivity-hardness of the threshold problem of the optimal satisfaction probability of an energy objective follows easily. As

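$$\mathrm{Pr}^{\max}_{\mathcal{M},s_{\mathit{\scriptscriptstyle init}}}(\Box(\text{accumulated weight}\geq 0))=1-\mathrm{Pr}^{\min}_{\mathcal{M},s_{\mathit{\scriptscriptstyle init}}}(\lozenge(\text{accumulated weight}<0))$$

and, dually, $\mathrm{Pr}^{\min}_{\mathcal{M},s_{\mathit{\scriptscriptstyle init}}}(\Box(\text{accumulated weight}\geq 0))=1-\mathrm{Pr}^{\max}_{\mathcal{M},s_{\mathit{\scriptscriptstyle init}}}(\lozenge(\text{accumulated weight}<0))$ ,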

we conclude:

Corollary 4.7.

The Positivity problem is reducible in polynomial time to the following problems: Given an MDP $\mathcal{M}$ and a rational $\vartheta\in(0,1)$ ,

1. decide whether $\mathrm{Pr}^{\max}_{\mathcal{M},s_{\mathit{\scriptscriptstyle init}}}(\Box(\text{accumulated weight}\geq 0))>\vartheta$ .

2. decide whether $\mathrm{Pr}^{\min}_{\mathcal{M},s_{\mathit{\scriptscriptstyle init}}}(\Box(\text{accumulated weight}\geq 0))<\vartheta$ .

Cost problems and quantiles.

The proof of the Positivity-hardness of the threshold problem for the termination probability of one-counter MDPs in fact also serves as a proof that cost problems and the computation of quantiles of the accumulated weight before reaching a goal state are Positivity-hard. Observe that in the MDP constructed for Theorem 4.1 and Corollary 4.5, almost all paths $\zeta$ under any scheduler satisfy $\lozenge(\text{accumulated weight}<0)$ iff they satisfy $\diamondplus\mathit{trap}(\zeta)<0$ iff their total accumulated weight is less than $0$ . Thus, we obtain the following corollary:

Corollary 4.8.

The Positivity problem is reducible in polynomial time to the following problems: Given an MDP $\mathcal{M}$ with a designated set of trap states $\mathit{Goal}$ and a rational $\vartheta\in(0,1)$ ,

1. decide whether $\mathrm{Pr}^{\max}_{\mathcal{M},s_{\mathit{\scriptscriptstyle init}}}(\diamondplus\mathit{Goal}<0)>\vartheta$ .

2. decide whether $\mathrm{Pr}^{\min}_{\mathcal{M},s_{\mathit{\scriptscriptstyle init}}}(\diamondplus\mathit{Goal}<0)<\vartheta$ .

The analogous result also holds for the total accumulated weight.

Termination times of one-counter MDPs.

To conclude the section, we show that not only the threshold problems for optimal termination probabilities but also those for the optimal expected termination times in one-counter MDPs that terminate almost surely are Positivity-hard. We again work with weighted MDPs. Let $T$ be the random variable that assigns to each path in a weighted MDP $\mathcal{M}$ the length of the shortest prefix $\pi$ such that $\mathit{wgt}(\pi)<0$ . To reflect precisely the behavior of a one-counter MDP, we now work with MDPs in which the weight is reduced or increased by at most $1$ in each step. We make a small change to the MDP constructed for the proof of Corollary 4.5 that is depicted in Figure 8. The initial component (that is not depicted) stays unchanged. All remaining transitions reduce the weight or leave it unchanged. The transitions with weight $0$ do not occur directly after each other except for the loop at the state $\mathit{trap}$ , which we adjust in a moment. Hence, we can add additional auxiliary states such that along each path starting from $s$ or $t$ and not reaching the state $\mathit{trap}$ , the weight is left unchanged and reduced by $1$ in an alternating fashion. So, if a path starts in state $s$ or $t$ with accumulated weight $w$ and terminates (i.e., reaches accumulated weight $-1$ ) before reaching the state $\mathit{trap}$ , this takes $2(w+1)$ steps. Now, we replace the loop at the state $\mathit{trap}$ by the gadget depicted in Figure 9 and call the resulting MDP $\mathcal{N}$ . So, when reaching $\mathit{trap}$ , the accumulated weight is increased by $1$ before it is reduced in every other step until termination. That means that if a path starting in state $s$ or $t$ with weight $w$ does not terminate before reaching $\mathit{trap}$ , the termination time is $2(w+1)+3$ steps.

Now, let $\mathfrak{S}$ be a scheduler and denote the probability not to terminate before reaching $\mathit{trap}$ under $\mathfrak{S}$ by $p^{\mathfrak{S}}$ . For the expected termination time $T$ in $\mathcal{N}$ , we now have

[TABLE]

The summands $(1/2)^{i}(i+2(i+1))$ correspond to the probability of accumulating weight $i$ in the initial component, which takes $i$ steps, and the $2(i+1)$ steps needed to terminate by alternatingly leaving the weight unchanged and reducing it by $1$. The three additional steps after $\mathit{trap}$ occur precisely with probability $p^{\mathfrak{S}}$.

Not terminating before $\mathit{trap}$ corresponds exactly to not terminating at all in the MDP constructed for Corollary 4.5. The termination probability there is hence $1-p^{\mathfrak{S}}$ for any scheduler. It is hence possible to terminate with a probability less than $\vartheta$ in that MDP if and only if it is possible to reach an expected termination time of more than $10-3\vartheta$ in $\mathcal{N}$ . By Corollary 4.5 and the fact that termination is reached almost surely in $\mathcal{N}$ under any scheduler, we hence conclude:

Corollary 4.9.

Let $\mathcal{M}$ be a one-counter MDP with initial state $s_{\mathit{\scriptscriptstyle init}}$ that terminates almost surely under any scheduler, let $\vartheta$ be a rational, and let $T$ be the random variable assigning the termination time to runs. The Positivity problem is polynomial-time reducible to the problem whether

$$\mathbb{E}^{\max}_{\mathcal{M},s_{\mathit{\scriptscriptstyle init}}}(T)>\vartheta.$$

The analogous argument with similar changes to the MDP used in the proof of Theorem 4.1 can be used to show the analogous result for the problem whether $\mathbb{E}^{\min}_{\mathcal{M},s_{\mathit{\scriptscriptstyle init}}}(T)<\vartheta$ .

4.2 Partial and conditional stochastic shortest path problems

Our next goal is to prove that the partial and conditional SSPPs are Positivity-hard. We start by providing a formal definition of the decision versions of these two problems.

Let $\mathcal{M}$ be an MDP with a designated set of terminal states $\mathit{Goal}$ . We define the random variable $\oplus\mathit{Goal}$ on maximal paths $\zeta$ of $\mathcal{M}$ :

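$$\oplus\mathit{Goal}(\zeta)=\begin{cases}\mathit{wgt}(\zeta)&\text{if }\zeta\vDash\lozenge\mathit{Goal},\\ 0&\text{otherwise.}\end{cases}$$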

The objective in the partial SSPP is to maximize the expected value of $\oplus\mathit{Goal}$ which we call the partial expected accumulated weight, or partial expectation for short, i.e., to compute the value

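$$\mathbb{PE}^{\max}_{\mathcal{M}}=\sup_{\mathfrak{S}}\mathbb{PE}^{\mathfrak{S}}_{\mathcal{M}}\quad\text{with}\quad\mathbb{PE}^{\mathfrak{S}}_{\mathcal{M}}=\mathbb{E}^{\mathfrak{S}}_{\mathcal{M},s_{\mathit{\scriptscriptstyle init}}}(\oplus\mathit{Goal}),$$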

where the supremum ranges over all schedulers $\mathfrak{S}$ . The threshold problem asks, given a rational $\vartheta$ , whether

$$\mathbb{PE}^{\max}_{\mathcal{M}}>\vartheta.$$

Note that the minimization of the partial expectation can be reduced to the maximization by multiplying all weights in $\mathcal{M}$ with $-1$ .

The conditional expectation under a scheduler $\mathfrak{S}$ that reaches $\mathit{Goal}$ with positive probability is the value

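$$\mathbb{CE}^{\mathfrak{S}}_{\mathcal{M}}=\mathbb{E}^{\mathfrak{S}}_{\mathcal{M},s_{\mathit{\scriptscriptstyle init}}}(\oplus\mathit{Goal}\mid\lozenge\mathit{Goal})=\frac{\mathbb{E}^{\mathfrak{S}}_{\mathcal{M},s_{\mathit{\scriptscriptstyle init}}}(\oplus\mathit{Goal})}{\mathrm{Pr}^{\mathfrak{S}}_{\mathcal{M}}(\lozenge\mathit{Goal})}.$$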

Again, we are interested in the maximal value

$$\mathbb{CE}^{\max}_{\mathcal{M}}=\sup_{\mathfrak{S}}\mathbb{CE}^{\mathfrak{S}}_{\mathcal{M}},$$

where the supremum ranges over all schedulers $\mathfrak{S}$ with $\mathrm{Pr}^{\mathfrak{S}}_{\mathcal{M}}(\lozenge\mathit{Goal})>0$ . Consequently, the threshold problem asks for a given rational $\vartheta$ whether

$$\mathbb{CE}^{\max}_{\mathcal{M}}>\vartheta.$$

Again, multiplying all weights with $-1$ reduces the minimization of the conditional expectation to the maximization. Furthermore, given a further set of states $F$, the problem to maximize $\mathbb{E}^{\mathfrak{S}}_{\mathcal{M}}(\diamondplus\mathit{Goal}\mid\lozenge F)$ among all schedulers $\mathfrak{S}$ that reach $F$ with positive probability can be reduced to the conditional SSPP in our formulation as shown in [BKKW17]. (In [BKKW17], only MDPs with non-negative weights are considered. The reduction of [BKKW17], however, does not require the restriction to non-negative weights.)
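To make the difference between the two objectives concrete, the following minimal sketch evaluates both on a hand-written distribution over maximal paths. The three paths and their probabilities are hypothetical and are not taken from any MDP in this paper; the function names are illustrative only.

```python
from fractions import Fraction

# Hypothetical toy distribution over maximal paths of a Markov chain:
# (probability, accumulated weight, reaches Goal?)
paths = [
    (Fraction(1, 2), 3, True),
    (Fraction(1, 4), -2, True),
    (Fraction(1, 4), 5, False),  # never reaches Goal
]

def partial_expectation(paths):
    """Paths that miss Goal contribute weight 0."""
    return sum(p * w for p, w, hit in paths if hit)

def conditional_expectation(paths):
    """Expected accumulated weight under the condition that Goal is reached."""
    reach = sum(p for p, _, hit in paths if hit)
    if reach == 0:
        raise ValueError("Goal must be reached with positive probability")
    return partial_expectation(paths) / reach

print(partial_expectation(paths))      # 1/2 * 3 + 1/4 * (-2)
print(conditional_expectation(paths))  # the same value divided by 3/4
```

The partial expectation ignores the weight of the path that misses $\mathit{Goal}$, whereas the conditional expectation additionally renormalizes by the probability of reaching $\mathit{Goal}$.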

Partial SSPP.

In the sequel, we will provide a direct reduction from the Positivity problem to the partial SSPP using our modular approach via MDP-gadgets to prove the following result:

Theorem 4.10.

The Positivity problem is polynomial-time reducible to the decision version of the partial SSPP, i.e., the question whether

$$\mathbb{PE}^{\max}_{\mathcal{M}}>\vartheta$$

for a given MDP $\mathcal{M}$ and a given rational $\vartheta$ .

Again, we split up the proof of the theorem into the construction of the MDP with the proof of the correctness of the encoding of the linear recurrence sequence and the computation of the threshold $\vartheta$ .

Proof of Theorem 4.10: construction of the MDP and correctness of the encoding of a linear recurrence sequence.

Let $k$ be a natural number and let $(u_{n})_{n\geq 0}$ be the linear recurrence sequence given by rationals $\alpha_{i}$ for $1\leq i\leq k$ and $\beta_{j}$ for $0\leq j\leq k{-}1$ via $u_{0}=\beta_{0}$ , …, $u_{k-1}=\beta_{k-1}$ and $u_{n+k}=\alpha_{1}u_{n+k-1}+\dots+\alpha_{k}u_{n}$ for all $n\geq 0$ . By Assumption 3.1, we can assume w.l.o.g. that $\sum_{i}|\alpha_{i}|<\frac{1}{4}$ and that $0\leq\beta_{j}<\frac{1}{4k^{2k+2}}$ for all $j$ .
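As a reminder of what the Positivity problem asks, the following sketch unrolls a rational linear recurrence sequence exactly and searches a finite horizon for a negative member. It is only a bounded search under an arbitrary, caller-chosen `horizon`: it can refute positivity but cannot establish it, which is precisely why the general problem is hard. The function names are illustrative, and the normalization from Assumption 3.1 is not applied here.

```python
from fractions import Fraction

def lrs_terms(alphas, betas, count):
    """First `count` terms of u_{n+k} = alpha_1 u_{n+k-1} + ... + alpha_k u_n."""
    k = len(alphas)
    assert len(betas) == k, "need exactly k initial values"
    u = [Fraction(b) for b in betas]
    while len(u) < count:
        # alphas[i] is alpha_{i+1}, which multiplies the term i+1 positions back.
        u.append(sum(Fraction(a) * u[-1 - i] for i, a in enumerate(alphas)))
    return u[:count]

def first_negative_index(alphas, betas, horizon):
    """Index of the first negative member among the first `horizon` terms, or None."""
    for n, value in enumerate(lrs_terms(alphas, betas, horizon)):
        if value < 0:
            return n
    return None

# Fibonacci-style example: no negative member within the horizon.
print(first_negative_index([1, 1], [0, 1], 50))
# u_{n+1} = -u_n with u_0 = 1: the member u_1 is negative.
print(first_negative_index([-1], [1], 50))
```

A return value of `None` only says that no negative member occurs within the inspected prefix.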

We begin by constructing a gadget $\mathcal{P}_{\bar{\beta}}$ that encodes the initial values $\beta_{0},\dots,\beta_{k{-}1}$ . The gadget is depicted in Figure 10 and contains states $t$ , $s$ , $\mathit{goal}$ , and $\mathit{fail}$ . For each $0\leq j\leq k-1$ , it additionally contains states $x_{j}$ and $y_{j}$ . In state $x_{j}$ , there is one action enabled that leads to $\mathit{goal}$ with probability $\frac{1}{2k^{2(k-j)}}+\beta_{j}$ and to $\mathit{fail}$ otherwise. From state $y_{j}$ , $\mathit{goal}$ is reached with probability $\frac{1}{2k^{2(k-j)}}$ and $\mathit{fail}$ otherwise. In state $t$ , there is an action $\gamma_{j}$ leading to $x_{j}$ with weight $k-j$ for each $0\leq j\leq k-1$ . Likewise, in state $s$ there is an action $\delta_{j}$ leading to $y_{j}$ with weight $k{-}j$ for each $0\leq j\leq k-1$ .

We furthermore reuse the initial gadget $\mathcal{I}$ and the gadget encoding the linear recurrence relation $\mathcal{G}_{\bar{\alpha}}$ from the previous section. In the gadget $\mathcal{G}_{\bar{\alpha}}$ , we rename the absorbing state $\mathit{trap}$ to the terminal state $\mathit{goal}$ which is the target state for the partial SSPP. As before, we glue together the three gadgets $\mathcal{I}$ , $\mathcal{G}_{\bar{\alpha}}$ and $\mathcal{P}_{\bar{\beta}}$ at states $s$ , $t$ , and $\mathit{goal}$ . Let us call the full MDP that we obtain in this way $\mathcal{M}$ which is depicted in Figure 11. We denote the state space by $S$ .

The somewhat cumbersome choice of probability values yields, via straightforward computations, the following lemma, which shows that the constructed gadgets interact correctly.

Lemma 4.11.

Consider the full MDP $\mathcal{M}$. Let $0\leq j\leq k-1$. Starting with weight $-(k{-}1){+}j$ in state $t$ or $s$, the actions $\gamma_{j}$ and $\delta_{j}$, respectively, maximize the partial expectation. For positive starting weight, $\gamma$ and $\delta$ are optimal.

Proof.

Suppose action $\gamma_{i}$ is chosen in state $t$ when starting with weight $-(k-1)+j$ . So, state $x_{i}$ is reached with weight $-(k-1)+j+(k-i)=1+j-i$ . Then the partial expectation achieved from this situation is

$$(1+j-i)\cdot\left(\frac{1}{2k^{2(k-i)}}+\beta_{i}\right).$$

For $i>j$ this value is $\leq 0$ and hence $\gamma_{i}$ is certainly not optimal. For $i=j$ , we obtain a partial expectation of

$$\frac{1}{2k^{2(k-j)}}+\beta_{j}.$$

For $i<j$ , state $x_{i}$ is reached with weight $1+j-i\leq k$ . Further, $\beta_{i}\leq\frac{1}{4k^{2k+2}}$ and $\frac{1}{2k^{2(k-i)}}\leq\frac{1}{2k^{2(k-j)}\cdot k^{2}}$ . So, the partial expectation obtained via $\gamma_{i}$ is at most

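$$k\cdot\left(\frac{1}{2k^{2(k-j)}\cdot k^{2}}+\frac{1}{4k^{2k+2}}\right)\leq\frac{3}{2k}\cdot\frac{1}{2k^{2(k-j)}}<\frac{1}{2k^{2(k-j)}},$$

where the last inequality uses that $i<j$ is only possible for $k\geq 2$.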

So, indeed action $\gamma_{j}$ maximizes the partial expectation among the actions $\gamma_{i}$ with $0\leq i\leq k-1$ when the accumulated weight in state $t$ is $-(k-1)+j$ . The argument for state $s$ is the same with $\beta_{i}=0$ for all $i$ . It is easy to see that for accumulated weight $-(k-1)+j$ with $0\leq j\leq k-1$ actions $\gamma$ or $\delta$ are not optimal in state $t$ or $s$ : If $\mathit{goal}$ is reached immediately, the weight is not positive and otherwise states $t$ or $s$ are reached with lower accumulated weight again. The values $\beta_{j}$ are chosen small enough such that also a switch from state $t$ to $s$ while accumulating negative weight does not lead to a higher partial expectation.

For positive accumulated weight $w$, the optimal partial expectation when choosing $\gamma$ first is at least $\frac{3}{4}w$ by construction and the fact that a positive value can be achieved from any possible successor state via one of the actions $\gamma_{j}$ and $\delta_{j}$ with $0\leq j\leq k-1$. Choosing $\gamma_{j}$, on the other hand, results in a partial expectation of at most $(k+w)\cdot(\frac{1}{4k^{2k+2}}+\frac{1}{2k^{2}})$, which is easily seen to be less as $k\geq 2$. ∎
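The first part of the argument can be sanity-checked numerically. The sketch below compares only the actions $\gamma_{0},\dots,\gamma_{k-1}$ in state $t$ at weight $-(k-1)+j$, using the value "weight on arrival in $x_{i}$ times probability of reaching $\mathit{goal}$ from $x_{i}$" described in the proof. The concrete choice $k=3$ and the values of $\beta$ (chosen just below the bound $\frac{1}{4k^{2k+2}}$) are hypothetical test inputs, and the comparison with action $\gamma$ is not covered.

```python
from fractions import Fraction

def immediate_value(k, betas, j, i):
    """Partial expectation of gamma_i in t at weight -(k-1)+j."""
    weight = 1 + j - i  # weight on arrival in x_i
    prob_goal = Fraction(1, 2 * k ** (2 * (k - i))) + betas[i]
    return weight * prob_goal

def best_gamma(k, betas, j):
    """Index i of the action gamma_i with the largest value."""
    return max(range(k), key=lambda i: immediate_value(k, betas, j, i))

k = 3
bound = Fraction(1, 4 * k ** (2 * k + 2))
# Hypothetical initial values just below the bound from the construction.
betas = [bound * Fraction(9, 10)] * k
for j in range(k):
    print(j, best_gamma(k, betas, j))
```

For these inputs the maximizing index coincides with $j$ for every starting weight $-(k-1)+j$, in line with the lemma.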

For each weight $w$, denote by $e(t,w)$ and $e(s,w)$ the optimal partial expectation when starting in state $t$ or $s$ with accumulated weight $w$ in $\mathcal{M}$, as if the respective state was reached from the initial state with weight $w$ and probability $1$. For each weight $w\geq-k+1$, denote by $d(w)$ the difference $e(t,w)-e(s,w)$ between these optimal partial expectations when starting in state $t$ and $s$ with weight $w$. Comparing actions $\gamma_{j}$ and $\delta_{j}$ for starting weight $-(k{-}1){+}j$, we conclude that the difference between optimal values $d(-(k{-}1){+}j)$ is equal to $\beta_{j}$, for $0\leq j\leq k-1$.

The important fact we use next is that for partial expectations, the optimal values $e(r,w)$ for states $r\in S\setminus\{\mathit{goal}\}$ and starting weights $w\in\mathbb{Z}$ satisfy the optimality equation ( $\ast$ ) from Section 3.2 when setting $e(\mathit{goal},w)=w$, as already shown in [CFK+13a]:

[TABLE]

By the fact that $\mathcal{G}_{\bar{\alpha}}$ encodes the given linear recurrence relation as soon as $\gamma$ and $\delta$ are the optimal actions as shown in Section 3.2, we conclude the following lemma:

Lemma 4.12.

Consider the linear recurrence sequence $(u_{n})_{n\geq 0}$ given above by $\alpha_{1},\dots,\alpha_{k}$ and $\beta_{0},\dots,\beta_{k-1}$ and the MDP $\mathcal{M}$ constructed from this sequence. We have

$$d(-(k-1)+n)=u_{n}$$

for all $n$ with the values $d(w)$ just defined.

Proof of Theorem 4.10: computation of the threshold $\vartheta$ .

Let us now consider a run of the MDP $\mathcal{M}$. For any $w>0$, state $c$ is reached with accumulated weight $w$ with positive probability. As before, an optimal scheduler has to decide whether the partial expectation when starting with weight $w$ is better in state $s$ or $t$: Action $\tau$ is optimal in $c$ for accumulated weight $w$ if and only if $d(w)\geq 0$. Once $t$ or $s$ is reached, the optimal actions are given by Lemma 4.11. Let $\mathfrak{S}$ be the scheduler that always chooses $\tau$ in $c$ and actions $\gamma,\gamma_{0},\dots,\gamma_{k-1},\delta,\dots$ as described in Lemma 4.11. We conclude that $\mathfrak{S}$ is optimal if and only if the given linear recurrence sequence is non-negative. The remaining step in our reduction is hence to prove that the partial expectation under $\mathfrak{S}$ is rational and can be computed in polynomial time:

Lemma 4.13.

Let $\mathfrak{S}$ be the scheduler for the constructed MDP $\mathcal{M}$ always choosing $\tau$ in $c$ and actions $\gamma,\gamma_{0},\dots,\gamma_{k-1},\delta,\dots$ as described in Lemma 4.11. The value $\mathbb{PE}^{\mathfrak{S}}_{\mathcal{M}}$ is rational and computable in polynomial time.

Proof.

Recall that the scheduler $\mathfrak{S}$ chooses $\gamma$ and $\delta$ , respectively, as long as the accumulated weight is positive. For an accumulated weight of $-(k-1)+j$ for $0\leq j\leq k-1$ , it chooses actions $\gamma_{j}$ and $\delta_{j}$ , respectively.

Analogously to the proof of Lemma 4.4, we want to recursively express the partial expectations under $\mathfrak{S}$ starting from $t$ or $s$ with some positive accumulated weight $n\in\mathbb{N}$ which we again denote by $e(t,n)$ and $e(s,n)$ , respectively. In order to do so, we reuse the following Markov chain $\mathcal{C}$ from Lemma 4.4 also depicted in Figure 7 which we briefly recall here: The Markov chain $\mathcal{C}$ has $5k$ states named $t_{-k+1}$ , …, $t_{+k}$ , $s_{-k+1}$ , …, $s_{+k}$ , and $\mathit{goal}_{+1}$ , …, $\mathit{goal}_{+k}$ . States $t_{-k+1}$ , …, $t_{0}$ , $s_{-k+1}$ , …, $s_{0}$ , and $\mathit{goal}_{+1}$ , …, $\mathit{goal}_{+k}$ are absorbing. For $0<i,j\leq k$ , there are transitions from $t_{+i}$ to $t_{+i-j}$ with probability $\alpha_{j}$ if $\alpha_{j}>0$ , to $s_{+i-j}$ with probability $|\alpha_{j}|$ if $\alpha_{j}<0$ , and to $\mathit{goal}_{+i}$ with probability $1-|\alpha_{1}|-\ldots-|\alpha_{k}|$ . Transitions from $s_{+i}$ are defined analogously.
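The reachability probabilities of $\mathcal{C}$ can be computed exactly by a simple recursion, because every transition from $t_{+i}$ or $s_{+i}$ strictly decreases the index or leads to an absorbing state. The sketch below assumes that "analogously" means that the transitions from $s_{+i}$ mirror those from $t_{+i}$ with the roles of $t$ and $s$ swapped; the coefficients in the usage example are hypothetical.

```python
from fractions import Fraction
from functools import lru_cache

def absorption_probabilities(alphas):
    """Map each state t_{+i}, s_{+i} (1 <= i <= k) to its distribution over absorbing states."""
    k = len(alphas)
    alphas = [Fraction(a) for a in alphas]
    stop = 1 - sum(abs(a) for a in alphas)  # probability of moving to goal_{+i}
    assert stop > 0, "requires |alpha_1| + ... + |alpha_k| < 1"

    @lru_cache(maxsize=None)
    def reach(kind, idx):
        if idx <= 0:  # t_{-k+1}, ..., t_0 and s_{-k+1}, ..., s_0 are absorbing
            return {(kind, idx): Fraction(1)}
        dist = {("goal", idx): stop}
        other = "s" if kind == "t" else "t"
        for j, a in enumerate(alphas, start=1):
            if a == 0:
                continue
            # Positive coefficients stay on the same side, negative ones switch sides.
            nxt = kind if a > 0 else other
            for target, p in reach(nxt, idx - j).items():
                dist[target] = dist.get(target, 0) + abs(a) * p
        return dist

    return {(kind, i): reach(kind, i) for kind in ("t", "s") for i in range(1, k + 1)}

probs = absorption_probabilities([Fraction(1, 8), Fraction(-1, 16)])
for start, dist in probs.items():
    print(start, dist)
```

Each printed distribution sums to $1$, and the mass that does not end in a $\mathit{goal}$ state is bounded by $|\alpha_{1}|+\dots+|\alpha_{k}|$.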

The idea behind this Markov chain is that the reachability probabilities describe how, for arbitrary $n\in\mathbb{N}$ and $1\leq i\leq k$ , the values $e(t,nk+i)$ and $e(s,nk+i)$ depend on $n$ and the values $e(t,(n-1)k+j)$ and $e(s,(n-1)k+j)$ for $1\leq j\leq k$ . The transitions in $\mathcal{C}$ behave as $\gamma$ and $\delta$ in $\mathcal{M}$ , but the decrease in the accumulated weight is explicitly encoded into the state space. Namely, for $n\in\mathbb{N}$ and $0<i\leq k$ , we have

[TABLE]

and analogously for $e(s,nk+i)$ . We now group the optimal values together in the following column vectors

[TABLE]

for $n\in\mathbb{N}$ . In other words, this vector contains the optimal values for the partial expectation when starting in $t$ or $s$ with an accumulated weight from $\{nk+1,\dots,nk+k\}$ . Further, we define the vector containing the optimal values for weights in $\{-k+1,\dots,0\}$ which are the least values of accumulated weight reachable under scheduler $\mathfrak{S}$ .

[TABLE]

As we have seen, these values are given as follows:

$$e(t,-(k-1)+j)=\frac{1}{2k^{2(k-j)}}+\beta_{j}\quad\text{and}\quad e(s,-(k-1)+j)=\frac{1}{2k^{2(k-j)}}$$

for $0\leq j\leq k-1$ .

As the reachability probabilities in $\mathcal{C}$ are rational and computable in polynomial time, we conclude from (4.2) that there are a matrix $A\in\mathbb{Q}^{2k\times 2k}$ , and vectors $a$ and $b$ in $\mathbb{Q}^{2k}$ computable in polynomial time such that

$$v_{n}=A\,v_{n-1}+n\cdot a+b$$

for all $n\in\mathbb{N}$ . We claim that the following explicit representation for $n\geq-1$ satisfies this recursion:

$$v_{n}=A^{n+1}v_{-1}+\sum_{i=0}^{n}(n-i)A^{i}a+\sum_{i=0}^{n}A^{i}b.$$

We show this by induction: Clearly, this representation yields the correct value for $v_{-1}$ . So, assume $v_{n}=A^{n+1}v_{-1}+\sum_{i=0}^{n}(n-i)A^{i}a+\sum_{i=0}^{n}A^{i}b$ . Then,

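$$v_{n+1}=A\,v_{n}+(n+1)a+b=A^{n+2}v_{-1}+\sum_{i=1}^{n+1}(n+1-i)A^{i}a+\sum_{i=1}^{n+1}A^{i}b+(n+1)a+b=A^{n+2}v_{-1}+\sum_{i=0}^{n+1}(n+1-i)A^{i}a+\sum_{i=0}^{n+1}A^{i}b.$$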
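The induction can be cross-checked with exact rational arithmetic. The sketch below assumes that the recursion has the form $v_{n}=Av_{n-1}+n\cdot a+b$ for $n\geq 0$, which is the form under which the explicit representation stated above is consistent; the matrix and vectors in the usage example are arbitrary hypothetical inputs, not the ones arising from $\mathcal{C}$.

```python
from fractions import Fraction

def mat_vec(A, v):
    return [sum(x * y for x, y in zip(row, v)) for row in A]

def vec_add(*vectors):
    return [sum(entries) for entries in zip(*vectors)]

def scale(c, v):
    return [c * x for x in v]

def by_recursion(A, a, b, v_minus1, n):
    """Unroll v_m = A v_{m-1} + m*a + b for m = 0, ..., n (assumed form)."""
    v = v_minus1
    for m in range(n + 1):
        v = vec_add(mat_vec(A, v), scale(m, a), b)
    return v

def by_closed_form(A, a, b, v_minus1, n):
    """A^{n+1} v_{-1} + sum_{i<=n} (n-i) A^i a + sum_{i<=n} A^i b."""
    total = v_minus1
    for _ in range(n + 1):
        total = mat_vec(A, total)
    power_a, power_b = a, b  # hold A^i a and A^i b
    for i in range(n + 1):
        total = vec_add(total, scale(n - i, power_a), power_b)
        power_a, power_b = mat_vec(A, power_a), mat_vec(A, power_b)
    return total

A = [[Fraction(1, 4), Fraction(1, 8)], [Fraction(0), Fraction(1, 3)]]
a = [Fraction(1), Fraction(2)]
b = [Fraction(1, 2), Fraction(3)]
v_minus1 = [Fraction(5), Fraction(-1)]
print(by_recursion(A, a, b, v_minus1, 4) == by_closed_form(A, a, b, v_minus1, 4))
```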

So, we have an explicit representation for $v_{n}$ . The value we are interested in is

[TABLE]

Let $c=(\frac{1}{2^{k}},\frac{1}{2^{k-1}},\dots,\frac{1}{2^{1}},0,\dots,0)$ . Then,

[TABLE]

Hence, we can write

[TABLE]

We claim that all of the matrix series involved converge to rational matrices. As in the proof of Lemma 4.4, we observe that the maximal row sum in $A$ is at most $|\alpha_{1}|{+}\ldots{+}|\alpha_{k}|<1$ because the rows of the matrix contain exactly the probabilities to reach $t_{0}$ , … $t_{-k+1}$ , $s_{0}$ , …, and $s_{-k+1}$ from a state $t_{+i}$ or $s_{+i}$ in $\mathcal{C}$ for $1\leq i\leq k$ . But the probability to reach $\mathit{goal}_{+i}$ from these states is already $1{-}|\alpha_{1}|{-}\ldots{-}|\alpha_{k}|$ . Hence, $\|A\|_{\infty}$ , the operator norm induced by the maximum norm $\|\cdot\|_{\infty}$ , which equals $\max_{i}\sum_{j=1}^{2k}|A_{ij}|$ , is less than $1$ . So, of course also $\|\frac{1}{2^{k}}A\|_{\infty}<1$ and hence the Neumann series $\sum_{n=0}^{\infty}(\frac{1}{2^{k}}A)^{n}$ converges to $(I_{2k}-\frac{1}{2^{k}}A)^{-1}$ where $I_{2k}$ is the identity matrix of size $2k{\times}2k$ . So,

[TABLE]
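The convergence argument rests on the telescoping identity $(I-B)\sum_{n=0}^{N-1}B^{n}=I-B^{N}$ together with $\|B^{N}\|_{\infty}\leq\|B\|_{\infty}^{N}$. The following sketch checks both exactly for a small matrix; the entries of `A` are hypothetical non-negative values with row sums below $1$, and $k=1$ is chosen only to keep the example at size $2k\times 2k=2\times 2$.

```python
from fractions import Fraction

def identity(n):
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]

def mat_mul(X, Y):
    return [[sum(X[i][m] * Y[m][j] for m in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def mat_add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def mat_sub(X, Y):
    return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def inf_norm(X):
    """Operator norm induced by the maximum norm: the maximal absolute row sum."""
    return max(sum(abs(x) for x in row) for row in X)

def neumann_partial_sum(B, terms):
    """Return (sum of B^n for n < terms, B^terms)."""
    n = len(B)
    total = [[Fraction(0)] * n for _ in range(n)]
    power = identity(n)
    for _ in range(terms):
        total = mat_add(total, power)
        power = mat_mul(power, B)
    return total, power

k = 1
A = [[Fraction(1, 8), Fraction(1, 16)], [Fraction(1, 16), Fraction(1, 8)]]
B = [[x / 2 ** k for x in row] for row in A]  # the scaled matrix (1/2^k) A
partial, remainder = neumann_partial_sum(B, 20)
print(inf_norm(B), inf_norm(remainder))
```

Since the remainder $B^{N}$ vanishes geometrically, the partial sums converge to $(I-B)^{-1}$.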

Note that $\|A\|_{\infty}<1$ also implies that $I_{2k}-A$ is invertible. We observe that for all $n$ ,

[TABLE]

which is shown by straight-forward induction. Therefore,

[TABLE]

Finally, we show by induction that

[TABLE]

This is equivalent to

[TABLE]

For $n=0$, both sides evaluate to $0$. So, we assume the claim holds for $n$.

[TABLE]

The remaining series is the following:

[TABLE]

We conclude that all expressions in the representation of ${\mathbb{PE}}^{\mathfrak{S}}_{\mathcal{M}}$ above are rational and computable in polynomial time. ∎

As we have seen, the originally given linear recurrence sequence contains a negative member if and only if the scheduler $\mathfrak{S}$ is not optimal. This is the case if and only if ${\mathbb{PE}}^{\max}_{\mathcal{M}}>{\mathbb{PE}}^{\mathfrak{S}}_{\mathcal{M}}$ for the MDP $\mathcal{M}$ constructed from the linear recurrence sequence in polynomial time above. This finishes the proof of Theorem 4.10.

Conditional SSPP.

For the Positivity-hardness of the threshold problem for conditional expectations, we provide a reduction from the threshold problem for partial expectations in the following lemma. Note that a reduction in the other direction is provided in [PB19] rendering the two problems polynomial-time inter-reducible.

Lemma 4.14.

The threshold problem for the partial SSPP is polynomial-time reducible to the threshold problem for the conditional SSPP.

Proof.

Let $\mathcal{M}$ be an MDP with a designated terminal target state $\mathit{goal}$ and let $\vartheta$ be a rational number. We construct an MDP $\mathcal{N}$ such that ${\mathbb{PE}}^{\max}_{\mathcal{M}}>\vartheta$ if and only if ${\mathbb{CE}}^{\max}_{\mathcal{N}}>\vartheta$. We obtain $\mathcal{N}$ by adding a new initial state $s_{\mathit{\scriptscriptstyle init}}^{\prime}$, renaming the state $\mathit{goal}$ to $\mathit{goal}^{\prime}$, and adding a new state $\mathit{goal}$ to $\mathcal{M}$. In $s_{\mathit{\scriptscriptstyle init}}^{\prime}$, one action with weight $0$ is enabled leading to the old initial state $s_{\mathit{\scriptscriptstyle init}}$ and to $\mathit{goal}$ with probability $1/2$ each. From $\mathit{goal}^{\prime}$ there is one new action leading to $\mathit{goal}$ with probability $1$ and weight $+\vartheta$.

Each scheduler $\mathfrak{S}$ for $\mathcal{M}$ can be seen as a scheduler for $\mathcal{N}$ and vice versa. Now, we observe that for any scheduler $\mathfrak{S}$ ,

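$$\mathbb{CE}^{\mathfrak{S}}_{\mathcal{N}}=\frac{\frac{1}{2}\left(\mathbb{PE}^{\mathfrak{S}}_{\mathcal{M}}+\vartheta\cdot\mathrm{Pr}^{\mathfrak{S}}_{\mathcal{M}}(\lozenge\mathit{goal})\right)}{\frac{1}{2}+\frac{1}{2}\mathrm{Pr}^{\mathfrak{S}}_{\mathcal{M}}(\lozenge\mathit{goal})}.$$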

Hence, ${\mathbb{PE}}^{\max}_{\mathcal{M}}>\vartheta$ if and only if ${\mathbb{CE}}^{\max}_{\mathcal{N}}>\vartheta$ . ∎
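The equivalence can be checked on sample values. The sketch below evaluates the conditional expectation in $\mathcal{N}$ directly from the construction: half of the probability mass moves to $\mathit{goal}$ immediately with weight $0$ (an assumption about the weight of the initial action), and the other half enters $\mathcal{M}$, where every path reaching $\mathit{goal}^{\prime}$ collects the additional weight $\vartheta$. The function name and the sample values are hypothetical.

```python
from fractions import Fraction
from itertools import product

def conditional_expectation_in_N(pe, p_goal, theta):
    """Conditional expectation in N for a scheduler with partial expectation `pe`
    and goal-reachability probability `p_goal` in M."""
    half = Fraction(1, 2)
    # Expected weight collected on paths that reach goal in N.
    numerator = half * 0 + half * (pe + theta * p_goal)
    # Probability of reaching goal in N.
    denominator = half + half * p_goal
    return numerator / denominator

samples_pe = [Fraction(-3), Fraction(0), Fraction(1, 3), Fraction(7, 2)]
samples_p = [Fraction(0), Fraction(1, 5), Fraction(1)]
samples_theta = [Fraction(-1), Fraction(0), Fraction(1, 3), Fraction(2)]

agree = all(
    (conditional_expectation_in_N(pe, p, th) > th) == (pe > th)
    for pe, p, th in product(samples_pe, samples_p, samples_theta)
)
print(agree)
```

Algebraically, the difference between the conditional expectation in $\mathcal{N}$ and $\vartheta$ equals the difference between the partial expectation in $\mathcal{M}$ and $\vartheta$ divided by $1+\mathrm{Pr}^{\mathfrak{S}}_{\mathcal{M}}(\lozenge\mathit{goal})$, so both differences have the same sign.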

Together with the Positivity-hardness of the threshold problem for partial expectations (Theorem 4.10), we conclude:

Theorem 4.15.

The Positivity problem is reducible in polynomial time to the following problem: Given an MDP $\mathcal{M}$ and a rational $\vartheta$ , decide whether ${\mathbb{CE}}^{\max}_{\mathcal{M}}>\vartheta$ .

Two-sided partial SSPP

To conclude this section, we prove the Positivity-hardness of a two-sided version of the partial SSPP with two non-negative weight functions. This result will form the basis for further Positivity-hardness results in the subsequent section. The key idea is that, instead of using arbitrary integer weights, we can simulate the non-monotonic behavior of the accumulated weight along a path in the partial SSPP with arbitrary weights with two non-negative weight functions. In the definition of the random variable $\oplus\mathit{Goal}$, we can replace the choice that paths not reaching $\mathit{Goal}$ are assigned weight $0$ by a second weight function. Let $\mathcal{M}=(S,\mathit{Act},\mathrm{Pr},s_{\mathit{\scriptscriptstyle init}},\mathit{wgt}_{\mathit{goal}},\mathit{wgt}_{\mathit{fail}},\mathit{goal},\mathit{fail})$ be an MDP with two designated terminal states $\mathit{goal}$ and $\mathit{fail}$ and two non-negative weight functions $\mathit{wgt}_{\mathit{goal}}\colon S\times\mathit{Act}\to\mathbb{N}$ and $\mathit{wgt}_{\mathit{fail}}\colon S\times\mathit{Act}\to\mathbb{N}$. Assume that the probability $\mathrm{Pr}_{\mathcal{M},s_{\mathit{\scriptscriptstyle init}}}^{\min}(\lozenge\{\mathit{goal},\mathit{fail}\})=1$. Define the following random variable $X$ on maximal paths $\zeta$:

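$$X(\zeta)=\begin{cases}\mathit{wgt}_{\mathit{goal}}(\zeta)&\text{if }\zeta\vDash\lozenge\mathit{goal},\\ \mathit{wgt}_{\mathit{fail}}(\zeta)&\text{if }\zeta\vDash\lozenge\mathit{fail}.\end{cases}$$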

Due to the assumption that $\mathit{goal}$ or $\mathit{fail}$ is reached almost surely under any scheduler, the expected value $\mathbb{E}^{\mathfrak{S}}_{\mathcal{M},s_{\mathit{\scriptscriptstyle init}}}(X)$ is well-defined for all schedulers $\mathfrak{S}$ for $\mathcal{M}$ . We call the value $\mathbb{E}^{\max}_{\mathcal{M},s_{\mathit{\scriptscriptstyle init}}}(X)=\sup_{\mathfrak{S}}\mathbb{E}^{\mathfrak{S}}_{\mathcal{M},s_{\mathit{\scriptscriptstyle init}}}(X)$ the optimal two-sided partial expectation. We can show that the threshold problem for the two-sided partial expectation is Positivity-hard as well by a small adjustment of the construction above.

Theorem 4.16.

The Positivity problem is polynomial-time reducible to the following problem: Given an MDP $\mathcal{M}=(S,\mathit{Act},\mathrm{Pr},s_{\mathit{\scriptscriptstyle init}},\mathit{wgt}_{\mathit{goal}},\mathit{wgt}_{\mathit{fail}},\mathit{goal},\mathit{fail})$ as above and a rational $\vartheta$ , decide whether $\mathbb{E}^{\max}_{\mathcal{M},s_{\mathit{\scriptscriptstyle init}}}(X)>\vartheta$ .

Proof.

Given the parameters $\alpha_{1},\dots,\alpha_{k}$ and $\beta_{0},\dots,\beta_{k-1}$ of a rational linear recurrence sequence, we can construct an MDP $\mathcal{M}^{\prime}=(S,\mathit{Act},\mathrm{Pr},s_{\mathit{\scriptscriptstyle init}},\mathit{wgt},\mathit{goal},\mathit{fail})$ with one weight function $\mathit{wgt}\colon S\times\mathit{Act}\to\mathbb{Z}$ similar to the MDP $\mathcal{M}$ depicted in Figure 11. W.l.o.g., we again assume that $\sum_{i}|\alpha_{i}|<\frac{1}{4}$ and that $0\leq\beta_{j}<\frac{1}{4k^{2k+2}}$ for all $j$. The initial gadget and the gadget $\mathcal{G}_{\bar{\alpha}}$ are as before. The gadget $\mathcal{P}_{\bar{\beta}}$, however, is slightly modified and replaced by the gadget $\mathcal{T}_{\bar{\beta}}$ depicted in Figure 12. For this gadget, we define $\alpha=\sum_{i=1}^{k}|\alpha_{i}|$, $p_{1}=(1-\alpha)(\frac{1}{2k^{2(k-j)}}+\beta_{j})$, $p_{2}=(1-\alpha)(1-(\frac{1}{2k^{2(k-j)}}+\beta_{j}))$, $q_{1}=(1-\alpha)\frac{1}{2k^{2(k-j)}}$, and $q_{2}=(1-\alpha)(1-\frac{1}{2k^{2(k-j)}})$. With the transitions as in the figure, the probability to reach $\mathit{goal}$ or $\mathit{fail}$ and the weight accumulated do not change when choosing action $\gamma_{j}$ or $\delta_{j}$ compared to the gadget $\mathcal{P}_{\bar{\beta}}$. The only difference is that the expected time to reach $\mathit{goal}$ or $\mathit{fail}$ changes. The steps alternate between probability $\alpha$ and probability $0$ to reach $\mathit{goal}$ or $\mathit{fail}$, just as in the gadget $\mathcal{G}_{\bar{\alpha}}$. In this way, it makes no difference for the expected time before reaching $\mathit{goal}$ or $\mathit{fail}$ when a scheduler stops choosing $\gamma$ and $\delta$. We can, in fact, compute the expected time $T$ to reach $\mathit{goal}$ or $\mathit{fail}$ from $s_{\mathit{\scriptscriptstyle init}}$ under any scheduler quite easily: Reaching $t$ or $s$ takes $3$ steps in expectation. Afterwards, two further steps are taken $1/\alpha$ -many times in expectation. So,

[TABLE]

The optimal scheduler $\mathfrak{S}$ for the partial expectation in $\mathcal{M}^{\prime}$ is the same as in the MDP $\mathcal{M}$ above. Also the value $\vartheta$ of this scheduler can be computed as in Lemma 4.13. So, $\mathbb{PE}^{\max}_{\mathcal{M}^{\prime},s_{\mathit{\scriptscriptstyle init}}}>\vartheta$ if and only if the given linear recurrence sequence is eventually negative.

Note that all weights in $\mathcal{M}^{\prime}$ are $\geq-k$ . We define two new weight functions to obtain an MDP $\mathcal{N}$ from $\mathcal{M}^{\prime}$ : We let $\mathit{wgt}_{\mathit{goal}}(s,\alpha)=\mathit{wgt}(s,\alpha)+k$ and $\mathit{wgt}_{\mathit{fail}}(s,\alpha)=+k$ for all $(s,\alpha)\in S\times\mathit{Act}$ . Both weight functions take only non-negative integer values.

Any scheduler $\mathfrak{S}$ for $\mathcal{M}^{\prime}$ can be viewed as a scheduler for $\mathcal{N}$, and vice versa, as the two MDPs only differ in the weight functions. Further, we observe that for each maximal path $\zeta$ ending in $\mathit{goal}$ or $\mathit{fail}$ in $\mathcal{M}^{\prime}$ and at the same time in $\mathcal{N}$, we have $X(\zeta)=\oplus\mathit{goal}(\zeta)+k\cdot\mathit{length}(\zeta)$. (Recall that $\oplus\mathit{goal}(\zeta)$ equals $\mathit{wgt}(\zeta)$ if $\zeta$ reaches $\mathit{goal}$ and $0$ if $\zeta$ reaches $\mathit{fail}$.) As the expected time before $\mathit{goal}$ or $\mathit{fail}$ is reached is constant, namely $T$ under any scheduler, it follows that for all schedulers $\mathfrak{T}$ we have

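$$\mathbb{E}^{\mathfrak{T}}_{\mathcal{N},s_{\mathit{\scriptscriptstyle init}}}(X)=\mathbb{PE}^{\mathfrak{T}}_{\mathcal{M}^{\prime},s_{\mathit{\scriptscriptstyle init}}}+k\cdot T.$$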

Therefore, $\mathbb{E}^{\max}_{\mathcal{N},s_{\mathit{\scriptscriptstyle init}}}(X)>\vartheta+k\cdot T$ if and only if the given linear recurrence sequence eventually becomes negative. ∎
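The path-wise identity behind this argument is easy to verify mechanically. In the sketch below, a finite maximal path is represented only by its list of step weights under $\mathit{wgt}$ together with a flag saying whether it ends in $\mathit{goal}$; this representation, the function names, and the sample path are hypothetical.

```python
def two_sided_value(steps, k, reaches_goal):
    """X on a path of N: wgt_goal = wgt + k per step, wgt_fail = k per step."""
    assert all(w >= -k for w in steps), "all weights in M' are at least -k"
    wgt_goal = sum(w + k for w in steps)  # every summand is non-negative
    wgt_fail = k * len(steps)
    return wgt_goal if reaches_goal else wgt_fail

def oplus_goal(steps, reaches_goal):
    """Accumulated weight if goal is reached, 0 if fail is reached."""
    return sum(steps) if reaches_goal else 0

k = 3
steps = [1, -3, 0, -2, 1]  # hypothetical step weights along one path
for reaches_goal in (True, False):
    lhs = two_sided_value(steps, k, reaches_goal)
    rhs = oplus_goal(steps, reaches_goal) + k * len(steps)
    print(reaches_goal, lhs, rhs)
```

Taking expectations on both sides turns the per-path shift $k\cdot\mathit{length}(\zeta)$ into the scheduler-independent constant $k\cdot T$.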

While the two-sided partial expectation is certainly interesting in its own right, it will also play an important role in the proof of the Positivity-hardness of the threshold problem for the optimal long-run probability of a regular co-safety property in the next section.

4.3 Long-run probabilities and frequency-LTL

We now turn our attention to problems concerning the long-run satisfaction of path properties. At first sight, these problems seem to be of quite a different nature compared to the problems addressed so far. Long-run probabilities are an average of the probability after each step that a property $\varphi$ is satisfied on the suffix of an execution starting after that step. Similarly, frequency-LTL allows us to express that the long-run fraction of suffixes of an execution satisfying $\varphi$ lies above a threshold. Nevertheless, upon closer inspection, the two-sided version of partial expectations considered at the end of the previous section actually shares similarities with these long-run satisfaction problems. This allows us to provide a reduction from the threshold problem for two-sided partial expectations to the threshold problem for long-run probabilities of a simple fixed co-safety property. We begin by formally defining long-run probabilities.

Long-run probability.

Let $\mathcal{M}=(S,\mathit{Act},P,s_{\mathit{\scriptscriptstyle init}},\mathsf{AP},\mathsf{L})$ be an MDP and let $\varphi$ be a path property. The long-run probability of $\varphi$ on an infinite path $\zeta$ under a scheduler $\mathfrak{S}$ for $\mathcal{M}$ is defined as the long-run average of the probabilities for $\varphi$ in all positions of $\zeta$ with respect to the residual schedulers $\mathfrak{S}{\uparrow}{\zeta[0\dots i]}$ defined by

[TABLE]

for finite paths $\pi$ starting in $\zeta[i]$ :

[TABLE]

The long-run probability of property $\varphi$ under scheduler $\mathfrak{S}$ from state $s$ , denoted $\mathbb{LP}^{\mathfrak{S}}_{\mathcal{M},s}(\varphi)$ , is defined as the expectation of the random variable $\zeta\mapsto\mathit{lrp}^{\mathfrak{S}}_{\scriptscriptstyle\varphi}(\zeta)$ under $\mathfrak{S}$ with starting state $s$ :

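$$\mathbb{LP}^{\mathfrak{S}}_{\mathcal{M},s}(\varphi)=\mathbb{E}^{\mathfrak{S}}_{\mathcal{M},s}(\mathit{lrp}^{\mathfrak{S}}_{\scriptscriptstyle\varphi}).$$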

We now address the task to compute the extremal long-run probabilities for $\varphi$ :

[TABLE]

where $\mathfrak{S}$ ranges over all schedulers for $\mathcal{M}$ . In contrast to classical optimization problems for MDPs, the random variable whose expectation we aim to optimize, namely $\mathit{lrp}^{\mathfrak{S}}_{\scriptscriptstyle\varphi}$ , depends on the scheduler $\mathfrak{S}$ itself.

Example 4.17.

To illustrate the notion of long-run probability, consider the following example, which is a simplification of an example from [BBPS19]. Let $\mathcal{N}$ be the MDP shown in Fig. 13. The only non-deterministic choice is the choice between actions $\alpha$ and $\beta$ in state $a$ . Action $\alpha$ yields a uniform distribution over the three successors.

We want to determine the maximal long-run probability of $a{\mathrm{U}}b$. Under the memoryless scheduler $\mathfrak{S}_{\alpha}$ that always picks action $\alpha$, the probability of $a{\mathrm{U}}b$ in the $a$-state is $\frac{1}{2}$. The states $b_{1}$ and $c_{1}$ appear equally often. The probability of $a{\mathrm{U}}b$ is $1$ in state $b_{1}$ and $0$ in state $c_{1}$. We thus conclude that the long-run probability under $\mathfrak{S}_{\alpha}$ is $\frac{1}{2}$. Similarly, the steady-state probabilities of the states $a$ and $b_{2}$ under the memoryless scheduler $\mathfrak{S}_{\beta}$ are $\frac{1}{4}$, and the probability that $a{\mathrm{U}}b$ holds from there is $1$. The long-run probability of $a{\mathrm{U}}b$ under $\mathfrak{S}_{\beta}$ equals $\frac{1}{2}$ as well. Interestingly, these two memoryless schedulers are not optimal. Consider the scheduler $\mathfrak{S}$ that chooses $\alpha$ first and, if it returns to $a$ directly, chooses $\beta$ afterwards. In the first visit to the $a$-state, the probability for $a{\mathrm{U}}b$ is $\frac{2}{3}$. States $b_{1}$ and $c_{1}$ are reached with probability $1/3$ afterwards. If state $a$ is reached again directly, the probability of $a{\mathrm{U}}b$ is now $1$. Also state $b_{2}$ is reached with probability $1/3$ before returning to $a$ from $b_{1}$, $c_{1}$, or $c_{2}$. To compute the long-run probability under this scheduler, we sum up the satisfaction probabilities for all states that can be visited before returning to $a$ from $b_{1}$, $c_{1}$, or $c_{2}$, multiplied with the probabilities of the visits. We divide the result by the expected number of steps before returning. Note that we sum up probability $2/3+1/3\cdot 1$ for the two possible visits to state $a$. We obtain a long-run probability of

[TABLE]

Preliminary results.

Before we prove the Positivity-hardness result for long-run probabilities, we need some results on the connection between long-run probabilities and mean payoffs. The results are presented in [Pir21] and we briefly state the necessary results in the sequel.

We will prove that the threshold problem for the maximal long-run probability is Positivity-hard already for very simple co-safety properties $\varphi$ . To represent regular co-safety properties, we will use deterministic finite automata (DFAs). A run $\zeta$ satisfies the represented co-safety property if some prefix of $L(\zeta)$ is accepted by the DFA.

In [Pir21], a construction is provided that allows us to express the optimal long-run probability of regular co-safety properties in terms of expected mean payoffs. We briefly present this reduction and the corresponding results here: The main idea is to construct an MDP with extended state space that keeps track of the number of runs currently in each state of the automaton. We can then assign a weight to each step of the MDP depending on how many runs of the DFA enter an accepting state during that step. The optimal mean payoff in the constructed MDP then coincides with the optimal long-run probability in the original MDP. However, there is no bound on the number of runs we have to store in the state space for this construction. Therefore, the constructed MDP will have an infinite state space.

Let $\mathcal{M}=(S,\mathit{Act},P,s_{\mathit{\scriptscriptstyle init}},\mathsf{AP},L)$ be a strongly connected MDP and let $\mathcal{D}=(Q,2^{\mathsf{AP}},\delta,q_{0},F)$ be a DFA over $\mathsf{AP}$ . As we are interested in the co-safety property given by $\mathcal{D}$ , only runs of $\mathcal{D}$ up to the first accepting state are relevant. Hence, we can collapse all accepting states of $\mathcal{D}$ to one absorbing state $\mathit{accept}$ and all states from which $\mathit{accept}$ is not reachable to one state $\mathit{reject}$ . Let the set of states be $Q=\{q_{0},q_{1},\dots,q_{\ell},\mathit{accept},\mathit{reject}\}$ for some $\ell\in\mathbb{N}$ .

We construct a weighted infinite-state MDP $\mathcal{M}_{\mathcal{D}}=(S^{\prime},\mathit{Act},P^{\prime},s_{\mathit{\scriptscriptstyle init}}^{\prime},\mathit{wgt})$ in the sequel. The state space is

$$S^{\prime}=S\times\mathbb{N}^{\ell+1}.$$

The $\ell+1$ natural numbers in a state store the number of runs of $\mathcal{D}$ on suffixes of the path produced by the MDP so far that are in the respective state of $\mathcal{D}$ . The actions $\mathit{Act}$ are the same as in $\mathcal{M}$ . For the transition probability function $P^{\prime}$ we define the following: Let $s^{\prime}=(s,n_{0},\dots,n_{\ell})$ and $t^{\prime}=(t,m_{0},\dots,m_{\ell})$ be states such that for all $i$ ,

[TABLE]

where $\iota_{i}=1$ if $i=0$ and $\iota_{i}=0$ otherwise. For such states, we set

$$P^{\prime}(s^{\prime},\alpha,t^{\prime})=P(s,\alpha,t).$$

All other transition probabilities are $0$ . The weight function is not defined on state-action pairs as usual, but on single transitions in $S^{\prime}\times\mathit{Act}\times S^{\prime}$ . For a transition $(s^{\prime},\alpha,t^{\prime})$ with $s^{\prime}=(s,n_{0},\dots,n_{\ell})$ and $t^{\prime}=(t,m_{0},\dots,m_{\ell})$ , the weight is defined by

[TABLE]

To obtain a weight function on state-action pairs, one could now take the weighted average over all possible transitions that can be taken via a state-action pair. As we will be interested in the mean payoff under this weight function, this change would not influence the subsequent considerations. The initial state $s_{\mathit{\scriptscriptstyle init}}^{\prime}$ is $(s_{\mathit{\scriptscriptstyle init}},1,0,\dots,0)$ .

We observe that the sum of the entries in the last $\ell+1$ components increases by at most $1$ in each step. Hence, the total accumulated weight after $n$ steps along any path is bounded by $n$ and we can already conclude that the mean payoff in $\mathcal{M}_{\mathcal{D}}$ is bounded by $1$ along each path.

A scheduler for $\mathcal{M}$ can be used as a scheduler for $\mathcal{M}_{\mathcal{D}}$ and vice versa as transitions in $\mathcal{M}_{\mathcal{D}}$ are uniquely defined by the transitions in the $\mathcal{M}$ component. If we consider a scheduler $\mathfrak{S}$ for both $\mathcal{M}$ and $\mathcal{M}_{\mathcal{D}}$ , there is, however, one caveat: If $\mathfrak{S}$ is a finite-memory scheduler for $\mathcal{M}_{\mathcal{D}}$ , the same scheduler is not necessarily a finite-memory scheduler for $\mathcal{M}$ . We therefore emphasize that the following lemma states that the maximal mean payoff in $\mathcal{M}_{\mathcal{D}}$ can be approximated by schedulers that are still finite-memory schedulers when considered as schedulers for $\mathcal{M}$ .

Lemma 4.18 ([Pir21]).

Let $\mathcal{M}$ and $\mathcal{D}$ be given as above and let $\mathcal{M}_{\mathcal{D}}$ be the constructed MDP. For each scheduler $\mathfrak{T}$ for $\mathcal{M}_{\mathcal{D}}$ and each $\varepsilon>0$ , there is a finite-memory scheduler $\mathfrak{F}$ for $\mathcal{M}$ such that, if $\mathfrak{F}$ is seen as a scheduler for $\mathcal{M}_{\mathcal{D}}$ :

[TABLE]

For finite-memory schedulers, the mean-payoff in $\mathcal{M}_{\mathcal{D}}$ and the long-run probability of the co-safety property given by $\mathcal{D}$ in $\mathcal{M}$ coincide:

Lemma 4.19 ([Pir21]).

Let $\mathcal{M}$ and $\mathcal{D}$ be given as above and let $\mathcal{M}_{\mathcal{D}}$ be the constructed MDP. Then, for each finite-memory scheduler $\mathfrak{S}$ for $\mathcal{M}$ (also viewed as a scheduler for $\mathcal{M}_{\mathcal{D}}$ ), we have $\mathbb{LP}^{\mathfrak{S}}_{\mathcal{M},s_{\mathit{\scriptscriptstyle init}}}(\mathcal{D})=\mathbb{E}^{\mathfrak{S}}_{\mathcal{M}_{\mathcal{D}},s_{\mathit{\scriptscriptstyle init}}^{\prime}}(\mathit{MP})$ .

As shown in [Pir21] using Fatou’s lemma, the optimal long-run probability can also be approximated by finite-memory schedulers:

Lemma 4.20 ([Pir21]).

Let $\mathcal{M}$ and $\varphi$ be given as above. For each scheduler $\mathfrak{T}$ for $\mathcal{M}$ and each $\varepsilon>0$ , there is a finite-memory scheduler $\mathfrak{F}$ for $\mathcal{M}$ such that:

[TABLE]

It follows that indeed, the optimal long-run probability can be expressed as an optimal expected mean payoff via the provided reduction.

Theorem 4.21 ([Pir21]).

Let $\mathcal{M}$ and $\mathcal{D}$ be as above. Let $\mathcal{M}_{\mathcal{D}}$ be the infinite-state MDP constructed from $\mathcal{M}$ and $\mathcal{D}$ as described above. Then,

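$$\mathbb{LP}^{\max}_{\mathcal{M},s_{\mathit{\scriptscriptstyle init}}}(\mathcal{D})=\mathbb{E}^{\max}_{\mathcal{M}_{\mathcal{D}},s_{\mathit{\scriptscriptstyle init}}^{\prime}}(\mathit{MP}).$$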

Positivity-hardness.

So far, we have seen that the optimal long-run probability of a regular co-safety property can be expressed in terms of an optimal expected mean payoff. This insight allows us to draw a connection between long-run probabilities and the two-sided partial expectations that we just discussed. Recall that for an MDP $\mathcal{M}=(S,\mathit{Act},\mathrm{Pr},s_{\mathit{\scriptscriptstyle init}},\mathit{wgt}_{\mathit{goal}},\mathit{wgt}_{\mathit{fail}},\mathit{goal},\mathit{fail})$ with two designated absorbing states $\mathit{goal}$ and $\mathit{fail}$ and two non-negative weight functions $\mathit{wgt}_{\mathit{goal}}\colon S\times\mathit{Act}\to\mathbb{N}$ and $\mathit{wgt}_{\mathit{fail}}\colon S\times\mathit{Act}\to\mathbb{N}$ , the two-sided partial expectation was defined as the expectation of the following random variable $X$ on maximal paths $\zeta$ :

[TABLE]

We now exploit the construction of $\mathcal{M}_{\mathcal{D}}$ provided above to mimic a behavior similar to the payoff according to the random variable $X$ . Consider the DFA $\mathcal{D}$ depicted in Figure 14. The state space is $Q=\{q_{{\mathit{\scriptscriptstyle init}}},q_{1},q_{2},\mathit{accept},\mathit{reject}\}$ . The alphabet is $2^{\{a,b,c,\mathit{goal},\mathit{fail}\}}$ . From the initial state letters satisfying $a\land b\land\neg c$ lead to $q_{1}$ , letters satisfying $a\land c\land\neg b$ to $q_{2}$ and all remaining letters to $\mathit{reject}$ . From $q_{1}$ , letters satisfying $a\land\neg\mathit{goal}$ lead back to $q_{1}$ , letters with $\mathit{goal}\land\neg a$ to $\mathit{accept}$ , and all remaining letters lead to $\mathit{reject}$ . Transitions from $q_{2}$ are defined analogously with $\mathit{goal}$ replaced by $\mathit{fail}$ .

Consider a run $\rho$ of an MDP $\mathcal{M}$ labeled with $\{a,b,c,\mathit{goal},\mathit{fail}\}$ for which we keep counters of the number of runs on suffixes of $\rho$ in each of the states of $\mathcal{D}$ : We only need counters $c_{1}$ and $c_{2}$ for states $q_{1}$ and $q_{2}$ as these are the only states multiple runs can be in before being accepted or rejected. The update of the counters in the MDP $\mathcal{M}_{\mathcal{D}}$ can directly be determined from the DFA $\mathcal{D}$ : E.g., if $\{a,b\}$ is read, counter $c_{1}$ is increased; if $\{a,c\}$ is read, counter $c_{2}$ is increased. On $\{a\}$ , both counters stay the same. If no $a$ is read, the counters are reset to $0$ . If at the same time $\mathit{goal}$ is read, the value of $c_{1}$ is received as weight. If $\mathit{fail}$ is read, the value of $c_{2}$ is received as weight. So, the behavior of the counters is very similar to the accumulation of two non-negative weight functions. Which of the two weight functions or the two counters is used to determine the payoff depends on whether $\mathit{goal}$ or $\mathit{fail}$ is reached next. In the sequel, we will show that indeed already the fixed co-safety property of this simple DFA $\mathcal{D}$ suffices to prove Positivity-hardness of the threshold problem for long-run probabilities.
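The counter behavior described above can be replayed on a finite word of labels. The following is a minimal sketch, not the construction from [Pir21] itself: the function names `delta`, `step` and `total_weight` are hypothetical, the DFA transitions are transcribed from the description of Figure 14 in the text, and it is assumed that exactly one fresh run of $\mathcal{D}$ starts at every position of the word.

```python
def delta(q, letter):
    """Transition function of the DFA D, transcribed from the textual description."""
    a, b, c = "a" in letter, "b" in letter, "c" in letter
    goal, fail = "goal" in letter, "fail" in letter
    if q == "init":
        if a and b and not c:
            return "q1"
        if a and c and not b:
            return "q2"
        return "reject"
    if q in ("q1", "q2"):
        target = goal if q == "q1" else fail
        if a and not target:
            return q
        if target and not a:
            return "accept"
        return "reject"
    return q  # accept and reject are absorbing


def step(c1, c2, letter):
    """Update the counters for q1 and q2 on one letter; return (c1, c2, weight)."""
    new = {"q1": 0, "q2": 0}
    weight = 0
    # Assumption: one fresh run starts in the initial state at every position.
    for q, runs in (("init", 1), ("q1", c1), ("q2", c2)):
        t = delta(q, letter)
        if t == "accept":
            weight += runs      # runs entering accept are paid out as weight
        elif t in new:
            new[t] += runs      # runs that stay in (or enter) q1 / q2
    return new["q1"], new["q2"], weight


def total_weight(word):
    """Total weight collected by the counters along a finite word of letters."""
    c1 = c2 = total = 0
    for letter in word:
        c1, c2, w = step(c1, c2, letter)
        total += w
    return total


prefix = [{"a", "b"}, {"a", "b"}, {"a"}, {"a", "c"}]
print(total_weight(prefix + [{"goal"}]))  # 2: the value of c1 is paid out
print(total_weight(prefix + [{"fail"}]))  # 1: the value of c2 is paid out
```

On the prefix above, $c_{1}$ is increased twice and $c_{2}$ once, so the payoff depends only on whether $\mathit{goal}$ or $\mathit{fail}$ is read next.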

The proof of the Positivity-hardness of the threshold problem for the two-sided partial expectation with non-negative weights contains most of the ingredients we need: Let $(u_{n})_{n\geq 0}$ be a rational linear recurrence sequence given by initial values $\beta_{0},\dots,\beta_{k-1}$ and the coefficients $\alpha_{1},\dots,\alpha_{k}$ of the recurrence. In the proof of Theorem 4.16, we showed that, from the given parameters, we can construct an MDP $\mathcal{M}=(S,\mathit{Act},\mathrm{Pr},s_{\mathit{\scriptscriptstyle init}},\mathit{wgt}_{\mathit{goal}},\mathit{wgt}_{\mathit{fail}},\mathit{goal},\mathit{fail})$ and rationals $\vartheta$ , $T$ with the following properties:

• For the two designated states $\mathit{goal}$ and $\mathit{fail}$ , we have $\mathrm{Pr}_{\mathcal{M},s_{\mathit{\scriptscriptstyle init}}}^{\min}(\lozenge\{\mathit{goal},\mathit{fail}\})=1$ .

• The expected number of steps until $\mathit{goal}$ or $\mathit{fail}$ is reached is $T$ under any scheduler.

• The weight functions $\mathit{wgt}_{\mathit{goal}}$ and $\mathit{wgt}_{\mathit{fail}}$ assign a weight between $0$ and $2k$ to each state-action pair.

• $\mathbb{E}^{\max}_{\mathcal{M},s_{\mathit{\scriptscriptstyle init}}}(X)>\vartheta$ if and only if there is an $n$ with $u_{n}<0$ .
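For intuition about the condition in the last item, the terms of a linear recurrence sequence $u_{n+k}=\alpha_{1}u_{n+k-1}+\dots+\alpha_{k}u_{n}$ can be enumerated and searched for a negative member up to a bound. The sketch below is only a bounded search with hypothetical helper names (`lrs_terms`, `first_negative_index`). It does not decide the Positivity problem, since a negative member may occur beyond any chosen bound.

```python
from fractions import Fraction


def lrs_terms(alphas, betas, count):
    """First `count` terms of the sequence with coefficients alpha_1..alpha_k
    and initial values beta_0..beta_{k-1}, computed exactly over the rationals."""
    k = len(alphas)
    terms = [Fraction(b) for b in betas][:count]
    while len(terms) < count:
        # u_{n+k} = alpha_1 * u_{n+k-1} + ... + alpha_k * u_n
        terms.append(sum(Fraction(alphas[i]) * terms[-1 - i] for i in range(k)))
    return terms


def first_negative_index(alphas, betas, bound):
    """Smallest n < bound with u_n < 0, or None if no such n exists below the bound."""
    terms = lrs_terms(alphas, betas, bound)
    return next((n for n, u in enumerate(terms) if u < 0), None)


# u_{n+2} = u_{n+1} - u_n with u_0 = u_1 = 1 gives 1, 1, 0, -1, -1, 0, ...
print(first_negative_index([1, -1], [1, 1], 20))  # 3
# Fibonacci-like sequence: no negative member among the first 30 terms.
print(first_negative_index([1, 1], [0, 1], 30))   # None
```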

From this MDP $\mathcal{M}$ , we now construct a labeled MDP $\mathcal{K}$ . For each state-action pair $(s,\alpha)$ of $\mathcal{M}$ with $s\not\in\{\mathit{goal},\mathit{fail}\}$ , we add a chain $r_{s,\alpha}^{1},\dots,r_{s,\alpha}^{4k}$ of new states as depicted in Figure 15: We redirect the transition from $s$ when choosing $\alpha$ to this chain by setting $P_{\mathcal{K}}(s,\alpha,r_{s,\alpha}^{1})=1$ . In the states of the chain only one action $\tau$ is enabled. The process moves through the chain with probability $1$ via this action, i.e., $P_{\mathcal{K}}(r_{s,\alpha}^{i},\tau,r_{s,\alpha}^{i+1})=1$ for all $i<4k$ . Then, the original transition is taken from the state $r_{s,\alpha}^{4k}$ by setting $P_{\mathcal{K}}(r_{s,\alpha}^{4k},\tau,t)=P(s,\alpha,t)$ for all states $t$ of $\mathcal{M}$ . Instead of making $\mathit{goal}$ and $\mathit{fail}$ absorbing, we furthermore add transitions back to the initial state $s_{\mathit{\scriptscriptstyle init}}$ from $\mathit{goal}$ and $\mathit{fail}$ with probability $1$ . Note that the expected time from $s_{\mathit{\scriptscriptstyle init}}$ until $s_{\mathit{\scriptscriptstyle init}}$ is reached again from $\mathit{goal}$ or $\mathit{fail}$ is now $T^{\prime}=T(4k+1)+1$ in $\mathcal{K}$ under any scheduler.
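The chain insertion and the resulting return time $T^{\prime}=T(4k+1)+1$ can be checked on a toy instance. The sketch below uses an assumed encoding that is not part of the paper: an MDP is a dictionary mapping `(state, action)` to a successor distribution, and the helper names `add_chains` and `steps_to_return` are hypothetical. The toy MDP has a single action per state and $T=2$.

```python
from fractions import Fraction as F


def add_chains(M, k, init, targets=("goal", "fail")):
    """Insert a chain of 4k fresh states behind every state-action pair of M
    and add transitions from goal and fail back to the initial state."""
    K = {}
    for (s, act), succ in M.items():
        chain = [("r", s, act, i) for i in range(1, 4 * k + 1)]
        K[(s, act)] = {chain[0]: F(1)}        # redirect (s, act) into the chain
        for u, v in zip(chain, chain[1:]):
            K[(u, "tau")] = {v: F(1)}         # walk through the chain
        K[(chain[-1], "tau")] = dict(succ)    # original transition at the end
    for t in targets:
        K[(t, "tau")] = {init: F(1)}          # restart instead of absorbing
    return K


def steps_to_return(K, state, init):
    """Expected number of steps from `state` until `init` is entered.
    Assumes one action per state and no cycle that avoids `init`."""
    (succ,) = [d for (s, _), d in K.items() if s == state]
    return 1 + sum(p * (0 if t == init else steps_to_return(K, t, init))
                   for t, p in succ.items())


# Toy MDP: s0 -> s1 -> goal or fail (each with probability 1/2), so T = 2.
M = {("s0", "alpha"): {"s1": F(1)},
     ("s1", "alpha"): {"goal": F(1, 2), "fail": F(1, 2)}}
k, T = 1, 2
K = add_chains(M, k, "s0")
print(steps_to_return(K, "s0", "s0"))  # 11, which equals T * (4 * k + 1) + 1
```

Each original step now costs $4k+1$ steps, and one further step leads from $\mathit{goal}$ or $\mathit{fail}$ back to the initial state.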

The labeling is defined as follows: All states except for $\mathit{goal}$ and $\mathit{fail}$ are labeled with $a$ . States $\mathit{goal}$ and $\mathit{fail}$ are labeled with their names. Furthermore, in each of the chains $r_{s,\alpha}^{1},\dots,r_{s,\alpha}^{4k}$ , the first $\mathit{wgt}_{\mathit{goal}}(s,\alpha)$ of the states are labeled with $b$ in addition to the label $a$ . The next $\mathit{wgt}_{\mathit{fail}}(s,\alpha)$ states are labeled with $c$ in addition to the $a$ . As $\mathit{wgt}_{\mathit{goal}}(s,\alpha)+\mathit{wgt}_{\mathit{fail}}(s,\alpha)\leq 4k$ , this is always possible.
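The labeling of a single chain can be written down directly. The following sketch uses a hypothetical helper `chain_labels` and represents each label as a Python set. It follows the description above: the first $\mathit{wgt}_{\mathit{goal}}(s,\alpha)$ states carry $b$, the next $\mathit{wgt}_{\mathit{fail}}(s,\alpha)$ states carry $c$, and every chain state carries $a$.

```python
def chain_labels(wgt_goal, wgt_fail, k):
    """Labels of the chain r^1, ..., r^{4k} for one state-action pair."""
    length = 4 * k
    if wgt_goal + wgt_fail > length:
        raise ValueError("weights do not fit into a chain of length 4k")
    labels = []
    for i in range(length):
        label = {"a"}                      # every chain state is labeled a
        if i < wgt_goal:
            label.add("b")                 # first wgt_goal states
        elif i < wgt_goal + wgt_fail:
            label.add("c")                 # next wgt_fail states
        labels.append(label)
    return labels


labels = chain_labels(wgt_goal=2, wgt_fail=1, k=1)
print(labels)  # two states with b, one with c, one with only a
```

Since the $b$-labeled and $c$-labeled states are disjoint, every chain state triggers at most one of the two counters.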

Consider now a path $\pi$ of $\mathcal{M}$ from $s_{\mathit{\scriptscriptstyle init}}$ to $\mathit{goal}$ or $\mathit{fail}$ . There is a unique corresponding path $\hat{\pi}$ in $\mathcal{K}$ . The counters induced by the DFA $\mathcal{D}$ as described above now behave exactly like the accumulation of the weight functions $\mathit{wgt}_{\mathit{goal}}$ and $\mathit{wgt}_{\mathit{fail}}$ . The value $c_{1}$ counting the number of runs in state $q_{1}$ of $\mathcal{D}$ is precisely $\mathit{wgt}_{\mathit{goal}}(\pi)$ when entering $\mathit{goal}$ or $\mathit{fail}$ as in each chain of states $r_{s,\alpha}^{1},\dots,r_{s,\alpha}^{4k}$ exactly $\mathit{wgt}_{\mathit{goal}}(s,\alpha)$ -many runs of $\mathcal{D}$ enter state $q_{1}$ . The counter $c_{2}$ behaves analogously in terms of the weight function $\mathit{wgt}_{\mathit{fail}}$ . As all states not in $\{\mathit{goal},\mathit{fail}\}$ are labeled with $a$ , the counters are also not reset. When entering $\mathit{goal}$ , the random variable $X$ assigns weight $\mathit{wgt}_{\mathit{goal}}(\pi)$ to the path $\pi$ . The same weight is received from the counter $c_{1}$ in this case. When entering $\mathit{fail}$ , weight $\mathit{wgt}_{\mathit{fail}}(\pi)$ is assigned by $X$ and received from the counter $c_{2}$ .

As the expected time required to reach $s_{\mathit{\scriptscriptstyle init}}$ again from $\mathit{goal}$ or $\mathit{fail}$ in $\mathcal{K}$ is $T^{\prime}$ under any scheduler, a scheduler maximizing the expected mean payoff in $\mathcal{K}_{\mathcal{D}}$ , i.e., according to the weight function induced by the counters for $\mathcal{D}$ , has to maximize the expected value of $X$ when considered as a scheduler for $\mathcal{M}$ . By Theorem 4.21, the maximal mean payoff in $\mathcal{K}_{\mathcal{D}}$ equals the maximal long-run probability $\mathbb{LP}^{\max}_{\mathcal{K},s_{\mathit{\scriptscriptstyle init}}}(\mathcal{D})$ . Putting these results together, we obtain that

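$$\mathbb{LP}^{\max}_{\mathcal{K},s_{\mathit{\scriptscriptstyle init}}}(\mathcal{D})>\frac{\vartheta}{T^{\prime}}$$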

if and only if the given linear recurrence sequence has a negative member, i.e., there is an $n$ with $u_{n}<0$ . We conclude the Positivity-hardness result for long-run probabilities. Note that the Positivity-hardness holds for the fixed simple DFA $\mathcal{D}$ .
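The role of the factor $T^{\prime}$ can be made explicit by a renewal-style calculation. The following is a sketch under the assumption that the scheduler $\mathfrak{S}$ repeats the same behavior in every round from $s_{\mathit{\scriptscriptstyle init}}$: each round then pays the value of $X$ once and lasts $T^{\prime}$ steps in expectation, so the reward per round divided by the length of a round gives the mean payoff.

```latex
\mathbb{E}^{\mathfrak{S}}_{\mathcal{K}_{\mathcal{D}}}(\mathit{MP})
  \;=\; \frac{\mathbb{E}^{\mathfrak{S}}_{\mathcal{M},s_{\mathit{init}}}(X)}{T'}
\qquad\text{and hence}\qquad
\mathbb{LP}^{\max}_{\mathcal{K},s_{\mathit{init}}}(\mathcal{D}) > \frac{\vartheta}{T'}
\;\iff\;
\mathbb{E}^{\max}_{\mathcal{M},s_{\mathit{init}}}(X) > \vartheta .
```

The equivalence on the right additionally uses Theorem 4.21 and the observation from the text that a scheduler maximizing the mean payoff has to maximize the expected value of $X$.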

Theorem 4.22.

There is a fixed DFA $\mathcal{D}$ such that the Positivity problem is polynomial-time reducible to the following problem: Given an MDP $\mathcal{M}$ and a rational $\chi$ , decide whether $\mathbb{LP}^{\max}_{\mathcal{M}}(\mathcal{D})>\chi$ .

Frequency-LTL.

A consequence of this result is that model checking of frequency-LTL in MDPs is at least as hard as the Positivity problem. Whether the model-checking problem for the full logic frequency-LTL is decidable has been raised as an open question in [FK15, FKK15]. Our result now shows that proving decidability of this model-checking problem would render the Positivity problem decidable as well. The frequency-globally modality $G^{>\vartheta}_{\inf}(\varphi)$ of frequency-LTL is defined to hold on a path $\pi$ iff

$$\liminf_{n\to\infty}\frac{1}{n+1}\sum_{i=0}^{n}\mathds{1}_{\pi[i\dots]\vDash\varphi}>\vartheta,$$

i.e. iff the long-run frequency of $\varphi$ exceeds $\vartheta$ .

Theorem 4.23.

There is a polynomial-time reduction from the Positivity problem to the following qualitative model checking problem for frequency-LTL for a fixed LTL-formula $\varphi$ : Given an MDP $\mathcal{M}$ and a rational $\vartheta$ , is $\mathrm{Pr}^{\max}_{\mathcal{M}}(G^{>\vartheta}_{\inf}(\varphi))=1$ ?

Proof.

Consider the MDP $\mathcal{K}$ , the DFA $\mathcal{D}$ , and the threshold $\vartheta^{\prime}={\vartheta}/{T^{\prime}}$ constructed above. As the sets of states labeled with $b$ and with $c$ are disjoint and included in the set of states labeled with $a$ , and likewise the sets of states labeled with $a$ , $\mathit{goal}$ , and $\mathit{fail}$ are pairwise disjoint in $\mathcal{K}$ , a path of $\mathcal{K}$ has a prefix accepted by $\mathcal{D}$ if and only if the path satisfies

[TABLE]

We claim that there is a scheduler $\mathfrak{S}$ with $\mathbb{LP}^{\mathfrak{S}}_{\mathcal{K},s_{\mathit{\scriptscriptstyle init}}}(\mathcal{D})>\vartheta^{\prime}$ if and only if there is a scheduler $\mathfrak{T}$ such that $G^{>\vartheta}_{\inf}(\varphi)$ holds with probability $1$ under $\mathfrak{T}$ in $\mathcal{K}$ .

Suppose there is a scheduler $\mathfrak{S}$ with $\mathbb{LP}_{\mathcal{K}}^{\mathfrak{S}}(\mathcal{D})>\vartheta^{\prime}$ . By Lemma 4.20, we can assume that $\mathfrak{S}$ is a finite-memory scheduler as the maximal long-run probability can be approximated by finite-memory schedulers. As $\mathcal{K}$ is strongly connected, we can further assume that $\mathfrak{S}$ induces only one BSCC. We claim that under this scheduler $\mathfrak{S}$ also $G^{>\vartheta}_{\inf}(\varphi)$ holds with probability $1$ . For finite-memory schedulers, it is easy to check that the long-run probability equals the expected long-run frequency as we obtain a finite-state Markov chain: Let $x_{\mathfrak{s}}$ be the steady-state probability of a state $\mathfrak{s}$ enriched with memory modes in the single BSCC $\mathcal{B}^{\mathfrak{S}}$ induced by $\mathfrak{S}$ . Further, let $p_{\mathfrak{s}}$ be the probability that a run starting in $\mathfrak{s}$ under $\mathfrak{S}$ satisfies $\varphi$ . Then, $\mathbb{LP}_{\mathcal{K}}^{\mathfrak{S}}(\mathcal{D})=\sum_{\mathfrak{s}\in\mathcal{B}^{\mathfrak{S}}}x_{\mathfrak{s}}\cdot p_{\mathfrak{s}}$ . But the same expression also computes the expected frequency with which $\varphi$ holds on suffixes as shown in [FK15]. Furthermore, in a strongly connected Markov chain, the frequency of $\varphi$ along almost all paths agrees with the expected frequency (see [FK15]). So,

[TABLE]

holds on almost all paths $\zeta$ .
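The expression $\sum_{\mathfrak{s}}x_{\mathfrak{s}}\cdot p_{\mathfrak{s}}$ used in this argument is easy to evaluate for a concrete finite Markov chain. The sketch below is an illustration on a hypothetical three-state chain that does not come from the paper. The helper names `stationary` and `long_run_probability` are assumptions, and the satisfaction probabilities $p_{\mathfrak{s}}$ are supplied by hand rather than computed.

```python
from fractions import Fraction as F


def stationary(P):
    """Solve x P = x with sum(x) = 1 exactly (Gauss-Jordan elimination).
    Assumes an irreducible chain given as a row-stochastic matrix."""
    n = len(P)
    # Row j encodes sum_i x_i * (P[i][j] - [i == j]) = 0.
    A = [[P[i][j] - (1 if i == j else 0) for i in range(n)] + [F(0)]
         for j in range(n)]
    A[-1] = [F(1)] * n + [F(1)]  # replace one equation by the normalization
    for col in range(n):
        piv = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[piv] = A[piv], A[col]
        A[col] = [v / A[col][col] for v in A[col]]
        for r in range(n):
            if r != col and A[r][col] != 0:
                f = A[r][col]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[col])]
    return [A[i][n] for i in range(n)]


def long_run_probability(P, p):
    """Sum of steady-state probability times satisfaction probability."""
    return sum(x * q for x, q in zip(stationary(P), p))


# Hypothetical chain: a -> b or c with probability 1/2 each, b -> a, c -> a.
P = [[F(0), F(1, 2), F(1, 2)],
     [F(1), F(0), F(0)],
     [F(1), F(0), F(0)]]
# Probability of "a U b" from a, b, c, if only b satisfies b and only a satisfies a.
p = [F(1, 2), F(1), F(0)]
print(stationary(P))               # steady-state probabilities 1/2, 1/4, 1/4
print(long_run_probability(P, p))  # 1/2
```

The linear system is solved exactly instead of by power iteration because the example chain is periodic, so iterating the transition matrix would not converge.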

Conversely, if there is a scheduler $\mathfrak{T}$ such that $G^{>\vartheta}_{\inf}(\varphi)$ holds with probability $1$ under $\mathfrak{T}$ in $\mathcal{K}$ , then the expected value satisfies $\mathbb{E}_{\mathcal{K}}^{\mathfrak{T}}(\liminf_{n\to\infty}\frac{1}{n+1}\sum_{i=0}^{n}\mathds{1}_{\varsigma[i\dots]\vDash\varphi})>\vartheta$ . By an argument using Fatou’s lemma analogously to the proof of Lemma 4.20 in [Pir21], we can find a finite-memory scheduler with expected long-run frequency, and hence long-run probability, greater than $\vartheta$ . ∎

4.4 Conditional value-at-risk for accumulated weights

Lastly, we aim to prove the Positivity-hardness of the threshold problem for the CVaR in this section. To this end, we provide a further direct reduction from the Positivity problem to the threshold problem for the expected value of an auxiliary random variable closely related to the CVaR using our MDP-gadgets.

Conditional Value-at-Risk.

Given an MDP $\mathcal{M}=(S,\mathit{Act},P,s_{\mathit{\scriptscriptstyle init}},\mathit{wgt},\mathit{Goal})$ with a scheduler $\mathfrak{S}$ , a random variable $X$ defined on runs of the MDP with values in $\mathbb{R}$ and a value $p\in[0,1]$ , we define the value-at-risk as $\mathit{VaR}^{\mathfrak{S}}_{p}(X)=\sup\{r\in\mathbb{R}|\mathrm{Pr}_{\mathcal{M}}^{\mathfrak{S}}(X\leq r)\leq p\}$ . So, the value-at-risk is the point at which the cumulative distribution function of $X$ reaches or exceeds $p$ . The conditional value-at-risk is now the expectation of $X$ under the condition that the outcome belongs to the $p$ worst outcomes – in this case, the $p$ lowest outcomes. Denote $\mathit{VaR}_{p}^{\mathfrak{S}}(X)$ by $v$ . Following the treatment of random variables that are not continuous in general in [KM18], we define the conditional value-at-risk as follows:

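$$\mathit{CVaR}^{\mathfrak{S}}_{p}(X)=\frac{1}{p}\Big(\mathrm{Pr}_{\mathcal{M}}^{\mathfrak{S}}(X<v)\cdot\mathbb{E}^{\mathfrak{S}}_{\mathcal{M}}(X\mid X<v)+\big(p-\mathrm{Pr}_{\mathcal{M}}^{\mathfrak{S}}(X<v)\big)\cdot v\Big).$$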

Outcomes of $X$ which are less than $v$ are treated differently to outcomes equal to $v$ as it is possible that the outcome $v$ has positive probability and we only want to account exactly for the $p$ worst outcomes. Hence, we take only $p-\mathrm{Pr}_{\mathcal{M}}^{\mathfrak{S}}(X<v)$ of the outcomes which are exactly $v$ into account as well. To provide worst-case guarantees or to find risk-averse policies, we are interested in the maximal and minimal conditional value-at-risk

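$$\mathit{CVaR}^{\max}_{p}(X)=\sup_{\mathfrak{S}}\mathit{CVaR}^{\mathfrak{S}}_{p}(X)\quad\text{and}\quad\mathit{CVaR}^{\min}_{p}(X)=\inf_{\mathfrak{S}}\mathit{CVaR}^{\mathfrak{S}}_{p}(X).$$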

In our formulation here, low outcomes are considered to be bad. Completely analogously, one can define the conditional value-at-risk for the highest $p$ outcomes.
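For a random variable with finitely many outcomes, both quantities can be computed directly. The sketch below works on a hypothetical distribution given as a dictionary from outcomes to probabilities, and the function names are illustrative. It assumes the usual normalization by $p$: outcomes below $v$ count with their full probability, the outcome $v$ counts with the remaining mass $p-\mathrm{Pr}(X<v)$, and the result is divided by $p$.

```python
from fractions import Fraction as F


def value_at_risk(dist, p):
    """sup{ r | Pr(X <= r) <= p } for a finite distribution {outcome: prob}.
    For 0 < p < 1 this is the smallest outcome whose cumulative
    probability exceeds p."""
    cumulative = F(0)
    for x in sorted(dist):
        cumulative += dist[x]
        if cumulative > p:
            return x
    raise ValueError("the supremum is infinite; choose p < 1")


def conditional_value_at_risk(dist, p):
    """Expectation over the p worst (lowest) outcomes, normalized by p."""
    v = value_at_risk(dist, p)
    below = sum((q for x, q in dist.items() if x < v), F(0))         # Pr(X < v)
    partial = sum((x * q for x, q in dist.items() if x < v), F(0))   # E[X; X < v]
    return (partial + (p - below) * v) / p


# Hypothetical distribution: two thirds of the mass sit on outcome 0.
dist = {0: F(2, 3), -3: F(1, 6), -6: F(1, 6)}
print(value_at_risk(dist, F(1, 2)))              # 0
print(conditional_value_at_risk(dist, F(1, 2)))  # -3
```

In this example the value-at-risk is $0$, so the term for the outcome $v$ vanishes and the conditional value-at-risk is twice the expected value of the negative part of the outcome.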

The main result of the section is the following:

Theorem 4.24.

The Positivity problem is polynomial-time reducible to the following problem: Given an MDP $\mathcal{M}$ and rationals $\vartheta$ and $p\in(0,1)$ , decide whether

[TABLE]

We will use an auxiliary optimization problem to prove this result. We begin with the following consideration: Given an MDP $\mathcal{M}$ with initial state $s_{\mathit{\scriptscriptstyle init}}$ , we construct a new MDP $\mathcal{N}$ . We add a new initial state $s_{\mathit{\scriptscriptstyle init}}^{\prime}$ . In $s_{\mathit{\scriptscriptstyle init}}^{\prime}$ , there is only one action with weight $0$ enabled leading to $s_{\mathit{\scriptscriptstyle init}}$ with probability $\frac{1}{3}$ and to $\mathit{goal}$ with probability $\frac{2}{3}$ . So, at least two thirds of the paths accumulate weight $0$ before reaching the goal. Hence, we can already say that $\mathit{VaR}^{\mathfrak{S}}_{1/2}(\diamondplus\mathit{goal})=0$ in $\mathcal{N}$ under any scheduler $\mathfrak{S}$ . Note that schedulers for $\mathcal{M}$ can be seen as schedulers for $\mathcal{N}$ and vice versa. This considerably simplifies the computation of the conditional value-at-risk in $\mathcal{N}$ . Define the random variable $\diamondminus\mathit{goal}$ on paths $\zeta$ by

[TABLE]

Now, the conditional value-at-risk for the probability value $1/2$ under a scheduler $\mathfrak{S}$ in $\mathcal{N}$ is given by $\mathit{CVaR}^{\mathfrak{S}}_{1/2}(\diamondplus\mathit{goal})=2\cdot\mathbb{E}^{\mathfrak{S}}_{\mathcal{N},s_{\mathit{\scriptscriptstyle init}}}(\diamondminus\mathit{goal})=\frac{2}{3}\cdot\mathbb{E}^{\mathfrak{S}}_{\mathcal{M},s_{\mathit{\scriptscriptstyle init}}}(\diamondminus\mathit{goal})$. So, the result follows from the following lemma:

Lemma 4.25.

The Positivity problem is polynomial-time reducible to the following problem: Given an MDP $\mathcal{M}$ and a rational $\vartheta$, decide whether $\mathbb{E}^{\max}_{\mathcal{M},s_{\mathit{\scriptscriptstyle init}}}(\diamondminus\mathit{goal})>\vartheta$.

Proof.

The first important observation is that the optimal expectation $e(q,w)$ of $\diamondminus\mathit{goal}$ for different starting states $q$ and starting weights $w$ satisfies equation ($\ast$) from Section 3.2, i.e., $e(q,w)=\sum_{r\in S}P(q,\alpha,r)\cdot e(r,w{+}\mathit{wgt}(q,\alpha))$ if an optimal scheduler chooses action $\alpha$ in state $q\not=\mathit{goal}$ when the accumulated weight is $w$. The value $e(\mathit{goal},w)$ is $w$ if $w\leq 0$ and $0$ otherwise. This allows us to reuse the gadget $\mathcal{G}_{\bar{\alpha}}$ to encode a linear recurrence relation.
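The recursion above can be illustrated on a small example. The following is a minimal sketch, not the construction used in the proof: the MDP `TOY_MDP`, its state and action names, and the function `e` are hypothetical, the value at $\mathit{goal}$ is taken to be $\min(w,0)$, and the maximum over actions is the usual optimality form of the equation. The toy MDP is acyclic, so the plain recursion terminates.

```python
from fractions import Fraction
from functools import lru_cache

# Hypothetical toy MDP (not from the paper):
# state -> action -> (weight of the state-action pair, {successor: probability}).
TOY_MDP = {
    "q0": {
        "a": (-1, {"q1": Fraction(1, 2), "goal": Fraction(1, 2)}),
        "b": (1, {"goal": Fraction(1)}),
    },
    "q1": {
        "c": (2, {"goal": Fraction(1)}),
    },
}


@lru_cache(maxsize=None)
def e(state, weight):
    """Optimal expected value of min(accumulated weight, 0) upon reaching goal."""
    if state == "goal":
        # e(goal, w) = w if w <= 0 and 0 otherwise.
        return Fraction(min(weight, 0))
    # e(q, w) = max over actions of sum_r P(q, action, r) * e(r, w + wgt(q, action)).
    return max(
        sum(p * e(succ, weight + wgt) for succ, p in successors.items())
        for wgt, successors in TOY_MDP[state].values()
    )
```

For instance, `e("q0", -3)` compares action `a` (value $-3$) with action `b` (value $-2$) and returns the larger one.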

We again adjust the gadget encoding the initial values of a linear recurrence sequence. So, let $k$ be a natural number, $\alpha_{1},\dots,\alpha_{k}$ be rational coefficients of a linear recurrence sequence, and $\beta_{0},\dots,\beta_{k-1}\geq 0$ the rational initial values. W.l.o.g. we again assume these values to be small using Assumption 3.1, namely: $\sum_{1\leq i\leq k}|\alpha_{i}|\leq\frac{1}{5(k+1)}$ and for all $j$ , $\beta_{j}\leq\frac{1}{3}\alpha$ where $\alpha=\sum_{1\leq i\leq k}|\alpha_{i}|$ .
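As a small sanity check of the two smallness conditions just stated, the following sketch tests them for concrete rational inputs. The function name `satisfies_smallness` and the sample values are assumptions made for illustration; the sketch only checks the inequalities and does not perform the rescaling behind Assumption 3.1.

```python
from fractions import Fraction


def satisfies_smallness(alphas, betas):
    """Check sum_i |alpha_i| <= 1/(5(k+1)) and 0 <= beta_j <= alpha/3 for all j."""
    k = len(alphas)
    if len(betas) != k:
        return False
    # alpha denotes the sum of the absolute values of the coefficients.
    alpha = sum(abs(Fraction(a)) for a in alphas)
    coefficients_small = alpha <= Fraction(1, 5 * (k + 1))
    initial_values_small = all(0 <= Fraction(b) <= alpha / 3 for b in betas)
    return coefficients_small and initial_values_small


# Hypothetical sample instance with k = 3 and alpha = 1/20.
sample_alphas = [Fraction(1, 40), Fraction(-1, 80), Fraction(1, 80)]
sample_betas = [Fraction(1, 60), Fraction(0), Fraction(1, 100)]
```

Exact rational arithmetic via `Fraction` avoids rounding issues at the boundary case $\alpha=\frac{1}{5(k+1)}$, which the sample instance hits exactly.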

The new gadget that encodes the initial values of a linear recurrence sequence is depicted in Figure 16. In states $t$ and $s$, there is a choice between actions $\gamma_{j}$ and $\delta_{j}$, respectively, for $0\leq j\leq k-1$. After gluing this gadget together with the gadget $\mathcal{G}_{\bar{\alpha}}$ at states $t$, $s$, and $\mathit{goal}$, we prove that the interplay between the gadgets is correct: Let $0\leq j\leq k-1$. Starting with accumulated weight ${-}k{+}j$ in state $t$, the action $\gamma_{j}$ maximizes the partial expectation among the actions $\gamma_{0},\dots,\gamma_{k-1}$. Likewise, $\delta_{j}$ is optimal when starting in $s$ with weight ${-}k{+}j$. If the accumulated weight is non-negative in state $t$ or $s$, then $\gamma$ or $\delta$, respectively, is optimal. The idea is that for positive starting weights, the tail loss of $\gamma_{i}$ and $\delta_{i}$ is relatively high, while for weights just below $0$, the chance to reach $\mathit{goal}$ with positive weight again outweighs this tail loss.

First, we estimate the expectation of $\diamondminus\mathit{goal}$ when choosing $\delta_{i}$ and $\delta$ while the accumulated weight is ${-}k{+}j$ in $s$. If $i>j$, then $\delta_{i}$ and $\delta$ lead to $\mathit{goal}$ directly with probability $1{-}\alpha$ and weight $\leq-1$. So, the expectation is less than ${-}(1-\alpha)\leq{-}1{+}\frac{1}{5(k{+}1)}$.

If $i\leq j$, then with probability $1{-}\alpha$, $\mathit{goal}$ is reached with positive weight, hence $\diamondminus\mathit{goal}$ is $0$ on these paths. With probability $\beta_{i}$, $\mathit{goal}$ is reached via $y_{j}^{\prime}$. In this case all runs reach $\mathit{goal}$ with negative weight. On the way to $y_{j}^{\prime}$, weight $2k$ is added, but afterwards subtracted again at least once. In expectation, weight $2k$ is subtracted $\frac{k{+}1}{k}$ many times. Furthermore, ${-}2k{+}i$ is added to the starting weight of ${-}k{+}j$. So, these paths contribute $\beta_{i}\cdot(2k-2k\frac{k{+}1}{k}{-}3k{+}j{+}i)=({-}3k{+}j{+}i{-}2)\cdot\beta_{i}$ to the expectation of $\diamondminus\mathit{goal}$. With analogous reasoning, we see that the remaining paths contribute $({-}3k{+}j{+}i{-}1)\cdot(\alpha-\beta_{i})$. So, all in all, the expectation of $\diamondminus\mathit{goal}$ in this situation is $\alpha{\cdot}({-}3k{+}j{+}i{-}1){-}\beta_{i}$. Now, as $\alpha\leq\frac{1}{5(k{+}1)}$ and $\beta_{i}\leq\frac{\alpha}{3}$ for all $i$, we see that $\alpha{\cdot}({-}3k{+}j{+}i{-}1){-}\beta_{i}\geq{-}(3k+2)\alpha\geq{-}1{+}\frac{1}{5(k{+}1)}$, so every choice with $i\leq j$ is at least as good as every choice with $i>j$. The optimum with $i\leq j$ is obtained for $i=j$: increasing $i$ by one raises the first term by $\alpha$, while the values $\beta_{i}$ lie between $0$ and $\alpha/3$. Hence indeed $\delta_{j}$ is the optimal action. For $\gamma_{j}$ the same proof with $\beta_{i}=0$ for all $i$ leads to the same result.
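The arithmetic of this case can be checked mechanically. The sketch below is only an illustration under stated assumptions: the function names and the sample values `K`, `ALPHA`, `BETAS` are hypothetical, and the two contribution formulas are copied from the text rather than derived from the gadget in Figure 16. It compares the sum of the two contributions with the closed form $\alpha\cdot({-}3k{+}j{+}i{-}1)-\beta_{i}$ and determines the maximizing index $i\leq j$.

```python
from fractions import Fraction


def contributions(k, j, i, alpha, beta_i):
    """Sum of the two path contributions quoted in the text (case i <= j)."""
    via_y_prime = beta_i * (2 * k - 2 * k * Fraction(k + 1, k) - 3 * k + j + i)
    remaining = (alpha - beta_i) * (-3 * k + j + i - 1)
    return via_y_prime + remaining


def closed_form(k, j, i, alpha, beta_i):
    """Closed form alpha * (-3k + j + i - 1) - beta_i of the expectation."""
    return alpha * (-3 * k + j + i - 1) - beta_i


def best_index(k, j, alpha, betas):
    """Index i <= j that maximizes the closed-form expectation."""
    return max(range(j + 1), key=lambda i: closed_form(k, j, i, alpha, betas[i]))


# Hypothetical sample values satisfying alpha <= 1/(5(k+1)) and 0 <= beta_i <= alpha/3.
K = 3
ALPHA = Fraction(1, 20)
BETAS = [Fraction(1, 60), Fraction(0), Fraction(1, 100)]
```

With these sample values, `best_index(K, j, ALPHA, BETAS)` equals `j` for every `j` in `range(K)`, matching the claim that $\delta_{j}$ is optimal at weight ${-}k{+}j$.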

Now assume that the accumulated weight in $t$ or $s$ is $\ell\geq 0$. Then, all actions lead to $\mathit{goal}$ with a positive weight with probability $1-\alpha$. In this case $\diamondminus\mathit{goal}$ is $0$. However, a scheduler $\mathfrak{S}$ which always chooses $\gamma$ and $\delta$ is better than a scheduler choosing $\gamma_{j}$ or $\delta_{j}$ for any $j\leq k{-}1$. Under scheduler $\mathfrak{S}$, starting from $s$ or $t$, a run returns to $\{s,t\}$ with probability $\alpha$ while accumulating weight $\geq{-}k$, and the process is repeated. After choosing $\gamma_{j}$ or $\delta_{j}$, the run moves to $x_{j}$, $y_{j}$ or $y_{j}^{\prime}$ while accumulating a negative weight. From then on, in each step it will stay in that state with probability greater than $\alpha$ and accumulate weight $\leq{-}k$. Hence, the expectation of $\diamondminus\mathit{goal}$ is lower under $\gamma_{j}$ or $\delta_{j}$ than under $\mathfrak{S}$. Therefore, $\gamma$ and $\delta$ are indeed the best actions for non-negative accumulated weight in states $s$ and $t$.

Let now $e(t,w)$ and $e(s,w)$ denote the optimal expectations of $\diamondminus\mathit{goal}$ when starting in $t$ or $s$ with weight $w$. Further, let $d(w)=e(t,w)-e(s,w)$. From the argument above, we also learn that the difference $d(-k{+}j)$ is equal to $\beta_{j}$ for $0\leq j\leq k-1$. Put together with the linear recurrence encoded in $\mathcal{G}_{\bar{\alpha}}$, this shows that $d({-k}+w)=u_{w}$ for all $w$, where $(u_{n})_{n\in\mathbb{N}}$ is the linear recurrence sequence specified by the $\alpha_{i}$, $\beta_{j}$, $1\leq i\leq k$, and $0\leq j\leq k{-}1$.
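To make the encoded sequence concrete, the following minimal Python sketch computes a prefix of the linear recurrence sequence $u_{n+k}=\alpha_{1}u_{n+k-1}+\dots+\alpha_{k}u_{n}$ with initial values $u_{j}=\beta_{j}$ and looks for a negative term. The function names `lrs_terms` and `first_negative_index` and the parameter `horizon` are illustrative assumptions, not part of the construction. Checking a bounded prefix can only witness a negative term; it is not a decision procedure for the Positivity problem.

```python
from fractions import Fraction


def lrs_terms(alphas, betas, count):
    """Return u_0, ..., u_{count-1} of the linear recurrence
    u_{n+k} = alpha_1*u_{n+k-1} + ... + alpha_k*u_n
    with initial values u_j = beta_j for 0 <= j <= k-1."""
    k = len(alphas)
    if len(betas) != k:
        raise ValueError("need exactly k initial values")
    terms = [Fraction(b) for b in betas]
    while len(terms) < count:
        # alphas[i] stores alpha_{i+1}, which multiplies u_{n+k-1-i}
        nxt = sum(Fraction(a) * terms[-1 - i] for i, a in enumerate(alphas))
        terms.append(nxt)
    return terms[:count]


def first_negative_index(alphas, betas, horizon):
    """Return the first n < horizon with u_n < 0, or None if no such n exists.
    (Hypothetical helper: a bounded search, not a decision procedure.)"""
    for n, u in enumerate(lrs_terms(alphas, betas, horizon)):
        if u < 0:
            return n
    return None
```

For example, `first_negative_index([1, -1], [1, 1], 10)` inspects the sequence $1, 1, 0, -1, \dots$ and returns `3`, whereas for the Fibonacci recurrence `first_negative_index([1, 1], [0, 1], 50)` returns `None`.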

Finally, we add the same initial component as in the previous section to obtain an MDP $\mathcal{M}$. The scheduler $\mathfrak{S}$ that always chooses $\tau$ in state $c$ and afterwards follows the optimal actions as described above is optimal iff the linear recurrence sequence stays non-negative. The remaining argument is completely analogous to the proof of Theorem 4.1. Grouping together the optimal values in vectors $v_{n}$ with $2k$ entries as done there, we can use the same Markov chain as in that proof to obtain a matrix $A$ such that $v_{n+1}=Av_{n}$. This allows us to compute the rational value $\vartheta=\mathbb{E}_{\mathcal{M},s_{\mathit{\scriptscriptstyle init}}}^{\mathfrak{S}}(\diamondminus\mathit{goal})$ via a matrix series in polynomial time, and $\mathbb{E}_{\mathcal{M},s_{\mathit{\scriptscriptstyle init}}}^{\max}(\diamondminus\mathit{goal})>\vartheta$ if and only if the given linear recurrence sequence takes a negative value at some point. ∎
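The step from $v_{n+1}=Av_{n}$ to a rational closed form can be illustrated by a small sketch. It assumes, purely for illustration, that the quantity of interest is the plain series $\sum_{n\geq 0}A^{n}v_{0}$ and that this series converges, in which case it equals $(I-A)^{-1}v_{0}$ and can be obtained exactly by Gaussian elimination over the rationals. The precise series used in the proof, with its weights and the contribution of the initial component, is not reproduced here, and the names `solve_exact` and `series_sum` are hypothetical.

```python
from fractions import Fraction


def solve_exact(matrix, rhs):
    """Solve matrix * x = rhs by Gauss-Jordan elimination over the rationals."""
    n = len(matrix)
    # Augmented matrix [matrix | rhs] with exact rational entries
    aug = [[Fraction(x) for x in row] + [Fraction(b)]
           for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [x / pivot_value for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[r][n] for r in range(n)]


def series_sum(A, v0):
    """Return sum_{n>=0} A^n v0 = (I - A)^{-1} v0.
    Assumption: the series converges (spectral radius of A below 1)."""
    n = len(A)
    i_minus_a = [[Fraction(int(i == j)) - Fraction(A[i][j]) for j in range(n)]
                 for i in range(n)]
    return solve_exact(i_minus_a, v0)
```

With `A = [[0, Fraction(1, 2)], [Fraction(1, 2), 0]]` and `v0 = [1, 0]`, `series_sum(A, v0)` yields the exact vector $(4/3, 2/3)$. Exact rational arithmetic matters here because the threshold $\vartheta$ has to be compared strictly against the optimal value.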

By the discussion above, this lemma directly implies Theorem 4.24. With adaptations similar to the previous section, it is possible to obtain the analogous result for the minimal expectation of $\diamondminus\mathit{goal}$.
This implies that the threshold problem whether the minimal conditional value-at-risk is less than a threshold $\vartheta$, i.e., whether $\mathit{CVaR}^{\min}_{p}(\diamondplus\mathit{goal})<\vartheta$, is also Positivity-hard.

5 Conclusion

The Positivity-hardness results established in this paper show that a series of problems on finite-state MDPs that have been studied and left open in the literature exhibit an inherent mathematical difficulty. A decidability result for any of these problems would imply a major breakthrough in analytic number theory. At the heart of our Positivity-hardness proofs lies the construction of modular MDPs consisting of three gadgets. This construction provides a versatile proof strategy to establish Positivity-hardness results: it allowed us to provide three direct reductions from the Positivity problem by constructing structurally identical MDPs that differ only in the gadget encoding the initial values. The further chains of reductions depicted in Figure 1 established Positivity-hardness for a landscape of different problems. These range from problems on one-counter MDPs and integer-weighted MDPs to problems concerning the long-run satisfaction of path properties, namely the threshold problem for long-run probabilities and the model-checking problem of frequency-LTL, which at first sight are of a rather different nature than the other problems.

The proof technique might be applicable to further threshold problems associated with optimization problems on MDPs. A main requirement for the direct applicability of the technique is that the optimal values $V(s,w)$, expressed in terms of the current state $s$ and the weight $w$ accumulated so far (or a similar quantity that can be increased and decreased), satisfy an optimality equation of the form

[TABLE]
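For orientation, a Bellman-style optimality equation of the kind meant here could take the following shape. This is a sketch under the assumption of a maximizing objective; $\mathit{Act}(s)$ (the actions enabled in $s$), $P$ (the transition probability function), and $S$ (the state space) are assumed notation, while $\mathit{wgt}(s,\alpha)$ is the weight of the state-action pair as used for accumulated weights.

```latex
V(s,w) \;=\; \max_{\alpha \in \mathit{Act}(s)} \; \sum_{t \in S} P(s,\alpha,t) \cdot V\bigl(t,\; w + \mathit{wgt}(s,\alpha)\bigr)
```

The essential feature is that the optimal value at weight level $w$ is a fixed linear combination of optimal values at shifted weight levels, which is what allows a linear recurrence relation to be encoded.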

In addition, the optimum must not be achievable with memoryless schedulers; instead, the optimal decisions have to depend on the accumulated weight to make it possible to encode the initial values of a linear recurrence sequence. As we have seen, this combination of conditions is quite common. Furthermore, our Positivity-hardness results, and possible future ones, might be transferable to further notions resulting from taking long-run averages (as in the case of long-run probabilities and frequency-LTL) or from conditioning (as in the case of conditional expectations and conditional values-at-risk).

In the special case of Markov chains, several of the investigated problems are decidable: in Markov chains, partial and conditional expectations as well as long-run probabilities and frequencies can easily be computed. Furthermore, one-counter Markov chains constitute a special case of recursive Markov chains, for which the threshold problem for the termination probability can be decided in polynomial space [EY09]. Remarkably, however, the threshold problem for the probability that the accumulated cost satisfies a Boolean combination of inequality constraints in finite-state Markov chains is open [HKL17].

Finally, the Positivity-hardness results leave open the possibility that some or all of the problems we studied are in fact harder than the Positivity problem. In particular, the problems could be undecidable, and a proof of their undecidability might yield no implications for the Positivity problem. For this reason, investigating whether some or all of the threshold problems are reducible to the Positivity problem constitutes a very interesting, and challenging, direction for future work. Such an inter-reducibility result would show that studying any of the discussed optimization problems on MDPs could be a worthwhile direction of research to settle the decidability status of the Positivity problem. Some hope for an inter-reducibility result can be drawn from the fact that the optimal values are approximable for several of the problems: for termination probabilities and expected termination times of one-counter MDPs, this was shown in [BBEK11, BKNW12], and for partial and conditional expectations in [PB19]. This indicates that there is at least a major difference to undecidable problems in a similar context, such as the emptiness problem for probabilistic finite automata, where the optimal value cannot be approximated [Paz71, CL89].

Bibliography

[AAGT15] Manindra Agrawal, Sundararaman Akshay, Blaise Genest, and P. S. Thiagarajan. Approximate verification of the symbolic dynamics of Markov chains. Journal of the ACM, 62(1):1–34, 2015.
[AAOW15] S. Akshay, Timos Antonopoulos, Joël Ouaknine, and James Worrell. Reachability problems for Markov chains. Information Processing Letters, 115(2):155–158, 2015.
[ADBA21] Mohamadreza Ahmadi, Anushri Dixit, Joel W Burdick, and Aaron D Ames. Risk-averse stochastic shortest path planning. arXiv:2103.14727, 2021.
[AT02] Carlo Acerbi and Dirk Tasche. Expected shortfall: A natural coherent alternative to value at risk. Economic Notes, 31(2):379–388, 2002.
[BAGM12] Amir M. Ben-Amram, Samir Genaim, and Abu Naser Masud. On the termination of integer loops. ACM Transactions on Programming Languages and Systems, 34(4):1–24, 2012.
[BBD+18] Christel Baier, Nathalie Bertrand, Clemens Dubslaff, Daniel Gburek, and Ocan Sankur. Stochastic shortest paths and weight-bounded properties in Markov decision processes. In 33rd Annual ACM/IEEE Symposium on Logic in Computer Science (LICS), pages 86–94. ACM, 2018.
[BBE+10] Tomás Brázdil, Václav Brožek, Kousha Etessami, Antonín Kučera, and Dominik Wojtczak. One-counter Markov decision processes. In 21st Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 863–874. SIAM, 2010.
[BBEK11] Tomáš Brázdil, Václav Brožek, Kousha Etessami, and Antonín Kučera. Approximating the termination value of one-counter MDPs and stochastic games. In Luca Aceto, Monika Henzinger, and Jiří Sgall, editors, 38th International Colloquium on Automata, Languages, and Programming (ICALP), volume 6755 of Theoretical Computer Science and General Issues, pages 332–343. Springer, 2011.

