On Strong Determinacy of Countable Stochastic Games
Stefan Kiefer, Richard Mayr, Mahsa Shirmohammadi, Dominik Wojtczak

TL;DR
This paper investigates the strong determinacy of countably infinite stochastic games, proving almost-sure objectives are strongly determined and identifying limitations for other objectives, with implications for strategy complexity.
Contribution
It establishes strong determinacy for almost-sure objectives in countably infinite games and shows that some other objectives lack this property, also analyzing strategy types needed.
Findings
Almost-sure objectives are strongly determined.
/2-Bbuchi objectives are not strongly determined.
Memoryless deterministic strategies suffice for certain objectives.
Abstract
We study 2-player turn-based perfect-information stochastic games with countably infinite state space. The players aim at maximizing/minimizing the probability of a given event (i.e., measurable set of infinite plays), such as reachability, B\"uchi, omega-regular or more general objectives. These games are known to be weakly determined, i.e., they have value. However, strong determinacy of threshold objectives (given by an event and a threshold ) was open in many cases: is it always the case that the maximizer or the minimizer has a winning strategy, i.e., one that enforces, against all strategies of the other player, that the objective is satisfied with probability (resp. )? We show that almost-sure objectives (where ) are strongly determined. This vastly generalizes a previous result on finite games with almost-sure tail objectives. On the other hand…
| Objective | ||||
|---|---|---|---|---|
| Reachability | ✓(MD) | ✓(MD) | ✓(FR) | ✓(MD) |
| Büchi | ✔(FR) | ✖ | ✖ | ✔(MD) |
| Borel | ✔(FR) | ✖ | ✖ | ✔(FR) |
| Objective | ||||
|---|---|---|---|---|
| Reachability | ✓(MD) | ✓(MD) | ✓(FR) | ✓(MD) |
| Büchi | ✔(FR) | ✖ | ✖ | ✔(MD) |
| Borel | ✔(FR) | ✖ | ✖ | ✔(FR) |
| Objective | ||||
|---|---|---|---|---|
| Reachability | ✓(MD) | ✔(FR) | ||
| Büchi | ✔(FR) | ✔(FR) | ||
| Borel | ✔(FR) | ✔(FR) |
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
On Strong Determinacy of
Countable Stochastic Games
Stefan Kiefer1, Richard Mayr2, Mahsa Shirmohammadi1, Dominik Wojtczak3
1University of Oxford, UK
2University of Edinburgh, UK
3University of Liverpool, UK
Abstract
We study 2-player turn-based perfect-information stochastic games with countably infinite state space. The players aim at maximizing/minimizing the probability of a given event (i.e., measurable set of infinite plays), such as reachability, Büchi, -regular or more general objectives.
These games are known to be weakly determined, i.e., they have value. However, strong determinacy of threshold objectives (given by an event and a threshold ) was open in many cases: is it always the case that the maximizer or the minimizer has a winning strategy, i.e., one that enforces, against all strategies of the other player, that is satisfied with probability (resp. )?
We show that almost-sure objectives (where ) are strongly determined. This vastly generalizes a previous result on finite games with almost-sure tail objectives. On the other hand we show that (co-)Büchi objectives are not strongly determined, not even if the game is finitely branching.
Moreover, for almost-sure reachability and almost-sure Büchi objectives in finitely branching games, we strengthen strong determinacy by showing that one of the players must have a memoryless deterministic (MD) winning strategy.
Index Terms:
stochastic games, strong determinacy, infinite state space
††publicationid: pubid: Extended version of material presented at LICS 2017. arXiv.org - CC BY 4.0.
I Introduction
Stochastic games. Two-player stochastic games [16] are adversarial games between two players (the maximizer and the minimizer ) where some decisions are determined randomly according to a pre-defined distribution. Stochastic games are also called -player games in the terminology of [8, 7]. Player tries to maximize the expected value of some payoff function defined on the set of plays, while player tries to minimize it. In concurrent stochastic games, in every round both players each choose an action (out of given action sets) and for each combination of actions the result is given by a pre-defined distribution. In the subclass of turn-based stochastic games (also called simple stochastic games) only one player gets to choose an action in every round, depending on which player owns the current state.
We study 2-player turn-based perfect-information stochastic games with countably infinite state spaces. We consider objectives defined via predicates on plays, not general payoff functions. Thus the expected payoff value corresponds to the probability that a play satisfies the predicate.
Standard questions are whether a game is determined, and whether the strategies of the players can without restriction be chosen to be of a particular type, e.g., MD (memoryless deterministic) or FR (finite-memory randomized).
Finite-state games vs. Infinite-state games. Stochastic games with finite state spaces have been extensively studied [23, 9, 11, 17, 8], both w.r.t. their determinacy and the strategy complexity (memory requirements and randomization). E.g., strategies in finite stochastic parity games can be chosen memoryless deterministic (MD) [10, 7, 6]. These results have a strong influence on algorithms for deciding the winner of stochastic games, because such algorithms often use a structural property that the strategies can be chosen of a particular type (e.g., MD or finite-memory).
More recently, several classes of finitely presented infinite-state games have been considered as well. These are often induced by various types of automata that use infinite memory (e.g., unbounded pushdown stacks, unbounded counters, or unbounded fifo-queues). Most of these classes are still finitely branching. Stochastic games on infinite-state probabilistic recursive systems (i.e., probabilistic pushdown automata with unbounded stacks) were studied in [13, 14, 12], and stochastic games on systems with unbounded fifo-queues were studied in [1]. However, most these works used techniques that are specially adapted to the underlying automata model, not a general analysis of infinite-state games. Some results on general stochastic games with countably infinite state spaces were presented in [19, 4, 18, 5] though many questions remained open (see our contributions further below).
It should be noted that many standard results and proof techniques from finite games do not carry over to countably infinite games. E.g.,
- •
Even if a state has value, an optimal strategy need not exist, not even for reachability objectives [19].
- •
Some strong determinacy properties (see below) do not hold, not even for reachability objectives [4, 18] (while in finite games they hold even for parity objectives [8]).
- •
The memory requirements of optimal strategies are different. In finite games, optimal strategies for parity objectives can be chosen memoryless deterministic [8]. In contrast, in countably infinite games (even if finitely branching) optimal strategies for reachability objectives, where they exist, require infinite memory [19].
One of the reasons underlying this difference is the following. Consider the values of the states in a game w.r.t. a certain objective. If the game is finite then there are only finitely many such values, and in particular there exists some minimal nonzero value (unless all states have value zero). This property does not carry over to infinite games. Here the set of states is infinite and the infimum over the nonzero values can be zero. As a consequence, even for a reachability objective, it is possible that all states have value , but still the value of some states is . Such phenomena appear already in infinite-state Markov chains like the classic Gambler’s ruin problem with unfair coin tosses in the player’s favor (e.g., win and lose). The value, i.e., the probability of ruin, is always , but still in every state except the ruin state itself; cf. [15] (Chapt. 14).
Weak determinacy. Using Martin’s result [21], Maitra & Sudderth [20] showed that stochastic games with Borel payoffs are weakly determined, i.e., all states have value. This very general result holds even for concurrent games and general (not necessarily countable) state spaces. They work in the framework of finitely additive probability theory (under weak assumptions on measures) and only assume a finitely additive law of motion. Also their payoff functions are general bounded Borel measurable functions, not necessarily predicates on plays.
Strong determinacy. Given a predicate on plays and a constant , strong determinacy of a threshold objective (where ) holds iff either the maximizer or the minimizer has a winning strategy, i.e., a strategy that enforces (against any strategy of the other player) that the predicate holds with probability (resp. ). In the case of , one speaks of an almost-sure objective. If the winning strategy of the winning player can be chosen MD (memoryless deterministic) then one says that the threshold objective is strongly MD determined. Similarly for other types of strategies, e.g., FR (finite-memory randomized).
Strong determinacy in finite games. Strong determinacy for almost-sure objectives (and for the dual positive probability objectives ) is sometimes called qualitative determinacy [17]. In [17, Theorem 3.3] it is shown that finite stochastic games with Borel tail (i.e., prefix-independent) objectives are qualitatively determined. (We’ll show a more general result for countably infinite games and general objectives; see below.) In the special case of parity objectives, even strong MD determinacy holds for any threshold [8].
Strong determinacy in infinite games. It was shown in [4, 18, 5] that in finitely branching games with countable state spaces reachability objectives with any threshold with , are strongly determined. However, the player strategy may need infinite memory [19], and thus reachability objectives are not strongly MD determined. Strong determinacy does not hold for infinitely branching reachability games with thresholds with ; cf. Figure 1 in [4].
Our contribution to determinacy. We show that almost-sure Borel objectives are strongly determined for games with countably infinite state spaces. (In particular this even holds for infinitely branching games; cf. Table I.) This removes both the restriction to finite games and the restriction to tail objectives of [17, Theorem 3.3], and solves an open problem stated there. (To the best of our knowledge, strong determinacy was open even for almost-sure reachability objectives in infinitely branching countable games.)
On the other hand, we show that, for countable games, (co-)Büchi objectives are not strongly determined for any , not even if the game graph is finitely branching.
Our contribution to strategy complexity. While reachability objectives in finitely branching countable games are not strongly MD determined in general [19], we show that strong MD determinacy holds for many interesting subclasses. In finitely branching games, it holds for strict inequality reachability, almost-sure reachability, and in all games where either player does not have any value-decreasing transitions or player does not have any value-increasing transitions.
Moreover, we show that almost-sure Büchi objectives (but not almost-sure co-Büchi objectives) are strongly MD determined, provided that the game is finitely branching.
Table I summarizes all properties of strong determinacy and memory requirements for Borel objectives and subclasses on countably infinite games.
II Preliminaries
A probability distribution over a countable (not necessarily finite) set is a function s.t. . We use to denote the support of . Let be the set of all probability distributions over .
We consider -player games where players have perfect information and play in turn for infinitely many rounds. Games are defined such that the countable set of states is partitioned into the set of states of player, the set of states of player and random states . The relation is the transition relation. We write if , and we assume that each state has a successor state with . The probability function assigns to each random state a probability distribution over its successor states. The game is called finitely branching if each state has only finitely many successors; otherwise, it is infinitely branching. Let . If , we say that player is passive, and the game is a Markov decision process (MDP). A Markov chain is an MDP where both players are passive.
The stochastic game is played by two players (maximizer) and (minimizer). The game starts in a given initial state and evolves for infinitely many rounds. In each round, if the game is in state then player chooses a successor state with ; otherwise the game is in a random state and proceeds randomly to with probability .
Strategies. A play is an infinite sequence of states such that for all ; let denote the -th state along . A partial play is a finite prefix of a play. We say that (partial) play visits if for some , and that starts in if . A strategy of the player is a function that assigns to partial plays a distribution over the successors . Strategies for the player are defined analogously. The set of all strategies of player and player in is denoted by and , respectively (we omit the subscript and write and if is clear). A (partial) play is induced by strategies if for all , and if for all .
To emphasize the amount of memory required to implement a strategy, we present an equivalent formulation of strategies. A strategy of player can be implemented by a probabilistic transducer where is a countable set (the memory of the strategy), is the initial memory mode and is the input and output alphabet. The probabilistic transition function updates the memory mode of the transducer. The probabilistic successor function outputs the next successor, where implies . We extend to and to , in the natural way. Moreover, we extend to paths by and . The strategy induced by the transducer is given by .
Strategies are in general history dependent (H) and randomized (R). An H-strategy is finite memory (F) if there exists some transducer with memory such that and ; otherwise requires infinite memory. An F-strategy is memoryless (M) (also called positional) if . For convenience, we may view M-strategies as functions . An R-strategy is deterministic (D) if and map to Dirac distributions; it implies that is a Dirac distribution for all partial plays . All combinations of the properties in are possible, e.g., MD stands for memoryless deterministic. HR strategies are the most general type.
Probability Measure and Events. To a game , an initial state and strategies we associate the standard probability space w.r.t. the induced Markov chain. First one defines a topological space on the set of infinite plays . The cylinder sets are the sets , where and the open sets are arbitrary unions of cylinder sets, i.e., the sets with . The Borel -algebra is the smallest -algebra that contains all the open sets.
The probability measure is obtained by first defining it on the cylinder sets and then extending it to all sets in the Borel -algebra. If is not a partial play induced by then let ; otherwise let , where is such that for all , for all , and for all . By Carathéodory’s extension theorem [2], this defines a unique probability measure on the Borel -algebra .
We will call any set an event, i.e., an event is a measurable (in the probability space above) set of infinite plays. Equivalently, one may view an event as a Borel measurable payoff function of the form . Given (where potentially ) we often write for to avoid clutter.
Objectives. Let be a game. The objectives of the players are determined by events . We write for the dual objective defined as .
Given a target set , the reachability objective is defined by the event
[TABLE]
Moreover, denotes the set of all plays visiting in the first steps, i.e., . The safety objective is defined as the dual of reachability: .
For a set of states called Büchi states, the Büchi objective is the event
[TABLE]
The co-Büchi objective is defined as the dual of Büchi.
Note that the objectives of player (maximizer) and player (minimizer) are dual to each other. Where player tries to maximize the probability of some objective , player tries to maximize the probability of .
III Determinacy
III-A Optimal and -Optimal Strategies; Weak and Strong Determinacy
Given an objective for player in a game , state has value if
[TABLE]
If has value then denotes the value of defined by the above equality. A game with a fixed objective is called weakly determined iff every state has value.
Theorem 1** (follows immediately from [20]).**
Countable stochastic games (as defined in Section II) are weakly determined.
Theorem 1 is an immediate consequence of a far more general result by Maitra & Sudderth [20] on weak determinacy of (finitely additive) games with general Borel payoff objectives.
For and , we say that
- •
is -optimal (maximizing) iff for all .
- •
is -optimal (minimizing) iff for all .
A [math]-optimal strategy is called optimal. An optimal strategy for the player is almost-surely winning if . Unlike in finite-state games, optimal strategies need not exist in countable games, not even for reachability objectives in finitely branching MDPs [3, 4].
However, since our games are weakly determined by Theorem 1, for all there exist -optimal strategies for both players.
For an objective and and threshold , we define threshold objectives as follows.
- •
{\big{[}\mathcal{E}\big{]}_{\Box}^{{\rhd c}}}_{\!\!{\mathcal{G}}} is the set of states for which there exists a strategy such that, for all , we have .
- •
{\big{[}\mathcal{E}\big{]}_{\Diamond}^{{{\not\!\rhd}c}}}_{\mathcal{G}} is the set of states for which there exists a strategy such that, for all , we have .
We omit the subscript where it is clear from the context. We call a state almost-surely winning for the player iff s\in\big{[}\mathcal{E}\big{]}_{\Box}^{{\geq 1}}.
By the duality of the players, a objective for player corresponds to a objective from player ’s point of view. E.g., an almost-sure Büchi objective for player corresponds to a positive-probability co-Büchi objective for player . Thus we can restrict our attention to reachability, Büchi and general (Borel set) objectives, since safety is dual to reachability, and co-Büchi is dual to Büchi, and Borel is self-dual.
A game with threshold objective is called strongly determined iff in every state either player or player has a winning strategy, i.e., iff S=\big{[}\mathcal{E}\big{]}_{\Box}^{{\rhd c}}\uplus\big{[}\mathcal{E}\big{]}_{\Diamond}^{{{\not\!\rhd}c}}.
Strong determinacy depends on the specified threshold . Strong determinacy for almost-sure objectives (and for the dual positive probability objectives ) is sometimes called qualitative determinacy [17]. In [17, Theorem 3.3] it is shown that finite stochastic games with tail objectives are qualitatively determined. An objective is called tail if for all and all we have , i.e., a tail objective is independent of finite prefixes. The authors of [17] express “hope that [their qualitative determinacy theorem] may be extended beyond the class of finite simple stochastic tail games”. We fulfill this hope by generalizing their theorem from finite to countable games and from tail objectives to arbitrary objectives:
Theorem 2**.**
Stochastic games, even infinitely branching ones, with almost-sure objectives are strongly determined.
Theorem 2 does not carry over to thresholds other than 0 or 1; cf. Theorem 3.
The main ingredients of the proof of Theorem 2 are transfinite induction, weak determinacy of stochastic games (Theorem 1), the concept of a “reset” strategy from [17], and Lévy’s zero-one law. The principal idea of the proof is to construct a transfinite sequence of subgames, by removing parts of the game that player cannot risk entering. This approach is used later in this paper as well, for Theorems 5 and 11.
Example 1**.**
We explain this approach using the reachability game in Figure 1 as an example.
Each state has value in this game, except those labeled with [math]. However, only the states labeled with are almost-surely winning for player . To see this, consider a player state labeled with . In order to reach , player eventually needs to take a transition to a [math]-labeled state, which is not almost-surely winning. This means that the -labeled states are not almost-surely winning either. Hence, player cannot risk entering them if the player wants to win almost surely. Continuing this style of reasoning, we infer that the -labeled states are not almost-surely winning, and so on. This implies that the -labeled states are not almost-surely winning, and so on. The only almost-surely winning player state is the -labeled state at the bottom of the figure, and the only winning strategy is to take the direct transition to the target in the bottom-left corner.
Proof of Theorem 2.
The first step of the proof is to transform the game and the objective so that the objective can in some respects be treated like a tail objective. Let be a stochastic game with countable state space and objective . We convert the game graph to a forest by encoding the history in the states. Formally we proceed as follows. The state space, , of the new game, , consists of the partial plays in , i.e., . Observe that is countable. For any we define . A transition is a transition of iff it is of the form where and is a transition in . The probabilities in are defined in the obvious way. For we define an objective so that a play in starting from the singleton satisfies iff the corresponding play from in satisfies . Since strategies in (for singleton initial states in ) carry over to strategies in , it suffices to prove our determinacy result for .
Let us inductively extend the definition of from to arbitrary . For any transition in , define . This is well-defined as the transition graph of is a forest. For any , the event is also measurable. By this construction we obtain the following property: If a play in visits states then the suffix of starting from satisfies iff the suffix of starting from satisfies . This property is weaker than the tail property (which would stipulate that all are equivalent), but it suffices for our purposes.
In the remainder of the proof, when is (a subgame of) , we write for to avoid clutter. Similarly, when we write we mean the value with respect to .
In order to characterize the winning sets of the players, we construct a transfinite sequence of subgames of , where is an ordinal number, by stepwise removing certain states that are losing for player , along with their incoming transitions. Thus some subgames may contain states without any outgoing transitions (i.e., dead ends). Such dead ends are always considered as losing for player . (Formally, one might add a self-loop to such states and remove from the objective all plays that reach these states.)
Let denote the state space of the subgame . We start with . Given , denote by the set of states with . For any we define .
Since the sequence of sets is non-increasing and is countable, it follows that this sequence of games converges (i.e., is ultimately constant) at some ordinal where (the first uncountable ordinal). That is, we have . Note in particular that does not contain any dead ends. (However, its state space might be empty. In this case it is considered to be losing for player .)
We define the index, , of a state as the smallest ordinal with , and as if such an ordinal does not exist. For all states we have:
[TABLE]
We show that states with are in {\big{[}\mathcal{E}\big{]}_{\Diamond}^{{<1}}}_{\!\!{\mathcal{G}}}, and states with are in {\big{[}\mathcal{E}\big{]}_{\Box}^{{=1}}}_{\!\!{\mathcal{G}}}.
Strategy : For each with we construct a player strategy such that holds for all player strategies . The strategy is defined inductively over the index .
Let with . In game we have . So by weak determinacy (Theorem 1) there is a strategy with for all . (For example, one may take a -optimal player strategy). We extend to a strategy in as follows. Whenever the play enters a state (hence ) then switches to the previously defined strategy . (One could show that only player can take a transition leaving , although this is not needed at the moment.)
We show by transfinite induction on the index that holds for all player strategies and for all states with .
For the induction hypothesis, let be an ordinal for which this holds for all states with . For the inductive step, let be a state with , and let be an arbitrary player strategy in .
Suppose that the play from under the strategies always remains in , i.e., the probability of ever leaving under is zero. Then any play in under these strategies coincides with a play in , so we have , as desired. Now suppose otherwise, i.e., the play from under , with positive probability, enters a state , hence . By the induction hypothesis we have for any . Since the probability of entering is positive, we conclude , as desired.
Strategy : For each with (and thus ) we construct a player strategy such that holds for all player strategies . We first observe that if is a transition in with and then . Indeed, let , thus ; if then and thus ; if then . It follows that only player could ever leave the state space , but our player strategy will ensure that the play remains in forever. Recall that does not contain any dead ends and that for all . For all , by weak determinacy (Theorem 1) we fix a strategy with for all .
Fix an arbitrary state as the initial state. For a player strategy , define mappings using conditional probabilities:
[TABLE]
where denotes the event containing the plays that start with the length- prefix of . Thanks to our “forest” construction at the beginning of the proof, depends, in fact, only on the -th state visited by .
For some illustration, a small value of means that considering the length- prefix of , player has a strategy that makes unlikely at time . Similarly, a large value of means that at time (when the length- prefix has been “uncovered”) the probability of using is large, regardless of the player strategy.
In the following we view as a random variable (taking on a random value depending on a random play).
We define our almost-surely winning player strategy as the limit of inductively defined strategies . Let . Using the definition of we get . For any , define as follows. Strategy plays as long as . This could be forever. Otherwise, let denote the smallest with , and let be the -th state of the play. At that time, switches to strategy , implying . This switch of strategy is referred to as a “reset” in [17], where the concept is used similarly. For any , strategy performs at most such resets. Define as the limit of the , i.e., the number of resets performed by is unbounded.
In order to show that is almost surely winning, we first argue that almost surely performs only a finite number of resets. Suppose and are such that a -th reset happens after visiting the -th state in . As argued above, we have . Towards a contradiction assume that player has a strategy to cause yet another reset with probability , i.e.,
[TABLE]
where denotes the event of another reset after time . If another reset occurs, say at time , then , and then player can switch to a strategy to force . Hence:
[TABLE]
Let denote the player strategy combining and . Then it follows:
[TABLE]
Hence we have:
[TABLE]
contradicting . So at time , the probability of another reset is bounded by . Since this holds for every reset time , we conclude that almost surely there will be only finitely many resets under , regardless of .
Now we can show that holds for all . Fix arbitrarily. For define as the event that exactly resets occur. Let us write to avoid clutter. By Lévy’s zero-one law (see, e.g., [25, Theorem 14.2]), for any , we have -almost surely that either
[TABLE]
or
[TABLE]
holds. Let be a play that satisfies the second option. In particular, , so there exists with for all . It follows that holds for all . But that contradicts the fact that . So plays satisfying the second option do not actually exist.
Hence we conclude , thus . Since the strategies and agree on all finite prefixes of all plays in , the probability measures and agree on all subevents of . It follows . We have shown previously that the number of resets is almost surely finite, i.e., . Hence we have:
[TABLE]
Thus, . Since is defined on , this strategy never leaves . Since only player might have transitions that leave , we conclude . ∎
III-B Reachability and Safety
It was shown in [4] and [18] (and also follows as a corollary from [5]) that finitely branching games with reachability objectives with any threshold with are strongly determined. In contrast, strong determinacy does not hold for infinitely branching reachability games with thresholds with ; cf. Figure 1 in [4]. However, by Theorem 2, strong determinacy does hold for almost-sure reachability and safety objectives in infinitely branching games. By duality, this also holds for reachability and safety objectives with threshold . (For almost-sure safety (resp. reachability), this could also be shown by a reduction to non-stochastic 2-player reachability games [26].)
III-C Büchi and co-Büchi
Let be the Büchi objective (the co-Büchi objective is dual). Again, Theorem 2 applies to almost-sure and positive-probability Büchi and co-Büchi objectives, so those games are strongly determined, even infinitely branching ones.
However, this does not hold for thresholds , not even for finitely branching games:
Theorem 3**.**
Threshold (co-)Büchi objectives with thresholds are not strongly determined, even for finitely branching games.
A fortiori, threshold parity objectives are not strongly determined, not even for finitely branching games. We prove Theorem 3 using the finitely branching game in Figure 2. It is inspired by an infinitely branching example in [4], where it was shown that threshold reachability objectives in infinitely branching games are not strongly determined.
Proof sketch of Theorem 3.
The game in Figure 2 is finitely branching, and we consider the Büchi objective. The infinite choice for player in the example of [4] is simulated with an infinite chain of Büchi states in our example. All states are finitely branching and belong to player . The crucial property is that player can stay in the states for arbitrarily long (thus making the probability of reaching the state arbitrarily small) but not forever. Since the states are Büchi states, plays that stay in them forever satisfy the Büchi objective surely, something that player needs to avoid. So a player strategy must choose a transition for some , resulting in a faithful simulation of infinite branching from to some state , just like in the reachability game in [4].
From the fact that and , we deduce the following properties of this game:
- •
, but there exists no optimal strategy starting in . The value is witnessed by a family of -optimal strategies : traversing the ladder and choosing .
- •
, but there exists no optimal minimizing strategy starting in ; however, in analogy with , there are -optimal strategies.
- •
. We argue below that neither player has an optimal strategy starting in . It follows that i\not\in\big{[}\mathcal{E}\big{]}_{\Box}^{{\geq\frac{1}{2}}}\uplus\big{[}\mathcal{E}\big{]}_{\Diamond}^{{\not>\frac{1}{2}}} for the Büchi condition . So neither player has a winning strategy, neither for nor for . Indeed, consider any player strategy . Following , once the game is in , Büchi states cannot be visited with probability more than for some fixed and all strategies . Player has an -optimal strategy starting in . Then we have:
[TABLE]
so is not optimal. One can argue symmetrically that player does not have an optimal strategy either.
In the example in Figure 2, the game branches from state to and with probability respectively. However, the above argument can be adapted to work for probabilities and for every constant . ∎
IV Memory Requirements
In this section we study how much memory is needed to win objectives , depending on and on the constraint .
We say that an objective is strongly MD-determined iff for every state either
- •
there exists an MD-strategy such that, for all , we have , or
- •
there exists an MD-strategy such that, for all , we have .
If a game is strongly MD-determined then it is also strongly determined, but not vice-versa. Strong FR-determinacy is defined analogously.
IV-A Reachability and Safety Objectives
Let and be a threshold reachability objective. (Safety objectives are dual to reachability.)
Let us briefly discuss infinitely branching reachability games. If then strong determinacy does not hold; cf. Figure 1 in [4]. Objectives are strongly determined (Theorem 2), but not strongly FR-determined, because player needs infinite memory (even if player is passive) [19]. Objectives correspond to non-stochastic 2-player reachability games, which are strongly MD-determined [26].
In the rest of this subsection we consider finitely branching reachability games. It is shown in [4, 18] that finitely branching reachability games are strongly determined, but the winning strategy constructed therein uses infinite memory. Indeed, Kučera [19] showed that infinite memory is necessary in general:
Theorem 4** (follows from Proposition 5.7.b in [19]).**
Finitely branching reachability games with objectives are not strongly FR-determined for .
The example from [19] that proves Theorem 4 has the following properties:
- (1)
player has value-decreasing (see below) transitions; 2. (2)
player has value-increasing (see below) transitions; 3. (3)
threshold and ; 4. (4)
nonstrict inequality: .
Given a game , we call a transition value-decreasing (resp., value-increasing) if (resp., ). If player (resp., player ) controls a transition , i.e., (resp., ), then the transition cannot be value-increasing (resp., value-decreasing). We write for the game obtained from by removing the value-increasing transitions controlled by player . Note that this operation does not create any dead ends in finitely branching games, because at least one transition to a successor state with the same value will always remain for such games.
We show that a reachability game is strongly MD-determined if any of the properties listed above is not satisfied:
Theorem 5**.**
Finitely branching games with reachability objectives are strongly MD-determined, provided that at least one of the following conditions holds.
- (1)
player does not have value-decreasing transitions, or 2. (2)
player does not have value-increasing transitions, or 3. (3)
almost-sure objective: and , or 4. (4)
strict inequality: .
Remark 1**.**
Condition (1) or (2) of Theorem 5 is trivially satisfied if the corresponding player is passive, i.e., in MDPs. It was already known that MD strategies are sufficient for safety and reachability objectives in countable finitely branching MDPs ([22], Section 7.2.7). Theorem 5 generalizes this result.
Remark 2**.**
Theorem 5 does not carry over to stochastic reachability games with an arbitrary number of players, not even if the game graph is finite. Instead multiplayer games can require infinite memory to win. Proposition 4.13 in [24] constructs an 11-player finite-state stochastic reachability game with a pure subgame-perfect Nash equilibrium where the first player wins almost surely by using infinite memory. However, there is no finite-state Nash equilibrium (i.e., an equilibrium where all players are limited to finite memory) where the first player wins with positive probability. That is, the first player cannot win with only finite memory, not even if the other players are restricted to finite memory.
The rest of the subsection focuses on the proof of Theorem 5. We will need the following result from [4]:
Lemma 6**.**
(Theorem 3.1 in [4])* If is a finitely branching reachability game then there is an MD strategy that is optimal minimizing in every state (i.e., ).*
One challenge in proving Theorem 5 is that an optimal minimizing player MD strategy according to Lemma 6 is not necessarily winning for player , even for almost-sure reachability and even if player has a winning strategy. Indeed, consider the game in Figure 2, and add a new player state and transitions and . For the reachability objective , we then have , and the player MD strategy with is optimal minimizing. However, is not winning from w.r.t. the almost-sure objective . Instead the winning strategy is with .
By the following lemma (from [4]), player has for every state an -optimal strategy that needs to be defined only on a finite horizon:
Lemma 7**.**
(Lemma 3.2 in [4])* If is a finitely branching game with reachability objective then:*
[TABLE]
where denotes the event of reaching within at most steps.
Towards a proof of item (1) of Theorem 5, we prove the following lemma:
Lemma 8**.**
Let be a finitely branching game with reachability objective . Suppose that player does not have any value-decreasing transitions. Then there exists a player MD strategy that is optimal in all states. That is, for all states and for all player strategies we have .
Proof.
In order to construct the claimed MD strategy , we define a sequence of modified games in which the strategy of player is already fixed on a finite subset of the state space. We will show that the value of any state remains the same in all the , i.e., for all . Fix an enumeration that includes every state in infinitely often. Let .
Given we construct as follows. We use Lemma 7 to get a strategy and s.t. . From the finiteness of and the assumption that is finitely branching, we obtain that is finite. Consider the subgame with finite state space . In this subgame there exists an optimal MD strategy that maximizes the reachability probability for every state in . In particular, achieves the same approximation in as in , i.e., . Let be the subset of states in with . Since is finite, there exist and with for all and all .
We now construct by modifying as follows. For every player state we fix the transition according to , i.e., only transition remains and all other transitions from are deleted. Since all moves from states in have been fixed according to , the bounds above for and now hold for and any . That is, we have and for all and all and all .
Now we show that the values of all states in are still the same as in . Since our games are weakly determined, it suffices to show that player has an -optimal strategy from in for every . Let be an arbitrary strategy from in . Let be a state and be an -optimal strategy from in . We now define a strategy from in . If the game does not enter then plays exactly as (which is possible since outside no transitions have been removed). If the game enters then it will reach the target from within with probability . Moreover, if the game stays inside forever then it will almost surely reach the target, since . Otherwise, it exits at some state (strictly speaking, at a distribution of such states). If this was the -th visit to then, from , plays an \epsilon\big{/}2^{k+1}-optimal strategy w.r.t. (with the same modification as above if it visits again). We can now bound the error of from as follows. The set of plays which visit infinitely often contribute no error, since they almost surely reach the target by . Since all transitions are at least value-preserving in and hence in , the error of the plays which visit at most times is bounded by \sum_{k=1}^{j}\epsilon\big{/}2^{k}. Therefore, the error of from in is bounded by and thus .
Finally, we can construct the player MD winning strategy as the limit of the MD strategies , which are all compatible with each other by the construction of the games . We obtain for all . Let . Since holds for infinitely many , we conclude Thus as required. ∎
Towards a proof of items (2) and (3) of Theorem 5, we consider the operation , defined before the statement of Theorem 5. The following lemma shows that in reachability games all value-increasing transitions of player can be removed without changing the value of any state (although the outcome of the threshold reachability game may change in general).
Lemma 9**.**
Let be a finitely branching reachability game and . Then for all we have . Thus .
Proof.
Since only transitions are removed, we trivially have . For the other inequality observe that the optimal minimizing strategy of Lemma 6 never takes any value-increasing transition and thus also guarantees the value in . Thus also . ∎
Lemma 9 is in sharp contrast to Example 1 on page 1, which showed that the removal of value-decreasing transitions can change the value of states and can cause further transitions to become value-decreasing.
Similar to the proof of Theorem 2, the proof of the following lemma considers a transfinite sequence of subgames, where each subgame is obtained by removing the value-decreasing transitions from the previous subgames.
Lemma 10**.**
Let be a finitely branching game with reachability objective . Then there exist a player MD strategy and a player MD strategy such that for all states , if or , then the following is true:
[TABLE]
Proof.
We construct a transfinite sequence of subgames , where is an ordinal number, by stepwise removing certain transitions. Let denote the set of transitions of the subgame .
First, let . Since is assumed to have no dead ends, it follows from the definition of that does not contain any dead ends either. In the following, we only remove transitions of player . The resulting games with may contain dead ends, but these are always considered to be losing for player . (Formally, one might add a dummy loop at these states.) For each we define a set as the set of transitions that are controlled by player and that are value-decreasing in . For any we define .
Since the sequence of sets is non-increasing and we assumed that our game has only countably many states and transitions, it follows that this sequence of games converges at some ordinal where (the first uncountable ordinal). I.e., we have . In particular there are no value-decreasing player transitions in , i.e., .
The removal of transitions of player can only decrease the value of states, and the operation is value preserving by Lemma 9. Thus for all . We define the index of a state by , and as if the set is empty.
Strategy : Since does not have value-decreasing transitions, we can invoke Lemma 8 to obtain a player MD strategy with for all and for all with . We show that, if and either or , then also in we have . The only potential difference in the game on is that could take a transition, say , that is present in but not in . Since all transitions of are kept in , such a transition would have been removed in the step . We show that this is impossible.
For the first case suppose that satisfies and . It follows . Since does not have value-decreasing transitions, we have , hence , so the transition is not value-increasing in . Hence the transition is present in , hence also in .
For the second case suppose . Since does not contain any value-increasing transitions, the transition is not value-increasing in . So it is present in , and thus also in .
It follows that under the play remains in the states of and only uses transitions that are present in , regardless of the strategy . In this sense, all plays under on coincide with plays on . Hence .
Strategy : It now suffices to define a player MD strategy so that we have for all and for all with . This strategy is defined as follows.
- •
If then where is an arbitrary but fixed successor of where transition is present in and and . This exists by the assumption that is finitely branching and the definition of . In particular, since the transition is present in , it is not value-increasing in the game ; otherwise it would have been removed in the step from to .
- •
If , plays the optimal minimizing MD strategy on from Lemma 6, i.e., we have where is an arbitrary but fixed successor of in with .
Considering both cases, it follows that strategy is optimal minimizing in .
Let be an arbitrary state with . To show that holds for all , let be any strategy of player . Let be the smallest index among the states that can be reached with positive probability from under the strategies . Let be such a state with index . In the following we write also for the strategy after a partial play leading from to has been played.
Suppose that the play from under the strategies always remains in . Strategy might not be optimal minimizing in in general. However, we show that it is optimal minimizing in from all states with index . Let be a state with index . By definition of we have where the transition is present in with and . In the case where this directly implies that the step is optimal minimizing in . The remaining case is that . Here, by definition of the index, and . Since the transition is present in , it is also present in and . Since , this transition is not value-increasing in . Also, it is not value-decreasing in , because it is a transition. Therefore , and thus . Also in this case the step is optimal minimizing in .
So the only possible exceptions where strategy might not be optimal minimizing in are states with index . Since we have assumed above that such states cannot be reached under , it follows that .
Now suppose that the play from under , with positive probability, takes a transition, say , that is not present in . Then this transition was value-decreasing for some game with : that is, . Since the indices of both and are , we have . Hence the transition is value-decreasing in . Since is optimal minimizing in , we also have .
Since is optimal minimizing in , we conclude that we have . ∎
We are now ready to prove Theorem 5.
Proof of Theorem 5.
Let be a finitely branching game with reachability objective . Let be an arbitrary initial state.
Suppose . Then player wins with the MD strategy from Lemma 6.
Suppose . Let . By Lemma 7 there are a strategy and such that holds for all . The strategy plays on the subgame with state space , which is finite since is finitely branching. Therefore, there exists an MD strategy with . Since , the strategy also applies in , hence . By combining the mentioned inequalities we obtain that holds for all . So the MD strategy is winning for player .
It remains to consider the case . Let us discuss the four cases from the statement of Theorem 5 individually.
- (4)
If then player wins with the MD strategy from Lemma 6.
So for the remaining cases it suffices to consider the threshold objective .
- (1)
If player does not have value-decreasing transitions then player wins with the MD strategy from Lemma 8. 2. (2)
If player does not have value-increasing transitions then Lemma 10 supplies either player or player with an MD winning strategy. 3. (3)
If then, again, Lemma 10 supplies either player or player with an MD winning strategy.
This completes the proof of Theorem 5. ∎
IV-B Büchi and co-Büchi Objectives
Let be the Büchi objective. (The co-Büchi objective is dual.) Quantitative Büchi objectives with are not strongly determined, not even for finitely branching games (Theorem 3), but positive probability and almost-sure Büchi objectives are strongly determined (Theorem 2).
However, objectives are not strongly FR-determined, even in finitely branching systems. Even in the special case of finitely branching MDPs (where player is passive and the game is trivially strongly determined), player may require infinite memory to win [18].
In infinitely branching games, the almost-sure Büchi objective is not strongly FR-determined, because it subsumes the almost-sure reachability objective; cf. Subsection IV-A.
In contrast, in finitely branching games, the almost-sure Büchi objective is strongly MD-determined, as the following theorem shows:
Theorem 11**.**
Let be a finitely branching game with objective . Then there exist a player MD strategy and a player MD strategy such that for all states :
[TABLE]
Hence finitely branching almost-sure Büchi games are strongly MD-determined.
For the proof we need the following lemmas, which are variants of Lemmas 6 and 8 for the objective , which is defined as:
[TABLE]
The difference to is that requires a path to that involves at least one transition.
Lemma 12**.**
Let be a finitely branching game with objective . Then there is an MD strategy that is optimal minimizing in every state.
Proof.
Outside , the objectives and coincide, so outside , the MD strategy from Lemma 6 is optimal minimizing for . Any with must have a transition with and , where the value is always meant with respect to . Set . Then is optimal minimizing in every state, as desired. ∎
Lemma 13**.**
Let be a finitely branching game with objective . Suppose player does not have value-decreasing transitions. Then there is an MD strategy that is optimal maximizing in every state.
Proof.
Outside , the objectives and coincide, so outside , the MD strategy from Lemma 8 is optimal maximizing for . Any must have a transition with or , where the value is always meant with respect to . Set . Then is optimal maximizing in every state, as desired. ∎
With this at hand, we prove Theorem 11.
Proof of Theorem 11.
We proceed similarly to the proof of Theorem 2. In the present proof, whenever we write for a subgame of , we mean the value of state with respect to , where is the state space of .
In order to characterize the winning sets of the players with respect to the objective , we construct a transfinite sequence of subgames of , where is an ordinal number, by stepwise removing certain states, along with their incoming transitions. Let denote the state space of the subgame . We start with . Given , define as the set of states with , and for any define as the set of states s\in\big{(}S_{\alpha}\setminus\bigcup_{j=0}^{i}D_{\alpha}^{j}\big{)}\cap(S_{\Diamond}\cup S_{\bigcirc}) that have a transition with . The set can be seen as the backward closure of under random transitions and transitions controlled by player . For any we define .
Since the number of states never increases and is countable, it follows that this sequence of games converges at some ordinal where (the first uncountable ordinal). That is, we have .
As in the proof of Theorem 2, some games may contain dead ends, which are always considered to be losing for player . However, does not contain dead ends. (If is empty then player loses.) We define the index, , of a state as the ordinal with , and as if such an ordinal does not exist. For all states we have:
[TABLE]
In particular, player does not have value-decreasing transitions in . We show that states with are in {\big{[}\mathtt{B\ddot{u}chi}({\mathcal{T}\,\,\!\!})\big{]}_{\Diamond}^{{<1}}}_{\!\!{\mathcal{G}}}, and states with are in {\big{[}\mathtt{B\ddot{u}chi}({\mathcal{T}\,\,\!\!})\big{]}_{\Box}^{{=1}}}_{\!\!{\mathcal{G}}}, and in each case we give the claimed witnessing MD strategy.
Strategy : We define the claimed MD strategy for all with as follows. For all , define as in the MD strategy from Lemma 12 for and . For all for some , define such that and .
In each , strategy coincides with the strategy from Lemma 12, except possibly in states with . It follows that is optimal minimizing for all with .
We show by transfinite induction on the index that holds for all states with and for all player strategies . For the induction hypothesis, let be an ordinal for which this holds for all states with . For the inductive step, let be a state with , and let be an arbitrary player strategy in .
- •
Let . Suppose that the play from under the strategies always remains in , i.e., the probability of ever leaving under is zero. Then any play in under these strategies coincides with a play in , so we have . Since is optimal minimizing in , we have . Since , we have . By combining the mentioned equalities and inequalities we get , as desired.
Now suppose otherwise, i.e., the play from under , with positive probability, enters a state , hence . By the induction hypothesis we have for any . Since the probability of entering is positive, we conclude , as desired.
- •
Let for some . It follows from the definitions of and of that induces a partial play of length from to a state (player does not play on this partial play). We have shown above that . It follows that , as desired.
We conclude that we have for all and all with .
Strategy : We define the claimed MD strategy for all with to be the MD strategy from Lemma 13 for and . This definition ensures that player never takes a transition in that leaves . Random transitions and player transitions in never leave either: indeed, if with then for some , hence if and then . We conclude that starting from all plays in remain in , under and all player strategies.
Let , hence . Let be any player strategy. Since is optimal maximizing in , we have . As argued above, is not left even in , hence .
Therefore holds for all and all . Since Büchi is repeated reachability, we also have for all and all with . ∎
V Conclusions and Open Problems
With the results of this paper at hand, let us review the landscape of strong determinacy for stochastic games. We have shown that almost-sure objectives are strongly determined (Theorem 2), even in the infinitely branching case.
Let us review the finitely branching case. Quantitative reachability games are strongly determined [18, 4, 5]. They are generally not strongly FR-determined [19], but they are strongly MD-determined under any of the conditions provided by Theorem 5. Almost-sure reachability games and even almost-sure Büchi games are strongly MD-determined (Theorems 5 and 11). Almost-sure co-Büchi games are generally not strongly FR-determined [18], even if player is passive, because player may need infinite memory to win. However, the following question is open: if a state is almost-surely winning for player in a co-Büchi game, does player also have a winning MD strategy?
The same question is open for infinitely branching almost-sure reachability games (these games are generally not strongly FR-determined either [19]). In fact, one can show that a positive answer to the former question implies a positive answer to the latter question.
Acknowledgements. This work was partially supported by the EPSRC through grants EP/M027287/1, EP/M027651/1, EP/P020909/1 and EP/M003795/1 and by St. John’s College, Oxford.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1[1] P. A. Abdulla, L. Clemente, R. Mayr, and S. Sandberg. Stochastic parity games on lossy channel systems. Logical Methods in Computer Science , 10(4:21), 2014.
- 2[2] P. Billingsley. Probability and Measure . Wiley, New York, NY, 1995. Third Edition.
- 3[3] T. Brázdil, V. Brožek, K. Etessami, A. Kučera, and D. Wojtczak. One-counter Markov decision processes. In SODA’10 , pages 863–874. SIAM, 2010.
- 4[4] T. Brázdil, V. Brožek, A. Kučera, and J. Obdrzálek. Qualitative reachability in stochastic BPA games. Information and Computation , 209, 2011.
- 5[5] V. Brožek. Determinacy and optimal strategies in infinite-state stochastic reachability games. TCS , 493, 2013.
- 6[6] K. Chatterjee, L. de Alfaro, and T. Henzinger. Strategy improvement for concurrent reachability games. In QEST , pages 291–300. IEEE Computer Society Press, 2006.
- 7[7] K. Chatterjee, M. Jurdziński, and T. Henzinger. Simple stochastic parity games. In CSL’03 , volume 2803 of LNCS , pages 100–113. Springer, 2003.
- 8[8] K. Chatterjee, M. Jurdziński, and T. A. Henzinger. Quantitative stochastic parity games. In Proceedings of the Fifteenth Annual ACM-SIAM Symposium on Discrete Algorithms , SODA ’04, pages 121–130, Philadelphia, PA, USA, 2004. Society for Industrial and Applied Mathematics.
