Solving Simple Stochastic Games with few Random Nodes faster using Bland's Rule
David Auger, Pierre Coucheney, Yann Strozecki

TL;DR
This paper introduces a faster algorithm for solving simple stochastic games with few random nodes by adapting Bland's rule, reducing randomness and improving expected running time.
Contribution
It presents a simplified iterative algorithm using Bland's rule, achieving exponential speed-up for games with limited random nodes.
Findings
Expected running time of 2^{O(k)} for k random nodes
Reduced randomness compared to Ludwig's algorithm
Applicable to general random nodes with arbitrary outdegree
Abstract
The best algorithm so far for solving Simple Stochastic Games is Ludwig's randomized algorithm which works in expected $2^{O(\sqrt{n})}$ time. We first give a simpler iterative variant of this algorithm, using Bland's rule from the simplex algorithm, which uses exponentially fewer random bits than Ludwig's version. Then, we show how to adapt this method to the algorithm of Gimbert and Horn whose worst case complexity is $O(k!)$, where $k$ is the number of random nodes. Our algorithm has an expected running time of $2^{O(k)}$, and works for general random nodes with arbitrary outdegree and probability distribution on outgoing arcs.
DAVID laboratory, University of Versailles Saint-Quentin-en-Yvelines (David Auger, Pierre Coucheney and Yann Strozecki)
Copyright David Auger, Pierre Coucheney and Yann Strozecki. ACM CCS: Theory of computation, Algorithmic game theory.
36th International Symposium on Theoretical Aspects of Computer Science (STACS 2019), March 13–16, 2019, Berlin, Germany. Editors: Rolf Niedermeier and Christophe Paul. LIPIcs volume 126, article no. 58.
Keywords: simple stochastic games, randomized algorithm, parametrized complexity, strategy improvement, Bland’s rule
1 Introduction
A simple stochastic game, SSG for short, is a two-player zero-sum game, a turn-based version of the stochastic games introduced by Shapley [22]. SSGs were introduced by Condon [11] and provide a simple framework that allows one to study the algorithmic complexity issues underlying reachability objectives. An SSG is played by moving a pebble on a graph. Some nodes are divided between players min and max: if the pebble reaches a node controlled by a player, then she has to move the pebble along an arc leading to another node. Some other nodes are ruled by chance: the pebble follows one outgoing arc according to some given probability distribution. Finally, there are sink nodes with a rational value, which is the gain that player max achieves when the pebble reaches this sink.
Player max’s objective is, given a starting node for the pebble, to maximize the expectation of her gain against any strategy of min. One can show that it is enough to consider stationary deterministic strategies for both players [11]. Though seemingly simple since the number of stationary deterministic strategies is finite, the task of finding a pair of optimal strategies, or equivalently, of computing the so-called optimal values of nodes, is in complexity class [13] but not known to be in .
Simple stochastic games are a powerful model, since they can simulate many other games such as parity games, mean-payoff and discounted-payoff games [2, 7]. However, these games are believed to be simpler than SSGs, and better algorithms are known for them; in particular, parity games can be solved in quasi-polynomial time [5]. Stochastic versions of these games also exist and are computationally equivalent to SSGs [2]. SSGs also have many application domains, for instance autonomous urban driving [9], smart energy management [8] and model checking of the modal μ-calculus [23].
There are some restrictions of SSGs under which the problem of finding optimal strategies is tractable. If the game is acyclic, it can be solved in linear time, and almost acyclic games (few cycles or small feedback arc sets) can be solved in polynomial time [3]. If there is no randomness, the game can be solved in almost linear time [1]. Gimbert and Horn were the first to extend this result by giving Fixed Parameter Tractable (FPT) algorithms in the number of random nodes [15]. They show that optimal strategies depend only on the ordering of the values of random nodes, and not on their actual values. Using this idea, they devise two algorithms. The first one exhaustively enumerates these orders until it finds one that actually corresponds to optimal values. The second one is a strategy improvement algorithm based on an iterative refinement of the orders. Both have a complexity of $O(k!)$, where $k$ is the number of random nodes. It has been improved to expected time in [12], by randomly selecting a good strategy as a starting point for a strategy improvement algorithm. In fact, as remarked in [6], the distance between the values of two consecutive strategies in any strategy improvement algorithm depends on the number of random nodes. Hence any SSG can be solved in time (in fact using Lemma in [3]). The complexity has been further improved to in [19], by using a value iteration algorithm. A word of caution is in order here: in some papers, random nodes can have an arbitrary outdegree and probability distribution on outgoing arcs, while in others they must be binary with uniform distribution. In the former case, if we denote by the bit-size of the largest probability distribution on a random node, the first two cited algorithms have a complexity of and . On the other hand, the two algorithms with an exponential complexity in $k$ have an exponential dependency on when adapted to this context.
Without the previous restrictions, only algorithms running in exponential time are known. Most of them are strategy improvement algorithms, which produce a sequence of strategies of increasing values. These algorithms, such as the classical Hoffman-Karp algorithm [18], rely on the switch operation, which produces, by a local best response, a strategy with a better value. Several ways of choosing the nodes that are switched have been proposed [24]; they can be compared to the pivot selection rules of the simplex algorithm in linear programming. Though efficient in practice, these algorithms fail to run in polynomial time on well-designed inputs [14]. The best algorithm so far, proposed by Ludwig [21, 16], is also a strategy iteration algorithm, using a randomized version of Bland’s rule [4] to choose a switch. It solves any SSG in expected time $2^{O(\sqrt{n})}$, where $n$ is the number of max-nodes. The first analysis of this kind of algorithm is due to Kalai [20], and it has recently been slightly improved [17].
Our contributions
In Sec. 3, we present an iterative variant of Ludwig’s recursive algorithm which uses fewer random bits. In the rest of the paper, we adapt the idea of this algorithm to carefully enumerate orders of random nodes in an SSG. First, in Sec. 4, we present a pivot operation yielding a strategy improvement algorithm, which improves the one of [15]. This pivot operation comes from a randomized dichotomy on all orders, which we explain in detail in Sec. 5, using an auxiliary game similar to the one of [12]. We prove that our algorithm finds the optimal strategies in expected time polynomial in and , where $k$ is the number of random nodes and is the maximum bit-length of a distribution on a random node, answering positively a question of Ibsen-Jensen and Miltersen [19].
2 Definitions and classic results on simple stochastic games
Here we review definitions and results related to SSGs. We only sketch what we need and refer to longer expositions such as [11, 24] for more details.
Definition 2.1 (SSG)
A simple stochastic game (SSG) is defined by a directed graph $G = (V, A)$, where $V$ is the set of nodes and $A$ the set of arcs, together with a partition of $V$ in four parts $V_{MAX}$, $V_{MIN}$, $V_R$ and $V_S$, whose elements are respectively called max-nodes, min-nodes, ran-nodes (for random) and sinks. We require that every node has outdegree at least one, while sink nodes have outdegree exactly one, consisting of a single loop on themselves. We also specify for every sink $x \in V_S$ a value $\mathrm{Val}(x)$ which is a rational number, and for every random node $x \in V_R$ a rational probability distribution $p_x$ on the outneighbours of $x$.
In the original version of Condon [11], all nodes except sinks have outdegree exactly two, the probability distribution on every ran-node is $(\frac{1}{2}, \frac{1}{2})$, and there are only two sinks, one with value $0$ and another with value $1$. Here, we allow more than two sinks, with general rational values, and also allow outdegree greater than two for all non-sink nodes, with an arbitrary probability distribution for ran-nodes. However, for Ludwig’s Algorithm (see Algorithms 1, 2 and 3 in Section 3) we shall suppose that all max-nodes have outdegree $2$, and we call such games max-binary.
Strategies and values
We now define strategies, by which we mean stationary and pure strategies. This is enough for our purpose and it turns out to be sufficient for optimality, see [11]. Such strategies specify the choice of a neighbour for every node of a given player.
Definition 2.2 (Strategy)
A strategy for player max is a map $\sigma$ from $V_{MAX}$ to $V$ such that $(x, \sigma(x)) \in A$ for all $x \in V_{MAX}$.
Strategies for player min are defined analogously on min-nodes and are usually denoted by $\tau$.
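Definition 2.3 (play)
A play is a sequence of nodes $x_0, x_1, x_2, \dots$ such that $(x_t, x_{t+1}) \in A$ for all $t \geq 0$. Such a play is consistent with strategies $\sigma$ and $\tau$, respectively for player max and player min, if for all $t \geq 0$, $x_{t+1} = \sigma(x_t)$ whenever $x_t \in V_{MAX}$ and $x_{t+1} = \tau(x_t)$ whenever $x_t \in V_{MIN}$.
A couple of strategies $(\sigma, \tau)$ and an initial node $x$ define recursively a random play $X_0, X_1, \dots$ consistent with $(\sigma, \tau)$ by setting $X_0 = x$ and, for $t \geq 0$, $X_{t+1} = \sigma(X_t)$ if $X_t \in V_{MAX}$, $X_{t+1} = \tau(X_t)$ if $X_t \in V_{MIN}$, $X_{t+1} = X_t$ if $X_t \in V_S$, and finally $X_{t+1}$ is one of the outneighbours of $X_t$, randomly chosen independently of everything else according to probability distribution $p_{X_t}$, if $X_t \in V_R$.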
Hence, this defines a probability measure $\mathbb{P}^x_{\sigma,\tau}$ on plays consistent with $(\sigma, \tau)$. Note that if a play contains a sink node $s$, then at every subsequent time the play stays in $s$. Such a play is said to reach sink $s$. To every play we associate a value, which is the value of the sink reached by the play if any, and $0$ otherwise. If we denote by $\mathrm{Val}$ this value, then $\mathrm{Val}$ is a random variable once two strategies $\sigma, \tau$ and an initial node $x$ are fixed. We are interested in the expected value of this quantity, which we call the value of a node $x$ under strategies $(\sigma, \tau)$: $v_{\sigma,\tau}(x) = \mathbb{E}^x_{\sigma,\tau}[\mathrm{Val}]$, where $\mathbb{E}^x_{\sigma,\tau}$ is the expected value under probability $\mathbb{P}^x_{\sigma,\tau}$.
The goal of player max is to maximize this (expected) value, and the best she can ensure against a strategy $\tau$ is $v_\tau(x) = \max_\sigma v_{\sigma,\tau}(x)$, where the maximum is considered over all max-strategies (of which there are finitely many). Similarly, against $\sigma$, player min can ensure that the expected value is at most $v_\sigma(x) = \min_\tau v_{\sigma,\tau}(x)$.
Finally, the value of a node $x$ is $v(x) = \min_\tau \max_\sigma v_{\sigma,\tau}(x) = \max_\sigma \min_\tau v_{\sigma,\tau}(x)$. The fact that these two quantities are equal is nontrivial; a proof can be found for instance in [11]. A pair of strategies $(\sigma^*, \tau^*)$ such that $v_{\sigma^*,\tau^*}(x) = v(x)$ for all nodes $x$ always exists, and these strategies are said to be optimal. Computing optimal strategies and computing the values of all nodes in the game are polynomial-time equivalent.
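To make the definition of $v_{\sigma,\tau}$ concrete, here is a minimal Python sketch, assuming a hypothetical dictionary encoding of the game (the names `kind`, `prob` and `sink_value` are ours, not the paper's). Once both strategies are fixed, the game is a finite Markov chain whose absorbing states are the sinks, so the values solve a linear system; nodes from which no sink is reachable get value 0, matching the convention above. It uses exact rational arithmetic and is only meant for small examples.

```python
from fractions import Fraction

def values_under_strategies(kind, prob, sink_value, sigma, tau):
    """Exact values v_{sigma,tau}(x) of all nodes for fixed stationary strategies.

    kind:       node -> 'max' | 'min' | 'ran' | 'sink'
    prob:       ran-node -> {successor: Fraction probability}
    sink_value: sink -> rational value
    sigma, tau: max-node / min-node -> chosen successor
    Plays that never reach a sink are worth 0.
    """
    def step(x):
        # One-step distribution of the Markov chain induced by (sigma, tau).
        if kind[x] == 'max':
            return {sigma[x]: Fraction(1)}
        if kind[x] == 'min':
            return {tau[x]: Fraction(1)}
        return prob[x]

    inner = [x for x in kind if kind[x] != 'sink']
    preds = {x: set() for x in kind}
    for x in inner:
        for y in step(x):
            preds[y].add(x)
    # Backward search: nodes from which some sink is reachable in the chain.
    reach = {x for x in kind if kind[x] == 'sink'}
    stack = list(reach)
    while stack:
        for x in preds[stack.pop()]:
            if x not in reach:
                reach.add(x)
                stack.append(x)
    live = [x for x in inner if x in reach]
    idx = {x: i for i, x in enumerate(live)}
    n = len(live)
    # Augmented system (I - Q) v = b on live non-sink nodes; it is nonsingular
    # because from every live node the chain leaves this set with positive probability.
    m = [[Fraction(0)] * (n + 1) for _ in range(n)]
    for x in live:
        i = idx[x]
        m[i][i] += 1
        for y, p in step(x).items():
            if kind[y] == 'sink':
                m[i][n] += p * sink_value[y]
            elif y in idx:
                m[i][idx[y]] -= p
            # successors that cannot reach a sink have value 0: no contribution
    # Gauss-Jordan elimination in exact rational arithmetic.
    for c in range(n):
        r = next(r for r in range(c, n) if m[r][c] != 0)
        m[c], m[r] = m[r], m[c]
        for r2 in range(n):
            if r2 != c and m[r2][c] != 0:
                f = m[r2][c] / m[c][c]
                m[r2] = [a - f * b for a, b in zip(m[r2], m[c])]
    values = {x: Fraction(sink_value[x]) if kind[x] == 'sink' else Fraction(0)
              for x in kind}
    for x in live:
        values[x] = m[idx[x]][n] / m[idx[x]][idx[x]]
    return values
```

For instance, if max at `a` moves to the min-node `b`, `b` moves to the random node `r`, and `r` goes to a sink of value 1 with probability 1/2, to a sink of value 0 with probability 1/4 and back to `a` with probability 1/4, then `a`, `b` and `r` all get value 2/3.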
Definition 2.4 (Stopping SSG)
An SSG is said to be stopping if for every couple of strategies $(\sigma, \tau)$ almost all plays eventually reach a sink node.
Usually, this condition is required in order to ensure simple optimality conditions (Thm. 2.5 below). Condon [11] proved that every SSG $G$ can be reduced in polynomial time to a stopping SSG $G'$ whose size is quadratic in the size of $G$ and whose values are almost the same: they are close enough to recover the values of the original game. A problem for us is that squaring the size of the game does not behave well with respect to precise complexity bounds.
However, in our case we only need a milder condition. We call a max-strategy $\sigma$ stopping if, for any min-strategy $\tau$, the random play consistent with $(\sigma, \tau)$ reaches a sink with probability one.
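Theorem 2.5 (Optimality conditions, [11])
Let $G$ be an SSG, $\sigma$ a stopping max-strategy and $\tau$ a min-strategy. Then $(\sigma, \tau)$ are optimal strategies if and only if
- for every $x \in V_{MAX}$, $v_{\sigma,\tau}(x) = \max_{(x,y) \in A} v_{\sigma,\tau}(y)$;
- for every $x \in V_{MIN}$, $v_{\sigma,\tau}(x) = \min_{(x,y) \in A} v_{\sigma,\tau}(y)$.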
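The two conditions of Thm. 2.5 are local, hence easy to test once the values $v_{\sigma,\tau}$ are known. The following sketch only checks these conditions on a given value vector (for instance one computed for a stopping $\sigma$); the encoding with `kind` and `out` dictionaries is our own assumption.

```python
def satisfies_optimality_conditions(kind, out, values):
    """Check the local conditions of Thm. 2.5 on a candidate value vector.

    kind:   node -> 'max' | 'min' | 'ran' | 'sink'
    out:    max/min node -> list of its outneighbours
    values: node -> v_{sigma,tau}(node), computed beforehand for (sigma, tau)
    """
    for x, k in kind.items():
        # A max-node must already achieve the best value among its successors.
        if k == 'max' and values[x] != max(values[y] for y in out[x]):
            return False
        # A min-node must already achieve the lowest value among its successors.
        if k == 'min' and values[x] != min(values[y] for y in out[x]):
            return False
    return True
```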
Switches and strategy improvement
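Consider the usual partial order on real vectors indexed by $V$, i.e. for $u, w \in \mathbb{R}^V$, denote $u \geq w$ if $u(x) \geq w(x)$ for all $x \in V$, and denote $u > w$ if $u \geq w$ and at least one inequality is strict. For two max-strategies $\sigma, \sigma'$, simply denote $\sigma' \geq \sigma$ (resp. $\sigma' > \sigma$) if $v_{\sigma'} \geq v_\sigma$ (resp. $v_{\sigma'} > v_\sigma$). Define a similar order on min-strategies.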
A switch of a strategy consists in changing this strategy at a node (or a set of nodes) in order to obtain a new one.
Definition 2.6
Let $\sigma, \sigma'$ be max-strategies. We say that $\sigma'$ is a profitable switch of $\sigma$ if for all $x \in V_{MAX}$, one has $v_\sigma(\sigma'(x)) \geq v_\sigma(\sigma(x))$, with this condition strict for at least one max-node (such a node is said to be switchable).
The following result states that such a switch actually improves values:
Theorem 2.7 ([10], [24])
If $\sigma'$ is a profitable switch of $\sigma$, then $\sigma' > \sigma$.
Before ending this section, note that Th. 2.5 can be restated in terms of the nonexistence of switchable nodes. Hence, we have the following result:
Theorem 2.8
A stopping max-strategy is optimal if and only if it has no switchable nodes.
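Thm. 2.8 gives a direct optimality test. The sketch below, assuming the values $v_\sigma$ are already available as a dictionary, lists the switchable max-nodes and builds one profitable switch by moving every switchable node to a best successor, in the style of Hoffman-Karp; the function names are hypothetical.

```python
def switchable_nodes(max_nodes, out, sigma, values):
    """Max-nodes x having a successor y with values[y] > values[sigma[x]]."""
    return [x for x in max_nodes
            if any(values[y] > values[sigma[x]] for y in out[x])]

def greedy_profitable_switch(max_nodes, out, sigma, values):
    """Switch every switchable node to a successor of maximal value."""
    new_sigma = dict(sigma)
    for x in switchable_nodes(max_nodes, out, sigma, values):
        new_sigma[x] = max(out[x], key=lambda y: values[y])
    return new_sigma
```

An empty result from `switchable_nodes` for a stopping $\sigma$ certifies optimality by Thm. 2.8; otherwise any subset of the switchable nodes can be switched, which is exactly the freedom that pivot rules such as Bland's rule resolve.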
For the last section, we require another form of switch.
Theorem 2.9 ([10], [24])
Let be stopping max-strategies and be min-strategies such that for all , and for all , with one of these conditions strict for at least one node. Then
Orders
For consider the set of integers and let denote the set of total orders on . For the sake of clarity, we view these orders as sets of couples satisfying reflexivity, transitivity and antisymmetry.
If , it can also be described in ascending ordering such as where if and only if . An interval in is a sequence of consecutive elements in ascending ordering. The rank of an element is the number of elements that are lower than or equal to in , i.e. it is if with notation above.
For lack of a better word, we define a pretotal order as an antisymmetric and reflexive relation and denote by the set of pretotal orders on . If and is such that is still antisymmetric, we denote simply by this new pretotal order.
If and are real numbers, we say that the ’s are nondecreasing along if . Likewise, we say that is a nondecreasing order for .
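These order-related notions can be mirrored by a few helper functions. In this sketch, a total order is stored as a Python list in ascending order and a relation as a set of couples; both representation choices are ours.

```python
def rank(order, x):
    """Number of elements lower than or equal to x (1-based position in ascending order)."""
    return order.index(x) + 1

def as_couples(order):
    """The total order viewed as a set of couples (a, b) with a <= b (reflexive)."""
    return {(a, b) for i, a in enumerate(order) for b in order[i:]}

def is_nondecreasing_along(order, values):
    """True if the values are nondecreasing along the ascending order."""
    return all(values[a] <= values[b] for a, b in zip(order, order[1:]))

def can_add(relation, a, b):
    """Adding (a, b) keeps an antisymmetric relation antisymmetric
    iff a == b or (b, a) is not already in the relation."""
    return a == b or (b, a) not in relation
```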
3 Iterative formulation of Ludwig’s algorithm
In this part, we suppose that is max-binary. Hence, if a node is switchable there is a single possibility for changing the strategy’s choice at this node. Let denote the profitable switch obtained from by switching at node .
3.1 Bland’s rule version
In [21], Ludwig mentions that his algorithm is a version of Bland’s rule; however, he does not make it explicit and gives a recursive definition. We formulate his algorithm iteratively (see Algorithm 1), and show that instead of randomly choosing a node at every step, we can choose a total order on nodes prior to the execution of the algorithm. This version uses far fewer random bits: bits instead of on average in Ludwig’s version.
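A minimal sketch of the iterative Bland-rule loop, under our reading of Alg. 1 (its listing is not reproduced here): a uniformly random order on max-nodes is drawn once, and at every step the first switchable node in that order is switched. It assumes a max-binary game and a hypothetical oracle `value_of(sigma)` returning $v_\sigma$ on all nodes; computing that oracle (a best response of min) is left out.

```python
import random

def bland_strategy_improvement(max_nodes, out, sigma, value_of, rng=random):
    """Bland-rule strategy improvement on a max-binary game (sketch).

    out[x]: the two successors of max-node x.
    value_of(sigma): hypothetical oracle giving v_sigma on every node.
    Returns the final strategy and the number of switches performed.
    """
    order = list(max_nodes)
    rng.shuffle(order)  # the only random choice, made before the loop
    sigma = dict(sigma)
    steps = 0
    while True:
        values = value_of(sigma)
        for x in order:
            other = out[x][1] if sigma[x] == out[x][0] else out[x][0]
            if values[other] > values[sigma[x]]:  # x is switchable
                sigma[x] = other  # switch the first switchable node
                steps += 1
                break
        else:
            return sigma, steps  # no switchable node: optimal (Thm. 2.8)
```

The only randomness is the initial shuffle, which is what makes this variant cheap in random bits compared with drawing a fresh random node at every step.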
By Theorems 2.7 and 2.8, if we keep switching the current strategy until there are no more switchable nodes, we reach an optimal strategy in a finite number of steps. The number of steps is at most the number of max-strategies, i.e. $2^n$ for a max-binary game with $n$ max-nodes. However, we have the following:
Theorem 3.10
The expected number of strategies considered by Alg. 1 is at most .
3.2 Analysis of Algorithm 1
Our strategy to prove Theorem 3.10 is to reformulate Alg. 1 as a recursive algorithm (see Alg. 3), which is close to Ludwig’s algorithm in [21]. The proofs are quite similar to Ludwig’s, with a bit of caution on the moments where random choices are made. In particular, we detail our strategy in this part since it will be helpful to understand our results in section 4 where the context is more involved.
Stated as above, it is perhaps unclear how Alg. 1 has a recursive structure. To see this, consider an execution of Alg. 1, and let be the last max-node in the order . In the beginning, the current strategy makes an initial choice on , which does not change until the first time becomes switchable (if this happens). If is switched, then it remains unchanged until the end of the algorithm. Hence, once is fixed, we can think of this execution as two parts, where is fixed in each part. These can then be decomposed into subparts where and are fixed (where is the second-to-last max-node in order ), and so on.
Generalization to partially fixed strategies
To formalize the discussion above, we give a generalization which can be applied to the case where is fixed for some vertices in a given set (see Alg. 2).
In the following, if $S$ is a set of max-nodes and $\sigma$ is a max-strategy, an $S$-compatible strategy is any max-strategy $\sigma'$ such that $\sigma'(x) = \sigma(x)$ for all $x \in S$. For $S$ and $\sigma$ fixed, there is always an $S$-compatible strategy that is better than all others. It can be obtained by solving the game where every $x \in S$ is replaced by a random node with probability $1$ to go to $\sigma(x)$. We call such an $S$-compatible strategy optimal and we denote it by . In particular, an optimal $\emptyset$-compatible strategy is an optimal strategy for $G$, whereas $\sigma$ is the only $V_{MAX}$-compatible strategy.
Recursive reformulation
Finally, we give a recursive version of Alg. 2 (see Alg. 3), which we use to derive the bound. The equivalence between these two algorithms should be clear from the previous explanations.
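The recursive structure can be sketched in the same way, again assuming a max-binary game and the hypothetical oracle `value_of`. The function below follows the two-part decomposition used in the proof of Lemma 3.11: it first optimises with the last free node frozen, then, if that node is switchable, switches it once and optimises again with it still frozen.

```python
def recursive_bland(sigma, fixed, order, out, value_of):
    """Recursive Bland-rule sketch for a max-binary game.

    order:    max-nodes in the order drawn before execution
    fixed:    set S of max-nodes whose choice is frozen
    out[x]:   the two successors of max-node x
    value_of: hypothetical oracle giving v_sigma on every node
    Returns the S-compatible strategy reached (sigma itself is not modified).
    """
    free = [x for x in order if x not in fixed]
    if not free:
        return sigma
    v = free[-1]  # last node not in S, according to the order
    # First part: optimise all other free nodes while v is frozen.
    sigma = recursive_bland(sigma, fixed | {v}, order, out, value_of)
    values = value_of(sigma)
    other = out[v][1] if sigma[v] == out[v][0] else out[v][0]
    if values[other] > values[sigma[v]]:
        # Second part: switch v once, then optimise again with v frozen.
        switched = dict(sigma)
        switched[v] = other
        sigma = recursive_bland(switched, fixed | {v}, order, out, value_of)
    return sigma
```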
Evaluating the number of switches
Let be the total number of switches performed by Algorithm 3 on input . We consider for the following lemma an execution of this algorithm.
Lemma 3.11
Let be the initial strategy and be the last node which is not in , according to order . Define to be the set of nodes such that Then where is , switched at .
Proof 3.12
By design of the algorithm, nodes of are never switched. If is never switched, then we have
[TABLE]
hence the result is true.
Suppose from now on that is switched during the execution. Then, it is switched only once and we can divide the computation of in two parts:
- •
in a first part the algorithm computes , and this last strategy is switched at , hence obtaining ;
- •
then in a second part the algorithm computes .
Hence, we have in the case that is switched,
[TABLE]
It remains to see that in the second part, nodes from will never be switched, so that , hence the result.
Let be a strategy obtained during the second part of the algorithm (after is switched) such that for a . On the one hand, by definition of we have
[TABLE]
On the other hand, since is obtained after , then
[TABLE]
By transitivity we see that
[TABLE]
hence .
Therefore, in the second part of the algorithm, all strategies satisfy for all , hence nodes in have been switched in the first part and never will be in the second.
Now, let us denote where the supremum is considered over all SSG with max-nodes and all max-strategies in . The average is considered over all possible prior choices of order , the rest of the algorithm being deterministic.
Lemma 3.13
For all ,
We shall first need the following result.
Lemma 3.14
Let be a partially ordered set and define for any , i.e. the number of elements that are not greater than . Then for all , we have
[TABLE]
Proof 3.15
Fix and consider the set of with . Let be maximal among elements of . Since there are at least elements in that are strictly greater than , and since these elements are not in by maximality of , we have i.e. .
Proof 3.16 (of Lemma 3.13)
First, denote for and
[TABLE]
where the supremum is considered over all SSG with max-nodes, subsets of size and all max-strategies .
Consider and fixed, with and . Using notation of Lemma 3.11, we have
[TABLE]
Here, we denote and by and to stress the fact that these are random variables depending on , whereas everything else ( i.e. ) is fixed.
First, since for all ,
[TABLE]
we have
[TABLE]
Now,
[TABLE]
The sum on the right can easily be rewritten as
[TABLE]
where we defined for convenience , and used the fact that .
Using now Lemma 3.14 on the set of strategies for , we see that
[TABLE]
since is uniformly chosen in .
So, since , we deduce that
[TABLE]
In order to conclude and prove Theorem 3.10, we now just have to infer the bound for sequences satisfying the conclusion of Lemma 3.13.
Lemma 3.17 (Lemma of [21])
Let be such that and for all ,
Then for all ,
4 Simple stochastic games with few random nodes
The idea that in an SSG the optimal strategies depend only on the ordering of the values of ran-nodes, and not on their actual values, was introduced by Gimbert and Horn in [15]. Their main idea is that, if one gives an ordering of ran-nodes such that is nondecreasing with , then max will try to reach a node with as high as possible, whereas min will try to minimize this index; this idea is formalized below by the notion of forcing sets and forcing strategies (Sec. 4.1). Gimbert and Horn use this fact to derive an algorithm that enumerates all possible orders on ran-nodes and identifies one with the property mentioned above, yielding the optimal strategies and values for the game.
The algorithm that we describe and analyse in the rest of this paper (Alg. 4) uses the same principle, but iterates through orders in a special way, similarly to the iteration through strategies made by Ludwig’s algorithms (see sec. 3). We will derive a similar bound for the average number of iterations of this randomized algorithm. Hence, our main algorithm is still a variation on Bland’s rule for pivot selection. The difficulty here does not lie in the proof of the bound, but in the description of the technique used to iterate on orders.
In [15], the game remains the same during the execution of the algorithm, but we proceed differently:
- •
in section 4.1, we describe how to associate to every total order a new SSG , and we show that this game can be solved in polynomial time.
- •
in section 4.2, we prove that there is an optimal order such that the optimal values of give directly the optimal values of ; it is also the order that maximises values of among all total orders . If an order is not optimal, we describe a pivot operation yielding from a new order such that the optimal values of improve those of .
- •
the proof of the bound will be derived in section 5.
4.1 Modified game and forcing strategies
We need to assume that the games we consider enjoy some basic properties in order to describe our algorithm without considering too many special cases.
Definition 4.18
An SSG is in canonical form (CF) if max has a stopping strategy and only ran-nodes can have an outgoing arc to a sink.
To ensure these conditions, one can first find in linear time all nodes from which the min player can force the game to never reach either a sink node or a ran-node (see e.g. [1, 11]). These nodes have value $0$ and can be removed from the game. Then, all probabilities on ran-nodes are modified by giving them a very small probability to go to a sink. One can prove as in [11] that values remain almost the same. The second condition means that max- and min-nodes must go through a ran-node in order to reach a sink; it can be enforced by adding a dummy random node before every sink.
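The second canonical-form condition is a purely syntactic transformation. Here is a sketch, assuming max- and min-nodes are given by successor lists and ran-nodes by probability dictionaries (our encoding), with dummy nodes named `('dummy', s)` purely for illustration. Since each dummy node moves to its sink with probability 1, values are unchanged.

```python
def add_dummy_random_nodes(kind, out, prob):
    """Insert a random node in front of every sink (second CF condition).

    kind: node -> 'max' | 'min' | 'ran' | 'sink'
    out:  max/min node -> list of successors
    prob: ran-node -> {successor: probability}
    Returns new (kind, out, prob) dictionaries; the inputs are not modified.
    """
    sinks = [x for x, k in kind.items() if k == 'sink']
    dummy = {s: ('dummy', s) for s in sinks}  # hypothetical naming scheme
    new_kind = dict(kind)
    new_prob = {r: dict(d) for r, d in prob.items()}
    for s in sinks:
        new_kind[dummy[s]] = 'ran'
        new_prob[dummy[s]] = {s: 1}  # goes to s with probability 1
    # Max- and min-nodes now reach a sink only through a dummy random node;
    # arcs from ran-nodes to sinks are allowed and kept as they are.
    new_out = {x: [dummy.get(y, y) for y in ys] for x, ys in out.items()}
    return new_kind, new_out, new_prob
```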
In all that follows we suppose that is an SSG in CF with random nodes . Let be a total order on . We define a game as follows (the same construction is presented in [12]). Start with a copy of . For every , add a min-node denoted to , which we call control node; add an arc ; for every arc , remove this arc and add an arc ; finally, for every , , add the arc to .
So basically, every control node intercepts all arcs entering in (see Fig. 1), and has an arc to every other control node which is greater than in . In the game , the set of sinks, max-nodes and ran-nodes remain the same as in , whereas the set of min-nodes will be denoted , where is the set of min-nodes in . This allows us to directly identify max-strategies in and in , and to identify projections onto of min-strategies in , to min-strategies in .
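The arc set of the modified game can be built mechanically from the construction above. In this sketch, the control node of a ran-node `r` is represented as `('ctrl', r)`, a hypothetical naming, and the order is given as a list of ran-nodes in ascending order.

```python
def modified_game_arcs(arcs, random_nodes, order):
    """Arc set of the modified game built from an SSG and an order on ran-nodes.

    arcs:         set of (x, y) arcs of the original game
    random_nodes: list of ran-nodes
    order:        the ran-nodes in ascending order
    """
    ctrl = {r: ('ctrl', r) for r in random_nodes}
    new_arcs = set()
    for (x, y) in arcs:
        # Every arc entering a ran-node is intercepted by its control node.
        new_arcs.add((x, ctrl[y]) if y in ctrl else (x, y))
    for r in random_nodes:
        new_arcs.add((ctrl[r], r))  # control node -> its own ran-node
    for i, r in enumerate(order):
        for s in order[i + 1:]:
            new_arcs.add((ctrl[r], ctrl[s]))  # arc to every greater control node
    return new_arcs
```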
Now, suppose we remove first all sinks and random nodes of , and then turn every control node into a sink with a value equal to its rank in . This transformation clearly turns into a game without random nodes.
Definition 4.19 (Forcing strategy)
By identifying strategies in and , we say that any optimal strategy for max or min in is a -forcing strategy of .
In -forcing strategies, the players try to reach a control node as high as possible in the order for max, and as low as possible for min. We refer to [1] and [15] for more details about how one can compute these optimal strategies in linear time, using the so-called deterministic attractors.
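As an illustration of the attractor-based computation, here is a sketch that computes the values of the deterministic game described above: control nodes are sinks whose value is their rank, and a play that reaches no control node is worth 0. It uses a naive quadratic fixpoint for the attractor rather than the linear-time algorithm referred to in [1, 15], and it returns values only, not the forcing strategies themselves; the encoding is our own assumption.

```python
def attractor(targets, owner, out):
    """Nodes from which max can force a visit to `targets` in a deterministic game.

    owner: node -> 'max' | 'min'; out: node -> nonempty list of successors.
    Naive fixpoint iteration, written for readability rather than speed.
    """
    attr = set(targets)
    changed = True
    while changed:
        changed = False
        for x, succs in out.items():
            if x in attr:
                continue
            inside = [y in attr for y in succs]
            if (owner[x] == 'max' and any(inside)) or \
               (owner[x] == 'min' and all(inside)):
                attr.add(x)
                changed = True
    return attr

def forcing_values(control_order, owner, out):
    """Value of each max/min node when the control node of rank k
    (1-based position in the ascending control_order) is a sink worth k,
    and 0 if max cannot force any control node."""
    value = {x: 0 for x in out}
    for k in range(len(control_order), 0, -1):
        # Nodes from which max forces a control node of rank >= k.
        for x in attractor(control_order[k - 1:], owner, out):
            if x in value and value[x] == 0:
                value[x] = k  # first (largest) threshold reached is the value
    return value
```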
Definition 4.20 (Forcing set)
For any control node , define the forcing set for , denoted , as the set of max- and min-nodes that reach if the game is played with a couple of -forcing strategies (forcing sets are independent of the choice of the strategies as long as they are -forcing).
An example of an SSG turned into a modified SSG and of computation of forcing strategies is presented in Fig. 2.
Here are basic properties on which should explain why we consider this game.
Lemma 4.21
- (i) if has a stopping max-strategy, so does ;
- (ii) optimal values of control nodes in are nondecreasing along ;
- (iii) optimal strategies in coincide with forcing strategies for order on ;
- (iv) the game can be solved in polynomial time.
Proof 4.22
To see why (i) is true, just note that since is an antisymmetric relation, this does not create new cycles among min-nodes.
Suppose now that . By optimality for the min player, and since is stopping, is the minimum value of for all outneighbours of (see Th. 2.5). Since is an outneighbour of in , we have . Hence (ii) is true.
Now, consider replacing in every control node by a new sink with value . Clearly the values of this new game remain the same. But, by construction of , random nodes have no incoming arcs and they could as well be removed without changing the optimal values on . By reducing the game in this way, we get a deterministic game whose optimal values on are the same as those of . By definition, optimal strategies of this game are -forcing strategies, hence (iii) is true.
Finally, to solve we can choose a couple of -forcing strategies and search for optimal strategies in that match with on . Hence, the strategy of all max-nodes is fixed, and only min-strategies on control nodes are computed by solving a one player SSG. It can be done in polynomial time by linear programming (see [11]).
As explained in the proof above, to solve , it is enough to compute -forcing strategies on , which can be done in linear time, and then to solve a one player SSG with only nodes.
4.2 Value intervals and pivot
In what follows, we write for the vector of optimal values of .
Definition 4.23 (Constrained control node)
We say that a control node is constrained in if .
Constrained control nodes are similar to switchable nodes in SSG. In fact, we can characterize optimality of an order by the absence of constrained node as follows.
Lemma 4.24 (Optimal order)
Let . The game does not have any constrained control nodes if and only if the forcing strategies are optimal strategies for . In this case we say that is an optimal order for .
Proof 4.25
First note that since is in CF, is always stopping.
If does not have any constrained control nodes, then optimal strategies are the forcing strategies on , together with the choice for each control node . Then, by merging the control nodes with their associated random node while removing the unused arcs between the control nodes (hence recovering the initial game G), the values on the remaining nodes are kept, and so are the optimality conditions of Th. 2.5.
If are optimal strategies for , then the values of the ran-nodes are nondecreasing along order . Hence, by turning into and extending strategies with the choice for each control node , we will obtain values that satisfy optimality conditions and such that , showing that is not constrained.
We define the value interval of a control node as the set of that share the same optimal value in , i.e. . This set is indeed an interval in order by (ii) of Lemma 4.21, i.e. its elements are consecutive in order .
Definition 4.26
The pivot operation on a control node for the order is the transformation of into a new order , obtained by moving just after the end of its value interval in .
Note that if is the last node of its value interval, then the pivot operation does nothing. Also note that if is constrained, it cannot be the last node of its value interval (we shall only pivot on constrained control nodes).
Example. Let and let be in ascending order . Suppose that the values of control nodes are, in this order . The value intervals are , and . The pivot operation on places after , so that the obtained order would be .
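Value intervals and the pivot operation of Definition 4.26 are simple list manipulations once the optimal values of control nodes are known. Here is a sketch with orders stored as ascending Python lists (our choice), exercised on a hypothetical instance with five control nodes.

```python
def value_interval(order, values, c):
    """Consecutive nodes around c (ascending order) sharing the value of c.

    Assumes values are nondecreasing along the order, as in Lemma 4.21 (ii),
    so that nodes with the same value form a contiguous block.
    """
    i = order.index(c)
    lo = hi = i
    while lo > 0 and values[order[lo - 1]] == values[c]:
        lo -= 1
    while hi + 1 < len(order) and values[order[hi + 1]] == values[c]:
        hi += 1
    return order[lo:hi + 1]

def pivot(order, values, c):
    """Move c just after the end of its value interval (Definition 4.26)."""
    last = value_interval(order, values, c)[-1]
    if last == c:
        return list(order)  # c already ends its interval: nothing to do
    rest = [x for x in order if x != c]
    pos = rest.index(last) + 1
    return rest[:pos] + [c] + rest[pos:]
```

For the order a < b < c < d < e with values 1, 2, 2, 2, 3, the value interval of c is (b, c, d), and pivoting on b gives a < c < d < b < e.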
The following theorem shows that the pivot operation increases the value vector, which will enable us to design a strategy improvement algorithm on the forcing strategies (where the improvement is on rather than on values in the original game ). A similar theorem is proved in [12] to build a different strategy improvement algorithm.
Theorem 4.27
Let and be a constrained control node. If is obtained from by pivoting on , then .
Proof 4.28
Consider a new game which is obtained from like and but with arcs for between control nodes for all . Let and be respective optimal strategies in and . We can interpret these strategies as strategies in . Since the only differences between , and are the arcs between control nodes, all strategies give exactly the same values in and in , and a similar observation can be made for . Hence, to prove the result, it is enough to show that in . Note that whereas and are respective stopping max-strategies of and , they may not be stopping in . However, it is not difficult to see that the conclusion of Th. 2.9 still applies. Hence it is sufficient to show in that changing into makes a nondecreasing switch on every node, and is increasing in at least one node.
In the order , let , with be the increasing sequence of consecutive nodes sharing the same value as for (i.e. the value interval of , starting from ). Since is constrained, .
The pivot operation transforms this part of into , hence the only differences between and are the arcs for that are inverted into . Hence, if , it keeps the same position relatively to all other control nodes when we change the order into , hence . Hence, when we change to , either there is no switch in , or it is between nodes of the same values.
For nodes with , clearly (resp. ) is in some with . Since all these nodes also share the same value (by definition of the value interval), these switches are also between nodes of same values.
Suppose that there is a decreasing switch on a control node, i.e. for a we have . In this case should be stricly before in since optimal values are increasing along . So we could not have but should have . The only possibility is and . Since these nodes are in the same value interval, once again this switch is unchanging, a contradiction.
We showed that no switch from to is decreasing. Now consider the case of during the pivot operation. Since is constrained, . Since is either equal to or to some which is strictly after the value interval of in order , and hence has a greater value, the switch at must be increasing.
4.3 Main algorithm
Algorithm 4 consists of iterating over orders , by picking a pivotable element in according to the randomized rule below and updating by a pivot on , until we reach an optimal order.
Here is the pivot selection rule. First, prior to the execution of the algorithm, we choose uniformly at random an order on the set of all unordered pairs of control nodes , with . Then, at each step of the algorithm, consider the game , and remove one by one the arcs between control nodes, following order . During this process, choose as pivot the first constrained control node, if any, which becomes disconnected from the following nodes of its value interval. In more detail, for a given order , compute and then partition the control nodes into value intervals. Each constrained control node has arcs leading to other control nodes from the same value interval, where is its distance in to the last element of this interval. Enumerating in ascending order, the pivot is the first constrained node all of whose such arcs have been encountered.
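Below is a minimal sketch of the selection rule under some assumptions. Orders are Python lists, values are a dict, and the set of constrained control nodes is already known; computing them requires solving the modified game, which is not shown. Every function name is hypothetical. The sketch does not simulate the removal arc by arc. Instead, for each constrained node it takes the position in the pair order of the last pair joining it to a following node of its value interval, and the pivot is the node with the smallest such position. It also assumes, as the text suggests, that a constrained node is never the last node of its value interval.

```python
import itertools
import random
from typing import Dict, FrozenSet, Hashable, List, Optional, Sequence, Set

Node = Hashable


def random_pair_order(nodes: Sequence[Node], rng: random.Random) -> List[FrozenSet[Node]]:
    """Random order on all unordered pairs {u, w}; drawn once, before the run."""
    pairs = [frozenset(p) for p in itertools.combinations(nodes, 2)]
    rng.shuffle(pairs)  # uniform shuffle of the pairs (up to the PRNG's limits)
    return pairs


def followers_in_interval(order: List[Node], values: Dict[Node, float]) -> Dict[Node, Set[Node]]:
    """For each node, the later nodes of `order` lying in the same value interval."""
    followers: Dict[Node, Set[Node]] = {}
    for i, v in enumerate(order):
        j = i + 1
        while j < len(order) and values[order[j]] == values[v]:
            j += 1
        followers[v] = set(order[i + 1:j])
    return followers


def select_pivot(
    order: List[Node],
    values: Dict[Node, float],
    constrained: Set[Node],
    pair_order: Sequence[FrozenSet[Node]],
) -> Optional[Node]:
    """First constrained node cut off from all following nodes of its value interval
    when pairs are removed in `pair_order`; None if no node is constrained."""
    if not constrained:
        return None  # no constrained node: the order is optimal (Lemma 4.24)
    rank = {pair: k for k, pair in enumerate(pair_order)}
    followers = followers_in_interval(order, values)

    def disconnection_step(v: Node) -> int:
        # Assumption: a constrained node always has a follower in its interval.
        if not followers[v]:
            raise ValueError(f"constrained node {v!r} has no follower in its interval")
        # v is disconnected once the last of its pairs {v, w} has been removed.
        return max(rank[frozenset((v, w))] for w in followers[v])

    return min(constrained, key=disconnection_step)
```

The pair order is drawn only once, before the run. After that, every pivot choice is deterministic, much like Bland's rule picks the smallest-index candidate under a fixed numbering.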
Example. Continued from the previous example with and value intervals , and . Suppose that the order starts The first element that is disconnected from its value interval is which is the one we choose as a pivot leading to order .
By Th. 4.27, no order is repeated during the execution of Algorithm 4; since is finite, the algorithm reaches, in a finite number of steps, an order with no constrained node, which is optimal by Lemma 4.24. Hence, Algorithm 4 computes optimal strategies for in at most steps. However, we claim the following bound on the expected number of steps, which will be proved in the next section.
Theorem 4.29
Alg. 4 computes optimal strategies for in at most expected steps.
Note that for large enough we have , whose growth is roughly equivalent to . Moreover, the algorithm uses random bits to choose the order on pairs.
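The amount of randomness spent on the pair order can be sanity-checked information-theoretically. Let n be the number of control nodes; n and m are symbols introduced here. A uniform order on the m = n(n-1)/2 unordered pairs carries log2(m!) bits, which grows as Θ(n² log n). The hypothetical helpers below compute this quantity. This is a sketch only, and it does not claim to reproduce the exact count stated in the text.

```python
import math


def pair_order_bits(n: int) -> float:
    """log2(m!) for m = n(n-1)/2: information content of a uniform order on the pairs of n nodes."""
    m = n * (n - 1) // 2
    # lgamma(m + 1) = ln(m!), which avoids building the huge integer m!.
    return math.lgamma(m + 1) / math.log(2)


def exact_pair_order_bits(n: int) -> int:
    """Minimum number of bits needed to encode one of the m! orders: ceil(log2(m!))."""
    m = n * (n - 1) // 2
    # For N >= 1, ceil(log2(N)) equals the bit length of N - 1.
    return (math.factorial(m) - 1).bit_length()
```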
5 Analysis of Algorithm 4
In this section we prove Theorem 4.29. To do this we shall reformulate Algorithm 4 as a recursive algorithm, but we need additional notions for this. The recursive formulation also reveals the nature of the algorithm: it computes an optimal order on control nodes by finding the right order between each pair of these nodes using dichotomy. This allows the same analysis as for Ludwig’s Algorithm and its variants.
5.1 Modified game for a pretotal order
If is a pretotal order, we define exactly as was defined for a total order in Section 4.1. The only difference is that, since is not total, a control node only has arcs to those such that .
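To make this concrete, here is a small sketch that represents a (pre)total order as a set of ordered pairs (u, v), meaning u is before v. Control node u gets an arc to v exactly when (u, v) is in the set, and a total order extends a pretotal order when it contains all of its pairs. The representation and the names `pairs_of_total_order`, `control_arcs` and `extends` are assumptions made for illustration. The sketch only builds the control-node arcs; the rest of the modified game is not modelled.

```python
from itertools import combinations
from typing import Dict, Hashable, Iterable, List, Set, Tuple

Node = Hashable
Pair = Tuple[Node, Node]


def pairs_of_total_order(order: List[Node]) -> Set[Pair]:
    """All ordered pairs (u, v) with u strictly before v in the total order."""
    return set(combinations(order, 2))  # combinations preserves list order


def control_arcs(nodes: Iterable[Node], pretotal: Set[Pair]) -> Dict[Node, Set[Node]]:
    """Arcs between control nodes: u -> v exactly when (u, v) is in the order."""
    arcs: Dict[Node, Set[Node]] = {u: set() for u in nodes}
    for u, v in pretotal:
        arcs[u].add(v)
    return arcs


def extends(order: List[Node], pretotal: Set[Pair]) -> bool:
    """True if the total order contains every pair of the pretotal order."""
    return pretotal <= pairs_of_total_order(order)
```

For example, dropping the pair ("b", "c") from the total order a < b < c leaves b and c unordered. Node a keeps its arcs to both, and both a < c < b and a < b < c extend the result.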
To simplify notation, for any node in , define as the optimal value of in . We can now directly extend some of the observations of Lemma 4.21 to pretotal orders.
Lemma 5.30
If , then optimal values of control nodes in are nondecreasing in order , i.e. if then .
In order to solve , the algorithm will recursively compute an optimal total ordering of control nodes extending . Thus, for all total orders extending , we need to assign a value in , which we denote . Here is how we define it.
Definition 5.31
Let extending . The values associated to in are the values where and satisfy:
- (i) and are forcing strategies for ;
- (ii) satisfies the min-optimality conditions (Thm. 2.5) on every control node .
As a summary, is the vector of optimal values of game while is the vector of optimal values of when the strategies in and are forcing strategies. It follows that . Recall that is the vector of optimal values of . Then we have .
Definition 5.32 (Optimal order)
Let and extending (). We say that is an optimal total order for if .
The next lemma proves the existence and gives a characterization of optimal orders.
Lemma 5.33
Suppose is in CF and let . Then the following conditions are equivalent:
- (i) is an optimal order for ;
- (ii) is a nondecreasing ordering of the values for ;
- (iii) .
Proof 5.34 (Proof of Lemma 5.33)
First, note that, by definition, optimality conditions are satisfied at control nodes in the definition of , so it is always true that values are nondecreasing along .
Suppose now that is optimal for , i.e. , and suppose that is not a nondecreasing ordering of the values for . Then there must be two consecutive in order such that , and we must have . Consider the order that we obtain from by inverting and . Clearly, this order also extends since the only inversion between and is . By an argument similar to the proof of Theorem 4.27, it is easy to obtain that , which contradicts optimality. Hence we proved .
Now suppose . Let be an optimal strategy of ; we show that optimality conditions are met in . First note that has the same values in and . For the nodes in , no new arcs are added so the optimality conditions are still satisfied. Now, consider control nodes. An arc cannot lead to a lower value for by assumption . Hence the optimality conditions are still satisfied on control nodes, strategies are optimal for , so finally and we proved .
Since involves more arcs than between the control min-nodes, we have . Assume , then . Hence we proved .
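Condition (ii) of Lemma 5.33 and the local move used in the proof of (i) ⇒ (ii) can be illustrated on a fixed value vector. The sketch looks for the first consecutive pair whose values decrease and inverts it. In the paper the values themselves depend on the order, so this captures only the combinatorial part of the argument. The names `is_nondecreasing_ordering`, `first_descent` and `swap_descent` are hypothetical.

```python
from fractions import Fraction
from typing import Dict, Hashable, List, Optional

Node = Hashable


def is_nondecreasing_ordering(order: List[Node], values: Dict[Node, Fraction]) -> bool:
    """Condition (ii): values never decrease along the order."""
    return all(values[a] <= values[b] for a, b in zip(order, order[1:]))


def first_descent(order: List[Node], values: Dict[Node, Fraction]) -> Optional[int]:
    """Index i of the first consecutive pair with values[order[i]] > values[order[i+1]], or None."""
    for i in range(len(order) - 1):
        if values[order[i]] > values[order[i + 1]]:
            return i
    return None


def swap_descent(order: List[Node], i: int) -> List[Node]:
    """Order obtained by inverting the consecutive nodes at positions i and i + 1."""
    new = list(order)
    new[i], new[i + 1] = new[i + 1], new[i]
    return new
```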
5.2 Recursive formulation
We now give Algorithm 5, a recursive formulation of Algorithm 4. We will prove that these two algorithms compute exactly the same sequence of total orders, and use the recursive formulation to derive a bound.
Definition 5.35
Let and such that is still a pretotal order. We say that the addition of to is constraining, or that is constrained, if .
When an arc is constrained, it is essential to the min-optimal strategy in ; in other words, removing this arc would increase optimal values.
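The following toy illustrates what "constraining" means for a single min node whose successor values are frozen. In the game itself, removing an arc can change other values too, so this is only an analogy, and the names `min_node_value` and `arc_is_constraining` are hypothetical. An arc is constraining exactly when its target is the unique minimizer. When another successor ties with it, as for nodes sharing a value interval, removing the arc changes nothing.

```python
from fractions import Fraction
from typing import Dict, Hashable

Node = Hashable


def min_node_value(succ_values: Dict[Node, Fraction]) -> Fraction:
    """Value of a min node whose successors have the given (frozen) values."""
    if not succ_values:
        raise ValueError("a min node needs at least one outgoing arc")
    return min(succ_values.values())


def arc_is_constraining(succ_values: Dict[Node, Fraction], target: Node) -> bool:
    """True if removing the arc to `target` strictly increases the min node's value."""
    rest = {w: x for w, x in succ_values.items() if w != target}
    return min_node_value(rest) > min_node_value(succ_values)
```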
Lemma 5.36
Suppose is in CF and let , , . Then the following conditions are equivalent:
- (i) is an optimal ordering for ;
- (ii) the addition of every arc to is not constraining.
Proof 5.37 (Proof of Lemma 5.36)
Let . Since , we have .
Assume that is an optimal order for . If the addition of to is constraining, then which contradicts by Lemma 5.33. Hence we proved .
Assume that no arc in is constraining. Then, add sequentially arcs in to until we get , hence forming a sequence of games . If none of these arcs is used in then and is proved. Otherwise consider the first arc such that . This implies that . But since no constraining arc has been added until step . Hence , and finally is constraining for , a contradiction.
Consider a run of the recursive algorithm and let be a total order at any step of the run. Let us inspect the first time is modified. Order will be optimal for a sequence of pretotal orders that are obtained from by removing one by one pairs in order as long as they are not constrained. This in fact amounts to ascending the recursive call tree. Let be the first constrained pair and the pretotal order obtained once is removed. Then is turned into by pivoting on node as we did in iterative Alg. 4. We now show that control node is the same as the pivot selected by the pivot selection rule.
Note first that, during the process of removing the pairs one by one in order , the value intervals of are kept unchanged until the pretotal order is reached. Since removing implies an increase of the optimal values, it means that and were in the same value interval and that had no other neighbour in that interval. Note here that, as a consequence, is then guaranteed to be a pretotal order. Clearly, node is the first control node in that situation. So the choice of node exactly obeys the pivot selection rule.
Finally, the following lemma enables us to analyze the complexity of Algorithm 5.
Lemma 5.38
Let and such that and are pretotal orders and where the addition of to is constraining. Let be an optimal total order for , obtained from by pivoting in , and let be an optimal total order for .
Let such that Then for any total order obtained by Algorithm 5 between and (including those), one has .
Proof 5.39
Suppose that . Then hence .
On the other hand, since the pivot operation is increasing values, we have , so , a contradiction.
Using this result, the proof for the complexity bound is the same as the proof of Theorem 3.10 using the recursive formulation. Let be the total number of pivots performed by Algorithm 5 on input for an order on pairs.
Now define where the supremum is taken over all games , pretotal orders and total orders extending such that is of size at most . The expectation is taken over all possible uniform choices for .
Then by Lemma 5.38, will satisfy Lemma 3.13, hence the claimed bound of Th. 4.29 by Lemma 3.17 since the depth of the recursive tree is at most .
References
- [1] Daniel Andersson, Kristoffer Arnsfelt Hansen, Peter Bro Miltersen, and Troels Bjerre Sørensen. Deterministic graphical games revisited. In Conference on Computability in Europe, pages 1–10. Springer, 2008.
- [2] Daniel Andersson and Peter Bro Miltersen. The complexity of solving stochastic games on graphs. In International Symposium on Algorithms and Computation, pages 112–121. Springer, 2009.
- [3] David Auger, Pierre Coucheney, and Yann Strozecki. Finding optimal strategies of almost acyclic simple stochastic games. In International Conference on Theory and Applications of Models of Computation, pages 67–85. Springer, 2014.
- [4] Robert G. Bland. New finite pivoting rules for the simplex method. Mathematics of Operations Research, 2(2):103–107, 1977.
- [5] Cristian S. Calude, Sanjay Jain, Bakhadyr Khoussainov, Wei Li, and Frank Stephan. Deciding parity games in quasipolynomial time. In Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, pages 252–263. ACM, 2017.
- [6] Krishnendu Chatterjee, Luca de Alfaro, and Thomas A. Henzinger. Termination criteria for solving concurrent safety and reachability games. In Proceedings of the Twentieth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 197–206. SIAM, 2009.
- [7] Krishnendu Chatterjee and Nathanaël Fijalkow. A reduction from parity games to simple stochastic games. In GandALF, pages 74–86, 2011.
- [8] Taolue Chen, Vojtěch Forejt, Marta Kwiatkowska, David Parker, and Aistis Simaitis. Automatic verification of competitive stochastic systems. Formal Methods in System Design, 43(1):61–92, 2013.
