Computer aided synthesis: a game theoretic approach
V\'eronique Bruy\`ere

TL;DR
This paper introduces game theory concepts and methods applied to computer aided synthesis, covering classical results on two-player and multi-player games, with connections to automata theory and solution approaches.
Contribution
It provides a comprehensive overview of game theoretic approaches in synthesis, including classical results and solution strategies, with illustrative examples and proof intuitions.
Findings
Classical results on two-player zero-sum games
Extensions to multi-player non-zero-sum games
Connections between one-player games and automata theory
Abstract
In this invited contribution, we propose a comprehensive introduction to game theory applied in computer aided synthesis. In this context, we give some classical results on two-player zero-sum games and then on multi-player non zero-sum games. The simple case of one-player games is strongly related to automata theory on infinite words. All along the article, we focus on general approaches to solve the studied problems, and we provide several illustrative examples as well as intuitions on the proofs.
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsFormal Methods in Verification · Logic, programming, and type systems · semigroups and automata theory
11institutetext: Computer Science Department, University of Mons
20 Place du Parc, B-7000-Mons, Belgium
11email: [email protected]
Véronique Bruyère
Computer aided synthesis:
a game-theoretic approach
Véronique Bruyère
Abstract
In this invited contribution, we propose a comprehensive introduction to game theory applied in computer aided synthesis. In this context, we give some classical results on two-player zero-sum games and then on multi-player non zero-sum games. The simple case of one-player games is strongly related to automata theory on infinite words. All along the article, we focus on general approaches to solve the studied problems, and we provide several illustrative examples as well as intuitions on the proofs.
Keywords:
Games played on graphs, Boolean objective, quantitative objective, winning strategy, Nash equilibrium, synthesis.
1 Introduction
Game theory is a well-developed branch of mathematics that is applied to various domains like economics, biology, computer science, etc. It is the study of mathematical models of interaction and conflict between individuals and the understanding of their decisions assuming that they are rational [54, 70].
The last decades have seen a lot of research on algorithmic questions in game theory motivated by problems from computer aided synthesis. One important line of research is concerned with reactive systems that must continuously react to the uncontrollable events produced by the environment in which they evolve. A controller of a reactive system indicates which actions it has to perform to satisfy a certain objective against any behavior of the environment. An example in air traffic management is the autopilot that controls the speed of the plane, but have no control on the weather conditions. Such a situation can be modeled by a two-player game played on a graph: the system and the environment are the two players, the vertices of the graph model the possible configurations, the infinite paths in the graph model all the continuous interactions between the system and the environment. In this game, the system wants to achieve a certain objective while the environment tries to prevent it to do so. The objectives of the two players are thus antagonistic and we speak of zero-sum games. In this framework, checking whether the system is able to achieve its objective reduces to the existence of a winning strategy in the corresponding game, and building a controller reduces to computing such a strategy [38]. Whether such a controller can be automatically designed from the objective is known as the synthesis problem.
Another, more recent, line of research is concerned with the modelization and the study of complex systems. Instead of the simple situation of a system embedded in a hostile environment, we are faced with systems/environments formed of several components each of them with their own objectives that are not necessarily conflicting. Imagine the situation of several users behind their computers on a shared network. In this case, we use the model of multi-player non zero-sum games played on graphs: the components are the different players, each of them aiming at satisfying his objective. In this context, the synthesis problem is a little different: winning strategies are no longer appropriate and are replaced by the concept of equilibrium, that is, a strategy profile where no player has an incentive to deviate [39]. Different kinds of equilibria have been investigated among which the famous notion of Nash equilibrium [53].
A lot of study has been done about Boolean objectives, in particular about the class of -regular objectives, like avoiding a deadlock, always granting a request, etc [38]. An infinite path in the game graph is either winning or losing depending on whether the objective is satisfied or not. To allow richer objectives, such as minimizing the energy consumption or guaranteeing a limited response time to a request, existing models have been enriched with quantitative aspects in a way to associate a payoff (or a cost) to all paths in the game graph [19]. In this setting, we speak of quantitative objectives, and a classical decision problem in two-player zero-sum games is whether there exists a winning strategy for the system that ensures a payoff satisfying some given constraints no matter how the environment behaves. For instance we would like an energy consumption lying within a certain given interval. The same kind of question is also considered for multi-player non zero-sum games, that is, whether there exists an equilibrium such that the payoff of each player satisfies the constraints.
Decidability of those problems is not enough. Indeed in case of positive answer, it is important to know the exact complexity class of the problem and how complex are the strategies used to solve it. Given past interactions between the players, a strategy for a player indicates the next action he has to perform. The amount of memory on those past interactions is one of the ways to express the complexity of the strategy. The simplest strategies are those that require no memory at all. When all these characteristics are known and indicate practical applicability of the models, the final step is the implementation of the solving strategies into a program (like for instance a controller for a reactive system) by using adequate data structures and possibly heuristics.
In this article, we propose a comprehensive introduction to classical algorithmic solutions to the synthesis problem for two-player zero-sum games and for multi-player non zero-sum games. A complementary survey can be found in [9], and detailed expositions in the case of Boolean objectives are provided in [38, 39]. We study the existence of winning strategies (in two-player zero-sum games) and equilibria (in multi-player non zero-sum games) satisfying some given constraints, in particular the complexity class of the decision problem and the memory required for the related strategies. We provide several illustrative examples as well as intuitions on some proofs. We do not intend to present an exhaustive survey, but rather focus on some lines of research, with an emphasis on general approaches. In particular, we only consider (i) turned-based (and not concurrent) games such that the players choose their actions in a turned-based way (and not concurrently), (ii) deterministic (and not stochastic) games such that their edges are deterministic and not labeled by probabilities (iii) pure (and not randomized) strategies such that the next action is chosen in a deterministic way (and not according to a probability distribution).
Our approach is as follows. We begin with a general definition of game that includes the class of games with Boolean objectives and the class of games with quantitative objectives. For two-player zero-sum games, we present a criterium [36] that implies, for several large families of games, the existence of memoryless winning strategies ensuring a payoff satisfying some given constraints. For non zero-sum multi-player games, we present a characterization of plays (used for instance in [14, 65]) that are the outcome of a Nash equilibrium. The existence of Nash equilibrium in many different families of games is derived from this characterization, as well as results on the existence of a Nash equilibrium satisfying some constraints. We also present two other well-studied equilibria: the secure equilibria [23] and the subgame perfect equilibria [61]. For the studied decision problems, in addition to the results derived from our general approaches, we provide in this survey an overview of known results for games with Boolean and quantitative objectives.
The article is organized in the following way. In Section 2, we introduce the concepts of game and strategy, we then present the studied decision problems, and we finally recall the Boolean and quantitative objectives that are classically studied. In Section 3 devoted to two-player zero-sum games, we begin with the simple case of one-player games, and show how the decision problems are connected to problems in automata theory and numeration systems. We then present the general criterium mentioned before, and then the solutions to the decision problems for the classes of games with Boolean and quantitative objectives. Finally, we present several recent extensions of those classes of games, where for instance the single objective is replaced by a Boolean intersection of several objectives. The case of multi-player non zero-sum games is investigated in Section 4 by starting with the characterization of outcomes of Nash equilibrium. Derived results on the existence of Nash equilibrium (under some given constraints) are then detailed, followed by a study of other kinds of equilibria like secure and subgame perfect equilibria. We provide a short conclusion in Section 5.
2 Terminology and studied problems
We consider multi-player turn-based games played on finite directed graphs. The set of vertices are partitioned among the different players. A play is an infinite sequence of vertices obtained by moving an imaginary pebble from vertex to vertex according to existing edges. The owner of the current vertex decides what is the next move of the pebble according to some strategy. Each player follows a strategy in a way to achieve a certain objective. This objective depends on a preference relation that the player has on the payoffs assigned to plays. In this section, we introduce all these notions and state the problems studied in this article.
2.1 Preliminaries
2.1.1 Games
We begin with the notions of arena and game.
Definition 1
An arena is a tuple where:
- •
is a finite set of players,
- •
is a finite set of vertices and is a set of edges, such that each vertex has at least one outgoing edge111This condition guarantees that there is no deadlock. It can be assumed w.l.o.g. for all the problems considered in this article.,
- •
is a partition of , where is the set of vertices owned222We also say that player controls the vertices of . by player .
A play is an infinite sequence of vertices such that for all . Histories are finite sequences defined in the same way. We often use notation to mention the last vertex of the history. The set of plays is denoted by and the set of non empty histories (resp. ending with a vertex in ) by (resp. by ). A prefix (resp. suffix) of a play is a finite sequence (resp. infinite sequence ). We often use notation for a play of which history is prefix. Given a play , we denote by the set of vertices visited infinitely often by . We say that is a lasso if it is equal to with being two histories. This lasso is called simple if has no repeated vertices.
Definition 2
A game is an arena such that each player has:
- •
a payoff function where is a set of payoffs,
- •
a preference relation on his set of payoffs.
A preference relation is a strict total order333that is, an irreflexive, transitive and total binary relation.. It allows player to compare two plays with respect to their payoffs: means that player prefers to . Given , we write when or ; notice that iff since is total.
A payoff function is prefix-independent if for all . It is prefix-linear if for all ,
[TABLE]
Any prefix-independent function is prefix-linear.
When an initial vertex is fixed, we call an initialized game. In this case, plays and histories are supposed to start in , and we then use notations , , and (instead of , , and ).
Example 1
Consider the initialized two-player game in Figure 1 such that player (resp. player ) controls vertices (resp. vertex ).444In all examples of this article, circle (resp. square) vertices are controlled by player (resp. player ). Both players use the same set of payoffs equal to , and the same payoff function that is prefix-independent: , , and . The preference relation for player (resp. player ) is (resp. ).
2.1.2 Strategies
Let be an initialized game. A strategy for player in is a function assigning to each history a vertex such that . Thus is the next vertex chosen by player (that controls vertex ) after history has been played. A play is consistent with if for all such that .
A strategy for player is positional if it only depends on the last vertex of the history, i.e., for all . More generally, it is finite-memory if needs only a finite information out of the history . This is possible with a finite-state machine that keeps track of histories of plays. The strategy chooses the next vertex depending on the current state of the machine and the current vertex in the game.555This informal definition is enough for this survey. See for instance [38] for a definition. The previous definition of positional strategy for player is given for an initialized game . We call it uniform if it is defined for all (instead of ), that is, when is a positional strategy in all initialized games , .
A strategy profile is a tuple of strategies, where each is a strategy of player . It is called positional (resp. uniform, finite-memory) if all , , are positional (resp. uniform, finite-memory). Given an initial vertex , such a strategy profile determines a unique play of that is consistent with all strategies . This play is called the outcome of in and is denoted by .
Example 1 (continued)
An example of strategy profile in is the following one:
- •
the positional strategy for player is defined such that for all ,
- •
the finite-memory strategy for player is defined such that and for all .666As player can only loop on vertices and , we do not formally define on histories ending with or . Hence player chooses to move to (resp. to ) at the first visit (resp. next visits) to . The needed memory is whether the current history has visited once or more time.
The outcome is equal to with payoff .
2.2 Studied problems
In this paper, we want to study two problems. In the first problem, one designated player, say player , wants to apply a strategy that guarantees certain constraints on the payoffs of the plays (with respect to his preference relation) against any strategy of the other players. The other players can thus be considered as one player, say player , being the opponent of player . This is the class of so-called two-player zero-sum games.
Problem 1
Let be an initialized two-player zero-sum game and be two bounds. Decide whether player has a strategy such that (resp. ) for all plays consistent with .777This problem is focused on Player , the payoff function and preference relation of Player do not matter.
Case is called the threshold problem whereas case is called the constraint problem. When a strategy as required in Problem 1 exists, it is called winning and a play consistent with is also called winning; we also say that player can ensure a payoff such that (resp. ). When this problem is decidable, we are interested in finding its complexity class and the simplest winning strategies , like positional or finite-memory ones when they exist.
In a two-player zero-sum game , the opposition between player and player is most often described in terms of objectives. An objective for player is a subset of , here the set of plays such that (resp. ). Player 1 wants to ensure a play in against any strategy of player . As an opponent, player wants wants to avoid plays in , that is, to ensure the opposite objective . We say that the game with objective is determined if for each initial vertex , either player 1 has a winning strategy to ensure in or player 2 has a winning strategy to ensure . Martin’s theorem [51] states that every two-player zero-sum game with Borel objectives is determined. Nevertheless, it gives no information on which player has a winning strategy and on the shape of such a winning strategy. This motivates studying Problem 1.
Example 2
Let us come back to the game of Figure 1 seen as a two-player zero-sum game (we thus focus on player ). In , player has a winning strategy for the threshold problem with , that is, for the objective : take the positional strategy such that . However he has no winning strategy for the threshold problem with . Indeed with the positional strategy such that , player has a winning strategy for the opposite objective since he can ensure a payoff equal to or .
In the second problem studied in this article, we come back to multi-player games where each player has his own payoff function and preference relation. Here, the players are not necessarily antagonistic: this is the class of so-called multi-player non zero-sum games. Instead of looking for a strategy ensuring a certain objective for one designated player, we are now interested in strategy profiles, called solution profiles, that provide payoffs satisfactory to all players with respect to their own objectives. A classical example of solution profile is the notion of Nash equilibrium (NE) [53]. Informally, a strategy profile is an NE if no player has an incentive to deviate (with respect to his preference relation) when the other players stick to their own strategies. In other words, an NE can be seen as a contract that makes every player satisfied in the sense that nobody wants to break the contract if the others follow it.
Definition 3
Given an initialized game , a strategy profile is a Nash equilibrium if for all players and all strategies of player .
In this definition, notation means the strategy profile such that all players stick to their own strategy except player who shifts from strategy to strategy . We say that is a deviating strategy from . When , is called a profitable deviation for player with respect to .
Example -1 (continued)
Let us reconsider the non zero-sum game of Figure 1 and the strategy profile given previously in (, for all , and for all ). This strategy profile is an NE with outcome . Indeed, player has no incentive to deviate since the payoff of is the best possible with respect to . If player uses the deviating strategy from such that , then the resulting outcome has a less preferable payoff for him since . So player has no profitable deviation.
Other kinds of solution profiles will be studied in Section 4.
Problem 2
Let be an initialized multi-player non zero-sum game and be two tuples of bounds. Decide whether there exists a solution profile such that (resp. ) for all players .
Similarly to Problem 1, the two cases are respectively called threshold problem and constraint problem, and we want to compute the complexity class and the simplest solution profiles in case of decidability.
In Sections 3 and 4, we present some known results about solutions to Problems 1 and 2 respectively with an emphasis on general approaches. Before, we end Section 2 with a list of payoff functions that are classically studied.
2.3 Classical payoff functions
In the classes of games that are classically studied, each player uses a real-valued payoff function and a preference relation equal to the usual ordering on . Hence, player prefers to maximize the payoff of a play .888Alternatively, can be the ordering meaning that player prefers to minimize the payoff of a play. In this classical setting, we focus on two particular subclasses: the Boolean payoff functions and the quantitative payoff functions.
Boolean payoff functions
A particular subclass of games are those equipped with Boolean functions , for all , where payoff (resp. payoff [math]) means that the play is the most (resp. the less) preferred by player . Particularly interesting related objectives are , . Classical such objectives are -regular objectives like the following ones [38, 39, 55].
Definition 4
- •
Let ,
- –
Reachability : visits a vertex of at least once,
- –
Safety: visits no vertex of ,
- –
Büchi: ,
- –
Co-Büchi: .
- •
Let be a coloring of the vertices by integers,
- –
Parity: the maximum color seen infinitely often along is even.
- •
Let be a family of pairs of sets ,
- –
Rabin: , , such that and ,
- –
Streett: , , or .
- •
Let be a family of subsets of vertices,
- –
Muller999A colored variant of Muller objective is defined from a coloring of the vertices: the family is composed of subsets of (instead of ) and [39]. See [42] for several variants of Muller games.: .
Notice that reachability and safety (resp. Büchi and co-Büchi, Rabin and Streett) are dual objectives. The complement of a parity (resp. Muller) objective is again a parity (resp. Muller) objective: from the coloring function , define the new function such that for all (resp. from the family , define the new family ). A Büchi (resp. co-Büchi) objective is a particular case of a parity objective: assign color to vertices of and to vertices of (resp. color to and [math] to ). Similarly, one can easily prove that a parity objective is both a Rabin and a Streett objective which are themselves a Muller objective [38].
In the previous definition, the payoff function is prefix-independent in each case except for reachability and safety where only condition (1) of prefix-linearity is satisfied.
Example 3
Suppose that in the game of Figure 1, player wants to achieve the Büchi objective with whereas player wants to achieve the Muller objective with . Then the play has payoff , that is a payoff 0 for player and a payoff for player .
Quantitative payoff functions
Classical quantitative payoff functions are defined from a weight function as follows [19] (each edge of the game is thus labeled by a -tuple of weights).
Definition 5
Let be a weight function and be a rational discount factor. Then is defined as one among the following payoff functions: let ,
- •
Supremum: ,
- •
Infimum: ,
- •
Limsup: ,
- •
Liminf: ,
- •
Mean-payoff : ,
- •
Mean-payoff : ,
- •
Discounted sum: .
Some of these payoff functions provide natural generalizations of the previous -regular objectives. Indeed the supremum (resp. infimum, limsup, liminf) function is a quantitative generalization of the reachability (resp. safety, Büchi, co-Büchi) objective. The mean-payoff and discounted sum functions are much studied in classical game theory [33].
There are two variants of mean-payoff functions because the limit may not exist. Nevertheless in case of a lasso , both payoffs and coincide and are equal to the average weight of the cycle (with respect to the weight function ).
In Definition 5, the payoff function is prefix-independent in limsup, liminf and mean-payoff cases, prefix-linear in discounted sum case, and satisfies condition (1) of prefix-linearity in supremum and infimum cases.
Example 4
We equip the game of Figure 1 with two weight functions , leading to the game of Figure 2. Suppose that and . The preferences of the players with respect to plays and are opposed since for player , and for player .
In the sequel, games with the Boolean payoff functions of Definition 4 are called Boolean games. Similarly games with the quantitative payoff functions of Definition 5 are called quantitative games. We also speak about reachability game, supremum game, etc, when we want to refer to a game where all the players use the same type of payoff function. The complexity results mentioned later depend on the number of vertices, edges and players, as well as on the number of colors (resp. pairs, elements of ) for parity (resp. Rabin/Streett, Muller) games, and on numerical rational values (of weights, discount factor, and bounds) given in binary for quantitative games.
3 Two-player zero-sum games
In two-player zero-sum games, players and have opposite objectives. This class of games has been much studied. In particular solutions to Problem 1 are well established for Boolean games and quantitative games as introduced in Section 2.3. Before presenting them, we begin with the simplest situation of games played by a unique player and we show that the problems studied in this article are connected to problems in automata theory and numeration systems.
3.1 One-player games
In one-player games, player 1 has no opponent, he is the only player to choose the next vertex at any moment of a play. In other words, a strategy for player is nothing else than a play in the game. The statement of Problem 1 thus simplifies as follows:101010In Section 3.1, we omit index everywhere since player is the unique player of the game.
Problem 3
Let be an initialized one-player game. Let be two bounds. Decide whether there exists a play such that (resp. )?
3.1.1 Boolean games
For Boolean games, this problem is interesting only with bounds . Indeed recall that the payoff function is Boolean and that player prefers plays such that . This is the classical well-known non emptiness problem for automata [55]. For instance, Problem 3 for one-player reachability (resp. Büchi) games with is the non emptiness problem for automata accepting finite words (resp. Büchi automata accepting infinite words).
Theorem 3.1
Let be an initialized one-player Boolean game. Then Problem 3 (with ) is decidable in polynomial time with positional winning strategies, except for Streett and Muller games where finite-memory strategies are necessary and sufficient.
Let us comment this theorem. Notice that a winning strategy for player 1 that is finite-memory (resp. positional) means that the corresponding winning play , or in terms of automata the accepted word, is a (resp. simple) lasso. It is well-known that positional strategies are sufficient for Büchi objectives. This also happens for the other objectives except for Streett and Muller objectives (we will discuss this point in more details in Section 3.2, see Theorem 3.5). Example 5 illustrates that finite-memory strategies are necessary for Streett and Muller games. In cases where positional strategies are sufficient, an algorithm for Problem 3 has thus to concentrate on the existence of winning simple lassos, which can be easily done in polynomial time. The case of Streett and Muller games can also be solved in polynomial time [31, 41]. Problem 3 is NL-complete for reachability and Büchi games [45, 67] as well as for safety, co-Büchi, Rabin, parity, and Muller games, and it is P-complete for Streett games [31, 60].
Example 5
Consider the initialized one-player game of Figure 3 with . For the Muller objective with (or the Streett objective with the two pairs such that , and , ), a winning play cannot be a simple lasso as it has to alternate between and .
3.1.2 Quantitative games
Let us turn to quantitative games. The existence of plays with in one-player quantitative games (threshold problem) have been studied in [19].
Theorem 3.2
[19]** Let be an initialized one-player quantitative game, and be a rational threshold. Then deciding whether there exists a play such that is solvable in polynomial time with positional strategies.
Let us comment this theorem. Dealing with functions , , , and is equivalent to respectively consider reachability, safety, Büchi, and co-Büchi objectives (studied in Theorem 3.1). For instance, satisfying is equivalent to visiting an edge with a weight along . For functions , , and , once one knows that positional strategies are sufficient (we will discuss this point in more details in Section 3.2), the problem again reduces to the existence of a simple lasso with maximum payoff . In case of mean-payoff function, recall that both payoffs , coincide and are equal to the average weight of the cycle . A polynomial algorithm is proposed in [47] to compute a cycle in a weighted graph with maximum average weight. The case of function is polynomially solved by a linear programming approach in [2].
We now discuss the existence of a play such that , given two rational bounds (constraint problem). The problem is more involved, in particular it is currently unsolved for function .
Theorem 3.3
[43, 66]** Let be an initialized one-player quantitative (except discounted sum) game, and be two rational bounds. Then deciding whether there exists a play such that is solvable in polynomial time. Positional strategies are sufficient for supremum, infimum, limsup, and liminf games, whereas finite-memory is necessary and sufficient for mean-payoff and games.
Let us comment this theorem. If we focus on function , looking for a play such that reduces to the non emptiness problem for Rabin automata (studied in Theorem 3.1). Indeed the required play is such that at least one weight seen infinitely often along is and none of them is . A similar approach exists for functions , and . Whereas positional winning strategies are sufficient in all these cases, finite-memory is needed for mean-payoff functions as indicated in Example 6. Since finite-memory strategies are sufficient [43], the problem in both cases , reduces to the existence of a lasso satisfying the constraints. This can be checked in polynomial time by solving a linear program [66].
Example 6
Consider the game of Figure 3 equipped with the weight function that labels the two left edges by [math] and the two right edges by . A winning play for cannot be a simple lasso (with payoff either [math] or ). However the non simple lasso is winning.
Concerning function , Problem 3 is open. It is closely related to the following open problem, called target discounted-sum problem in [6].
Problem 4
Given three rational numbers and , and a rational discount factor , does there exist an infinite sequence such that is equal to ?
The authors of [6] show that Problem 4 is related to several open questions in mathematics and computer science. In particular it is related to numeration systems and more precisely to -representations of real numbers [4, 50]. Given a real number (the base) and a finite alphabet (the set of digits), a -representation of a real number is an infinite sequence , also written , such that . A well-known result [58] is that every has a -representation using . It follows that Problem 4 asks whether has a -representation (with ) using and . This problem is therefore decidable when and . Indeed using the result of [58], either and it has no -representation , or and it has such a -representation. Other partial results to Problem 4 can be found in [6].
3.2 Two-player games
We now turn to two-player zero-sum games. In Problem 1, the objective of player 1 is the set of plays such that (resp. ), whereas player 2 has the opposite objective . Examples of the threshold problem are the following ones: in a reachability game, player 1 aims at reaching some target set of vertices whereas player 2 tries to prevent him from reaching it; in a limsup game, player 1 aims at maximize the payoff of the play (in a way to be ) whereas player 2 tries to minimize it. Recall that by Martin’s theorem, every two-player zero-sum games with Borel objectives is determined. This large class of games includes the objectives of player in Problem 1 for the Boolean and quantitative games introduced in Section 2.3. A lot of research has been developed to solve Problem 1 that we present in this section. In Sections 3.2 and 3.3, as the objectives and of players and only depend on , , and , we simplify the used notation by omitting index .
3.2.1 Criterium for uniform optimal strategies
We begin by studying the winning strategies that player can use for the threshold problem in Problem 1. This is related to the notion of value and optimal strategy.
Definition 6
Let be an initialized two-player zero-sum game. If there exists such that
- •
player has a strategy such that for all plays in consistent with , and
- •
player has a strategy such that for all plays in consistent with ,
then is the value of and (resp. ) is an optimal strategy for player (resp. player ).
Intuitively, is the highest threshold for which player can ensure (with an optimal strategy) a payoff such that . In this definition, the antagonistic player behaves in the opposite way. When the value exists and is computable, the threshold problem is easily solved: we just check whether the given threshold satisfies . Moreover both players can limit themselves to use optimal strategies, that is, if player has a winning strategy (resp. no winning strategy) for the threshold problem, then player (resp. player ) can use an optimal strategy as winning strategy (resp. for the opposite objective).
Example -8 (continued)
Let us come back to the two-player zero-sum game of Example 2. Recall that in , player has a winning strategy for the threshold problem with but not with , meaning that . Indeed, one can check that and , and that both players have optimal strategies that are positional, and even more uniform. The values are indicated under the vertices in Figure 4, and the two uniform optimal strategies are given as thick edges.
We will see later in this section that Boolean and quantitative games often have uniform optimal strategies (see Theorems 3.5 and 3.6). In [36], the authors propose a unified approach to all these results: they give a general criterium on the payoff function that guarantees uniform optimal strategies for both players.
Theorem 3.4
[36]111111The hypotheses of this theorem are those given in the full version of [36] available at http://www.labri.fr/perso/gimbert/ Let be a two-player zero-sum game with a preference relation on such that each subset of has an infimum and a supremum. If the payoff function is fairly mixing, that is,
, if then , 2. 2.
, , 3. 3.
,
**
**
,
then both players have uniform optimal strategies.
Let us comment this theorem. The first condition is condition (1) of prefix-linearity. If is prefix-independent, then the first and the second conditions are trivially satisfied. The third condition is concerned with shuffles of histories. Let us apply this theorem to quantitative games, for instance to function (see Definition 5). This function is prefix-independent and satisfies the third condition since . One can check that the payoff functions of all quantitative games are fairly mixing, as well as the payoff functions of the Boolean games with reachability, safety, Büchi, co-Büchi, and parity objectives [36] (but not with Streett and Muller objectives).
The proof121212Theorem 3.4 is given in [36] for real-valued payoff functions and the usual ordering , but its proof is easily generalized to the statement given here. of Theorem 3.4 is simple and elegant; it is by induction on . If then there is exactly one outgoing edge for each vertex and thus both players have a unique possible strategy that is therefore uniform and optimal. Suppose that and let us focus on player (a symmetric argument is used for player ). If all vertices have only one outgoing edge, then player has a unique strategy, and it is uniform and optimal. Suppose that some has at least two outgoing edges. We partition this set of edges into two non empty subsets and . From we define two smaller games and with the same vertices and edges except that the set of outgoing edges from is restricted to in and to in . By induction hypothesis, has a value in and in , and both players have uniform optimal strategies, respectively in and in . W.l.o.g. , we then choose as optimal strategy for player in and for all , we take their value in as their value in . Clearly is optimal and uniform in . The rest of the proof consists in defining a strategy for player (from and ) that is optimal in . This is possible thanks to the three conditions of Theorem 3.4 applied on plays decomposed according to occurrences of .
Further results can be found in [37]: a characterization of payoff functions is given guaranteeing the existence of uniform optimal strategies for both players. From this characterization, it follows that if both players have uniform optimal strategies when playing solitary in one-player games, then they also have uniform optimal strategies in zero-sum two-player games.
3.2.2 Boolean games
Let us now focus on Boolean games. As for one-player games, we limit the study of Problem 1 (threshold and constraint problems) to the only interesting case . The following theorem for two-player games is the counterpart of Theorem 3.1 for one-player games.
Theorem 3.5
Let be an initialized two-player zero-sum Boolean game. Then Problem 1 (with ) is
- •
P*-complete with uniform winning strategies for reachability, safety, Büchi, and co-Büchi objectives [3, 30, 38, 44],*
- •
P*-complete with finite-memory winning strategies for Muller131313It is PSPACE-complete for the colored variant of Muller objective [42, 52]. objective [41],*
- •
NP*-complete with uniform winning strategies for Rabin objective [29, 30],*
- •
co-NP*-complete with finite-memory winning strategies for Streett objective [16, 30],*
- •
in NP co-NP with uniform winning strategies for parity objectives **[30]**.
Let us comment this theorem. The existence of uniform winning strategies (for all objectives except Rabin and Muller objectives) was previously mentioned as a consequence of Theorem 3.4 [36]. Notice that here a value is equivalent to say that player has a winning strategy for Problem 1. In case player has no winning strategy (), it follows that player has a winning strategy for the opposite objective by Martin’s theorem. Hence Theorem 3.5 also gives information for player by considering the opposite objective. In [48], the author gives general conditions on Boolean objectives that guarantee the existence of a uniform winning strategy for one of the players (and not necessarily for both players). This includes the case of Rabin games where player has a uniform winning strategy (whereas player needs to use a finite-memory strategy to win the opposite Streett objective).
Problem 1 is decidable in time for reachability and safety games [38], and the current best algorithm for Büchi and co-Büchi games is in time [22]. For Muller games with , the complexity is in time [41], whereas for Rabin and Streett games with pairs , it is in time [56]. Concerning parity games, the complexity class of Problem 1 is refined to UP co-UP in [46] and a major open problem is whether it can be solved in polynomial time. Very recently, a breakthrough quasi-polynomial time algorithm has been proposed in [17] for parity games.
3.2.3 Quantitative games
Let us turn to quantitative games for which we first give results for the threshold problem, and then for the constraint problem. The following theorem provides the known results to the threshold problem. It describes the simplest form of winning strategies for player (resp. player ) when he has a winning strategy for this problem (resp. for ensuring the opposite objective when player has no winning strategy).
Theorem 3.6
Let be an initialized two-player zero-sum quantitative game, and be a rational bound. Then the threshold problem (in Problem 1) is
- •
P*-complete for supremum, infimum, limsup, and liminf games with uniform winning strategies for both players,*
- •
in NP co-NP for mean-payoff and discounted sum games with uniform winning strategies for both players **[71]**.
Let us comment this theorem. We already know the existence of uniform winning strategies from Theorem 3.4 [36]. The P-completeness for supremum, infimum, limsup, and liminf games follows from the P-completeness for reachability, safety, Büchi, and co-Büchi games in Theorem 3.5. Parity games are polynomially reducible to mean-payoff games [46] which are themselves polynomially reducible to discounted sum games [71]. For these three classes of games, from the existence of uniform winning strategies, we get a threshold problem in NP as follows: guess a uniform strategy for player (by choosing one outgoing edge for all ), fix this strategy in the game to get a one-player game , apply the related polynomial time algorithm of Theorems 3.1 or 3.2 (from the point of view of player who controls ). The co-NP membership is symmetrically obtained with player .
Concerning the constraint problem, recall that it is more complex already for one-player games (see Section 3.1) with no known solution for discounted sum games (see Problem 4).
Theorem 3.7
Let be an initialized two-player zero-sum quantitative (except discounted sum) game, and be rational bounds. Then the constraint problem (in Problem 1) is
- •
P*-complete for supremum, infimum, limsup, and liminf games with uniform winning strategies for both players [13, 43],*
- •
in NP co-NP for mean-payoff games with finite-memory (resp. uniform) winning strategies for player (resp. player ) **[43]**.
Discounted sum games are studied with bounds such that (to avoid the case of Problem 4) in [43] where it is proved that the constraint problem is PSPACE-complete with finite-memory winning strategies for both players.
3.3 Variants of preferences
Several extensions141414The reader who prefers to know classical solutions to Problem 2 for multi-player non zero-sum games can skip this section and go directly to Section 4. of two-player zero-sum Boolean and quantitative games have been studied in the literature, by using preferences that are irreflexive and transitive but not necessarily total, or more generally by using preorders that are reflexive and transitive binary relations (hence, is not supposed to be total and one can have and such that ).
Such variants naturally appear when we study intersection of objectives instead of a single objective as in Section 2.3:
- •
Intersection of homogeneous objectives. For instance player has reachability objectives (instead of just one), and he wants to visit all the sets .
- •
Intersection of heterogeneous objectives. In this more general case, player has several objectives not necessarily of the same type. Let us imagine a situation where he has two quantitative objectives depending on two weight functions on the graph, like ensuring a threshold for the liminf of weights with respect to the first weight function and another threshold for the mean-payoff with respect to the second weight function.
In this context, for player , we consider a tuple of payoff functions and a tuple of weight functions (instead of a single payoff function defined from a single weight function ) such that each function is defined from .151515This tuple of payoff functions is used by player contrarily to Definition 2 where function is used by player for all . Tuples of payoffs and are then compared using the usual ordering on tuples of reals: iff for all components and there exists such that (the preference relation is not total). Let us mention some results first for quantitative objectives and then for Boolean objectives.
3.3.1 Combination of quantitative objectives
The threshold problem takes the following form: given a tuple of rational thresholds, decide whether player has a strategy that ensures a payoff such that for all plays consistent with .
Theorem 3.8
[69]** Let be an initialized two-player zero-sum game with homogeneous intersections of mean-payoff objectives. Then the threshold problem (in Problem 1) is
- •
in NP co-NP for functions ,
- •
is co-NP-complete for functions .
In both cases, infinite memory is required for winning strategies of player whereas uniform winning strategies are sufficient for player .
This theorem indicates different behaviors for the functions and . This is illustrated with the example of the initialized one-player game depicted in Figure 5, where player wants to ensure the intersection of two homogeneous objectives. It is shown in [69] that for a pair of functions , player can ensure a threshold , and that for a pair of functions , he can ensure a threshold (which is impossible with ). In both cases infinite memory is necessary. Indeed recall that with a finite-memory strategy the produced play is a lasso such that is the average weight of the cycle . Here this average weight has the form , with and . Clearly showing that player is losing for threshold with finite-memory strategies.
In [68], the author studies objectives equal to Boolean combinations of inequalities , with and : deciding whether player has a winning strategy in becomes undecidable. However, this problem remains decidable and is EXPTIME-complete for CNF/DNF Boolean combinations of functions taken among [13], where is an interesting window variant of mean-payoff introduced in [20]. The threshold problem is P-complete (resp. EXPTIME-complete) for a single objective (resp. an intersection of objectives) [20]. Recall that it is in NP co-NP for a single function or (see Theorem 3.6).
3.3.2 Combination of Boolean objectives
Concerning one-player games, Boolean combinations of Büchi and co-Büchi objectives are introduced in [31] as a generalization of Rabin and Streett objectives. It is proved that the non emptiness problem for this class of automata is NP-complete (for a comparison see Theorem 3.1). Concerning two-player games, the intersection of homogeneous objectives is simple for safety, co-Büchi, Streett, and Muller cases. Indeed the intersection of safety (resp. co-Büchi, Streett, Muller) objectives is again a safety (resp. co-Büchi, Streett, Muller) objective. In the other cases, we have the following results to be compared with those of Theorem 3.5.
Theorem 3.9
Let be an initialized two-player zero-sum game with an intersection of homogeneous objectives. Then Problem 1 is
- •
PSPACE*-complete for reachability objectives with finite-memory winning strategies for both players [32],*
- •
P*-complete for Büchi objectives with finite-memory (resp. uniform) winning strategies for player (resp. player ) [21],*
- •
co-NP*-complete for parity objectives with finite-memory (resp. uniform) winning strategies for player (resp. player ) [24],*
- •
PSPACE*-complete for Rabin objectives with finite-memory winning strategies for both players.161616We found no reference for this result. The PSPACE membership (resp. the finite memory of the strategies) follows from [1] (resp. [13]). In [1], games with a union of a Streett objective and a Rabin objective are shown to be PSPACE-hard. It is thus also the case for games with a union of Streett objectives. By Martin’s theorem, it follows that games with an intersection of Rabin objectives are PSPACE-hard.*
Problem 1 is PSPACE-complete for heterogeneous intersections of reachability and Büchi objectives [13] as well as for Boolean combinations of Büchi objectives [1, 42].
3.3.3 Lexicographic and secure preferences
For a tuple of payoff functions defined from a tuple of weight functions for player , let us mention two other natural preference relations .
Definition 7
Let be two tuples of real payoffs.
- •
lexicographic preference: iff there exists such that and for all . That is, player prefers to first maximize the first component, then the second, then the third, etc (see for instance [5]).
- •
secure preference: iff either or {, for all components , and there exists such that }. That is, player prefers to first maximize the first component, and then to minimize all the other components (see for instance [28]).
The lexicographic preference is total whereas the secure preference is total only for pairs (instead of tuples) of payoffs. In the latter case, we get a preference which is close to the lexicographic ordering: player prefers to maximize the first component, and then to minimize the second one. The secure preference is used in the notion of secure equilibrium discussed later in Section 4.3.1.
Theorem 3.10
Let be an initialized two-player zero-sum game.
- •
Suppose that is the lexicographic preference . Then the threshold problem (in Problem 1) for function is in NP co-NP with uniform winning strategies for both players **[5]**.
- •
Suppose that is the secure preference on pairs of payoffs. Then the threshold problem (in Problem 1) is in NP co-NP (resp. P-complete) for functions , , and (resp. for functions , , , and ). Moreover both players have uniform (resp. positional) winning strategies for functions , , , , and (resp. for functions and ) **[14]**.
In this theorem, it is supposed that the components of are all of the same type (for instance they are all limsup functions); and some results stating the existence of uniform winning strategies can be established thanks to Theorem 3.4. Notice that the secure preference is limited to pairs of payoffs in a way to be total, which is a necessary condition when dealing with values. Notice also that the authors in [5] consider liminf average of the weight vector under lexicographic ordering whereas the authors in [14] consider the secure ordering of components where each component is the liminf average value.
The threshold problem is studied in [7] in a general context: the players can use various preorders (like the lexicographic preference, a preorder given by a Boolean circuit, etc), the players play concurrently and not in a turned-based way, and the objectives are Boolean as in Definition 4.
4 Multi-player non zero-sum games
In multi-player non zero-sum games, the different players are not necessarily antagonistic, they have their own payoff functions and preference relations . Each of them follows a strategy , the resulting strategy profile induces a play that should be satisfactory to all players. As explained in Section 2.2 (see Definition 3), a classical solution profile is the notion of NE, where no player has an incentive to deviate when the other players stick to their own strategies. It is proved in [25, 39] that there exists an NE in every initialized multi-player non zero-sum game with Borel Boolean objectives. We go further by presenting in this section additional existence results for quantitative games and some known results for NEs as a solution to Problem 2 (threshold problem and constraint problem). As in Section 3, we focus on general approaches.
4.1 Characterization of outcomes of NE
Given a multi-player non zero-sum game and an initial vertex , we begin by a characterization of plays that are the outcome of an NE in . It will imply the existence of NE in large classes of games (see Corollaries 1 and 2), and will be useful for the study of Problem 2 (see Theorems 4.1 and 4.2). This characterization is related to a family of two-player zero-sum games , one for each , associated with and defined as follows. (i) The game has the same arena as , (ii) the two players are player (player ) and player (player ) formed by the coalition of the other players , (iii) the payoff function of player is equal to and his preference relation is equal to 171717Recall that the payoff function and the preference relation of the second player do not matter in two-player zero-sum games.. For all , when it exists, we denote by the value of vertex in game , and by , the related optimal strategies for players respectively in (see Definition 6).
Proposition 1
Let be a multi-player non zero-sum game such that for all ,
- •
the payoff function is prefix-linear, and
- •
in the game , all vertices has a value.
Then is the outcome of an NE in iff for all and all such that .
The condition of this proposition asks that for all , if vertex is controlled by player , then in the two-player zero-sum game , its value is less preferred or equal to the payoff of the suffix . The proposed characterization appears under various particular forms, for instance in [14, 39, 65]. It is here given under two general conditions already studied in Section 3.2. Recall that almost all the payoff functions considered in Section 2.3 are prefix-linear and that for all the related two-player zero-sum games , the vertices have a value. Notice that when is prefix-independent, condition for all with simplifies in (the maximum exists since is finite).
Example -19 (continued)
An example of NE with outcome was given in Example 1 for the initialized game of Figure 1. Let us verify that satisfies the characterization of Proposition 1. Recall that both players use the same payoff function that is prefix-independent. The values of were computed in Example -8: and . Similarly one can compute the values of : and . One checks that , for .
The proof of Proposition 1 is easy to establish.
Firstly suppose that is the outcome of an NE and that there exist and with such that . Let us show that player has a profitable deviation with respect to in contradiction with being an NE. The strategy consists in playing according to until producing and from in playing according to his optimal strategy (in ). The payoff of the resulting play from is such that by optimality of , and thus . From prefix-linearity of it follows that as required.
Secondly suppose that for all and all such that . We are going to construct an NE by using a well-known method in classical game theory that is used in the proof of the Folk Theorem in repeated games [54]. We define a strategy profile that produces as outcome, and as soon as some player deviates from , say at vertex , all the other players (as a coalition) punish him by playing from the optimal strategy (in ). Let us show that is an NE. Let be a deviating strategy from for player , and let be the outcome of the strategy profile . Consider the longest common prefix of and . Then and by optimality of , we get and thus . From prefix-linearity of it follows that showing that is not a profitable deviation for player .
Notice that in this proof, the first (resp. second) implication only requires condition (2) (resp. (1)) of prefix-linearity of . The next corollary follows from this observation and Proposition 1.
Corollary 1
[27]** Let be a multi-player non zero-sum game such that for all ,
- •
the payoff function satisfies for all , and
- •
each game has uniform optimal strategies for both players.
Then there exists a finite-memory NE in each initialized game .
This corollary is a generalization of a theorem181818In [12], one hypothesis is missing: the required optimal strategies must be uniform. given in [12, 27] for the existence of NEs in games equipped with payoff functions , . The proof of Corollary 1 is as follows. Let us consider the play produced by the players when each player plays according to his optimal strategy in ( for all vertices since it is uniform). By construction, is the outcome of an NE because it satisfies the characterization of Proposition 1. Notice that is a simple lasso since each , , is uniform. Therefore the strategies of the constructed NE are finite-memory with a small memory size bounded by to remember this lasso and the first player who deviates from .
The existence of an NE is also guaranteed in the following corollary that does not require optimal strategies that are uniform, but in counterpart requires payoff functions that are prefix-independent.
Corollary 2
[27]** Let be a multi-player non zero-sum game such that for all ,
- •
the payoff function is prefix-independent, and
- •
each game has (resp. finite-memory) optimal strategies for both players.
Then there exists an (resp. finite-memory) NE in each initialized game .
This is a generalization of a result given in [27] for games equipped with payoff functions , . The proof is as follows: under the hypotheses of Corollary 2, one can show that there exist optimal strategies in , , such that for all plays consistent with , we have . Then as in Corollary 1, we consider the play obtained when each player plays according to his optimal strategy . As each is prefix-independent, satisfies the characterization of Proposition 1.
From Corollaries 1 and 2, it follows that there exists an NE (which can be constructed) in every game of Section 2.3; the case of Boolean (resp. quantitative) game is proved in [25, 39] (resp. in [12, 27]). The existence of an NE in discounted sum games can be obtained in a second way: the function is continuous and all games with real-valued continuous payoff functions always have an NE [35, 40]. Notice that the two previous corollaries allow mixing the types of functions , like for instance associated with a Büchi objective, a limsup function , a mean-payoff function , etc.
Conditions generalizing those of Corollaries 1 and 2 are given in [59] that guarantee the existence of a finite-memory NE. Moreover, for most of the given conditions counterexamples are provided that show that they cannot be dispensed with.
4.2 Solution to Problem 2
In this section we study how to solve Problem 2 for NEs (threshold problem and constraint problem). The characterization given in Proposition 1 provides a general approach to solve this problem. Indeed consider the case of initialized games with prefix-independent payoff functions and such that the vertices of each game , , has a value. Then given two tuples of bounds , we simply have to check whether there exists a play such that for all ,
[TABLE]
Thanks to this general approach or variations based on Proposition 1, Problem 2 is solved for Büchi, co-Büchi, Streett, and parity games in [64], and for the other Boolean games in [26].
Theorem 4.1
[26, 64]** Let be an initialized multi-player non zero-sum Boolean game. Then Problem 2 is
- •
is P-complete for Büchi and Muller191919We found no reference for Muller objectives. A sketch of proof is given in the appendix. Problem 2 is PSPACE-complete for the colored variant of Muller objectives [26].* games,*
- •
NP*-complete for reachability, safety, co-Büchi, parity and Streett games,*
- •
in , and NP-hard, co-NP-hard for Rabin games.
Let us explain the proof of NP membership for parity games and the constraint problem with bounds . As each is a two-player zero-sum parity game, recall that the constraint problem is in NP co-NP with uniform winning strategies for both players (see Theorem 3.5). The required algorithm in NP is as follows. (i) For all , in the game , guess a subset of vertices and two uniform strategies for players respectively (intuitively we guess and ). Check in polynomial time202020Recall our comment after Theorem 3.6. that is a winning strategy for player for the constraint problem in each with and that is a winning strategy for player for the opposite objective in each with . (ii) Then for all , we guess (intuitively we guess such that for the required play ). Construct in polynomial time a one-player game from such that each set of vertices is limited to and the unique player is formed by the coalition of all players . (iii) By (3) it remains to check whether there exists a play in such that for all , and . Recall that the existence of plays satisfying certain constraints in one-player games was studied in Section 3.1, see Theorem 3.1. Here we are faced with the existence of a play in game with an intersection of parity objectives which can be checked in polynomial time by [31].
Problem 2 can be similarly solved for quantitative games.
Theorem 4.2
[26, 64, 65]** Let be an initialized multi-player non zero-sum quantitative (except discounted sum) game. Then Problem 2 is
- •
P*-complete for limsup games,*
- •
NP*-complete for supremum, infimum, liminf, mean-payoff , and mean-payoff games.*
The case of supremum, infimum, limsup and liminf games is equivalent to the case of reachability, safety, Büchi and co-Büchi games presented in Theorem 4.1, whereas the case of mean-payoff games is studied in [65]. The proof of NP membership for mean-payoff games is based on the approach (3), and is similar to the one given above for parity games. The case of discounted sum games is open. Indeed it is proved in [14] that Problem 4 reduces to Problem 2 with the discounted sum function.212121The reduction is given for another kind of solution profile but it also works for NEs.
Problem 2 is studied in [7] in a general context: the players can use various preorders, they play concurrently and not in a turned-based way, and the objectives are Boolean as in Definition 4. The general approach proposed in [7] is different from the one of Proposition 1.
4.3 Other solution profiles
In this section, we present some other solution profiles. Indeed the notion of NE has several drawbacks: (i) Each player is selfish since he is only concerned with his own payoff, and not with the payoff of the other players. (ii) An NE does not take into account the sequential nature of games played on graphs. We illustrate these drawbacks in the following two examples of quantitative game.
Example 7
Consider the two-player quantitative game of Figure 7 such that for . The strategy profile depicted with thick edges is an NE. Notice that player could decide to deviate at by moving to . Indeed he then keeps the same payoff of but also decreases the payoff of player (from to ) which is bad for player . To avoid such a drawback, we will introduce hereafter the concept of secure equilibrium, where each player take cares of his own payoff as well as the payoff of the other players (but in a negative way).
Consider now the game of Figure 7 where the weights of the loops have been modified. The depicted strategy profile is again an NE. Player has no incentive to deviate at due to the threat of player : player will receive a payoff of . Such a threat of player is non credible because in the subgame induced by , at vertex , it is more rational for player to move to to get a payoff of instead of going to where he only receives a payoff of . To avoid such a drawback, we will introduce hereafter the concept of subgame perfect equilibrium that takes into account rational behaviors of the players in all subgames of the initial game.
4.3.1 Secure equilibria
The notion of secure equilibrium (SE) is introduced in [23] for two-player non zero-sum games. The idea of an SE is that no player has an incentive to deviate in the following sense: he will not be able to increase his payoff, and keeping the same payoff he will not be able to decrease the payoff of the other player. An SE can thus be seen as a contract between the two players which strengthens cooperation: if a player chooses another strategy that is not harmful to himself, then this cannot harm the other player if the latter follows the contract.
The definition of an SE is given in the context of games equipped with payoff functions , . It uses the notion of secure preference introduced in Section 3.3 (see Definition 7222222The definition was given for player .). Let us recall the secure preference for player : given , we have iff either or {, for all components , and there exists such that }. Hence player prefers to increase his own payoff, and in case of equality to decrease the payoffs of all the other players. This preference relation is not total except when there are only two players.
The definition of an SE is very close to the one of NE (see Definition 3). The only difference is that it uses the secure preference:
Definition 8
Given an initialized game , a strategy profile is a secure equilibrium if
[TABLE]
for all players and all strategies of player .
Example 6 (continued)
The strategy profile of Figure 7 is not an SE because player has a profitable deviation if at he chooses to move to : .
By definition, every SE is an NE but the converse is false as shown in the previous example. It is proved in [23] that every two-player non zero-sum game with Borel Boolean objectives has an SE; this result is generalized to multi-player games in [28].
Let us turn to quantitative games such that all the players have the same type of payoff function . General hypotheses are provided in [28] that guarantee the existence of an SE in quantitative games, except for functions and . Thanks to Corollary 1 and Theorem 3.10, for two-player232323A restriction to two-player games is necessary to deal with a secure preference that is total. quantitative games (now including functions and ), there exists such an SE that is finite-memory [14]. Moreover, with the same general approach (3) described previously for NEs, Problem 2 is solved as follows for SEs.
Theorem 4.3
[14]** Let be an initialized two-player non zero-sum quantitative (except discounted sum) game. Then Problem 2 for SEs is
- •
P*-complete for supremum, infimum, limsup, and liminf functions,*
- •
in NP co-NP for functions and .
The case of discounted sum function is open since it is proved in [14] that Problem 4 reduces to Problem 2 with . The complexity class of the problem of deciding whether, in an initialized two-player parity game , there exists an SE with payoff respectively equal to , , , and , is studied in [23, 39].
4.3.2 Subgame perfect equilibria
A solution profile that avoids incredible threats by taking into account the sequential nature of games played on graphs is the notion of subgame perfect equilibrium (SPE) [61]. For being an SPE, a strategy profile is not only required to be an NE from the initial vertex but after every possible history of the game.
Before giving the definition of an SPE, we need to introduce the following concepts for an initialized game with payoff functions and preference relations , for all . Given a history , the subgame of is the initialized game with payoff functions , , such that for all plays (the preference relation of player is his preference in ). Given a strategy for player in , the strategy in is defined as for all .
Definition 9
Given an initialized game , a strategy profile is a subgame perfect equilibrium if is an NE in each subgame of with .
Example 4 (continued)
The strategy profile of Figure 7 is not an SPE because in the subgame , player has a profitable deviation with respect to if at he chooses to move to .
By definition, every SPE is an NE but the converse is false as shown in the previous example. A well-known result is the existence of an SPE in every initialized game such that its arena is a tree rooted at 242424In this particular context, plays are finite paths. [49]. The SPE is constructed backwards from the leaves to the initial vertex in the following way. Suppose that the current vertex is controlled by player , and that for each son of one has already constructed an SPE in the subtree rooted at . Then player chooses the edge such that has the best outcome with respect to his preference relation . The resulting strategy profile is an SPE in the subtree rooted at .
It is proved in [63] that there exists an SPE in every multi-player non zero-sum game with Borel Boolean objectives, and that in case of -regular objectives there exists one that is finite-memory. Existence of an SPE also holds for games with continuous real-valued payoff functions [35, 40] (this is also holds when the functions are upper-semicontinuous (resp. lower-semicontinuous) and with finite range [34] (resp. [57])).
For subgame perfect equilibria, we are not aware of a characterization like the one in Proposition 1. Therefore a solution to Problem 2 for SPEs needs a different approach. Few solutions are known: this problem is in EXPTIME for Rabin games [63] and is NP-hard for co-Büchi games [39].
Whereas NEs exist for large classes of games, see Corollaries 1 and 2, SPEs fail to exist even in simple games like the one of Figure 1 [62]. Variants of SPE, weak SPE and very weak SPE, have thus been proposed in [11] as interesting alternatives. In a weak SPE (resp. very weak SPE), a player who deviates from a strategy is allowed to use deviating strategies that differ from on a finite number of histories only (resp. only on the initial vertex). Deviating strategies that only differ on the initial vertex are a well-known notion that for instance appears in the proof of Kuhn’s theorem [49] with the one-step deviation property. By definition, every SPE is a weak SPE, and every weak SPE is a very weak SPE. Weak SPE and very weak SPE are equivalent notions, but this is not true for SPE and weak SPE [11].
The following theorem gives two general conditions such that each of them separately guarantees the existence of a weak SPE.
Theorem 4.4
[15]** Let be a multi-player non zero-sum game such that
- •
either each payoff function , , is prefix-independent,
- •
or each , , has a finite range.
Then there exists a weak SPE in each initialized game .
This theorem has to be compared with Corollary 2 that gives general conditions for the existence of an NE, one of them being prefix-independence of , . This latter condition is here enough to guarantee the existence of a weak SPE (the existence of an SPE is not possible as mentioned before with the game of Figure 1 [62]). It follows from Theorem 4.4 that there exists a weak SPE in all the Boolean and quantitative games of Section 2.3 (except for the case of discounted sum payoff that is neither prefix-independent nor with finite range).
In addition to SEs and (weak) SPEs, other solution profiles have been recently proposed, like Doomsday equilibria in [18], robust equilibria in [8], and equilibria using admissible strategies in [10]. We also refer the reader to the survey [9].
5 Conclusion
In this invited contribution, we gave an overview of classical as well as recent results about the threshold and constraint problems for games played on graphs. Solutions to these problems are winning strategies in case of two-player zero-sum games, and equilibria in case of multi-player non zero-sum games. We tried to present a unified approach through the notion of games equipped with a payoff function and a preference relation for each player, in a way to include classes of Boolean games and quantitative games that are usually studied. We also focussed on general approaches from which one can derived several different results: a criterium that guarantees the existence of uniform optimal strategies in two-player zero-sum games, and a characterization of plays that are the outcome of an Nash equilibrium in multi-player non zero-sum games. Several illustrative examples were provided as well as some intuition on the proofs when they are simple.
Acknowledgments
We would like to thank Patricia Bouyer, Thomas Brihaye, Emmanuel Filiot, Hugo Gimbert, Quentin Hautem, Mickaël Randour, and Jean-François Raskin for their useful discussions and comments that helped us to improve the presentation of this article.
Appendix
In this appendix, we give a sketch of proof for Muller games in Theorem 4.1. Recall that each player has the objective with , and that the values , , in each game can be computed in polynomial time (Theorem 3.5). To prove P membership for the constraint problem with bounds , we apply the approach (3). Notice that for the required play in (3), the set must be a strongly connected component that is reachable from the initial vertex . Moreover if for some , then for all , and if , then . We thus proceed as follows. (i) For each such that , for each (seen as a potential ), the following computations are done in polynomial time for all :
- •
if ((3) imposes ), test whether ,
- •
if ((3) imposes ), test whether and whether each has value ,
- •
if and ((3) allows either or ), then if , test whether each has value .
Finally, construct in polynomial time the game from such that each is limited to whenever , and test whether is a strongly connected component that is reachable from in . As soon as this sequence of tests is positive, there exists satisfying (3). (ii) It may happen that step (i) cannot be applied (because there is no such that , and for such that and , there is no potential ). In this case, we construct in polynomial time a two-player game from such that each is limited to , player controls no vertex and player is formed by the coalition of all , and the objective is a Muller objective with . We then test in polynomial time whether player has no winning strategy from in this Muller game.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1[1] Rajeev Alur, Salvatore La Torre, and P. Madhusudan. Playing games with boxes and diamonds. In CONCUR Proceedings , volume 2761 of Lecture Notes in Comput. Sci. , pages 127–141. Springer, 2003.
- 2[2] Daniel Andersson. An improved algorithm for discounted payoff games. In ESSLLI Student Session , pages 91–98, 2006.
- 3[3] Catriel Beeri. On the membership problem for functional and multivalued dependencies in relational databases. ACM Trans. Database Syst. , 5(3), September 1980.
- 4[4] Valérie Berthé and Michel Rigo, editors. Combinatorics, Words and Symbolic Dynamics , volume 135. Cambridge University Press, 2016.
- 5[5] Roderick Bloem, Krishnendu Chatterjee, Thomas A. Henzinger, and Barbara Jobstmann. Better quality in synthesis through quantitative objectives. In CAV Proceedings , volume 5643 of Lecture Notes in Comput. Sci. , pages 140–156. Springer, 2009.
- 6[6] Udi Boker, Thomas A. Henzinger, and Jan Otop. The target discounted-sum problem. In LICS Proceedings , pages 750–761. IEEE Computer Society, 2015.
- 7[7] Patricia Bouyer, Romain Brenguier, Nicolas Markey, and Michael Ummels. Pure Nash equilibria in concurrent deterministic games. Logical Methods in Comput. Sci. , 11(2), 2015.
- 8[8] Romain Brenguier. Robust equilibria in mean-payoff games. In Proceedings of FOSSACS , volume 9634 of Lecture Notes in Comput. Sci. , pages 217–233. Springer, 2016.
