Solving Zero-sum Games using Best Response Oracles with Applications to Search Games
Lisa Hellerstein, Thomas Lidbetter, Daniel Pirutinsky

TL;DR
This paper introduces efficient algorithms for solving zero-sum games with many strategies, leveraging best response oracles, and demonstrates their effectiveness in search game applications relevant to security and counter-terrorism.
Contribution
The paper develops algorithms that efficiently compute strategies in large zero-sum games using best response oracles, with practical applications to search games.
Findings
Algorithms perform well compared to existing methods
Effective in large strategy spaces with polynomial-time best response oracles
Successful application to security and counter-terrorism search scenarios
Abstract
We present efficient algorithms for computing optimal or approximately optimal strategies in a zero-sum game for which Player I has n pure strategies and Player II has an arbitrary number of pure strategies. We assume that for any given mixed strategy of Player I, a best response or "approximate" best response of Player II can be found by an oracle in time polynomial in n. We then show how our algorithms may be applied to several search games with applications to security and counter-terrorism. We evaluate our main algorithm experimentally on a prototypical search game. Our results show it performs well compared to an existing, well-known algorithm for solving zero-sum games that can also be used to solve search games, given a best response oracle.
| | Set | Total time: HLP | Total time: FS | Total time: FS+ | Convergence time: HLP | Convergence time: FS | Convergence time: FS+ | PI error: HLP | PI error: FS | PI error: FS+ | PII error: HLP | PII error: FS | PII error: FS+ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 5 | 1 | | | | 970 | 1394 | 1033 | 0.01 | 0.00 | 0.01 | 0.28 | 0.33 | 0.42 |
| 5 | 2 | | | | 911 | 1209 | 1045 | 0.01 | 0.00 | 0.00 | 0.26 | 0.33 | 0.37 |
| 5 | 3 | | | | 1180 | 3470 | 1495 | 0.03 | 0.01 | 0.02 | 0.60 | 1.74 | 0.78 |
| 5 | 4 | | | | 1176 | 3421 | 1496 | 0.03 | 0.01 | 0.02 | 0.61 | 1.78 | 0.78 |
| 10 | 1 | 4359 | 7835 | 3217 | 248 | 367 | 272 | 0.02 | 0.01 | 0.02 | 0.55 | 0.63 | 0.81 |
| 10 | 2 | 4232 | 6103 | 3777 | 233 | 320 | 276 | 0.02 | 0.01 | 0.02 | 0.53 | 0.62 | 0.70 |
| 10 | 3 | 6943 | | 6591 | 302 | 911 | 393 | 0.06 | 0.02 | 0.04 | 0.43 | 0.91 | 0.58 |
| 10 | 4 | | | | 300 | 897 | 394 | 0.05 | 0.02 | 0.04 | 0.42 | 0.92 | 0.57 |
| 50 | 1 | 214 | 584 | 241 | 12 | 21 | 16 | 0.15 | 0.07 | 0.11 | 2.65 | 2.37 | 3.07 |
| 50 | 2 | 209 | 455 | 282 | 13 | 21 | 19 | 0.14 | 0.06 | 0.10 | 2.40 | 2.33 | 2.64 |
| 50 | 3 | 340 | 2900 | 492 | 15 | 53 | 24 | 0.38 | 0.07 | 0.24 | 2.12 | 0.93 | 2.24 |
| 50 | 4 | 340 | 2894 | 493 | 14 | 53 | 24 | 0.36 | 0.07 | 0.24 | 2.24 | 0.94 | 2.23 |
| 100 | 1 | 66 | 261 | 108 | 2 | 3 | 3 | 0.38 | 0.11 | 0.20 | 4.76 | 3.66 | 4.72 |
| 100 | 2 | 65 | 203 | 126 | 3 | 5 | 5 | 0.40 | 0.11 | 0.15 | 4.53 | 3.54 | 4.05 |
| 100 | 3 | 104 | 1290 | 219 | 3 | 13 | 7 | 0.77 | 0.12 | 0.39 | 4.20 | 1.41 | 3.45 |
| 100 | 4 | 104 | 1287 | 220 | 4 | 15 | 7 | 0.75 | 0.12 | 0.38 | 4.15 | 1.43 | 3.43 |
| Set | Instance 1 | Instance 2 | Instance 3 | Instance 4 | Instance 5 | Instance 6 | Instance 7 | Instance 8 | Instance 9 | Instance 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Set 1 | 6 | 7 | 1 | 3 | 6 | 6 | 7 | 8 | 3 | 1 |
| | 1 | 10 | 2 | 3 | 7 | 2 | 8 | 10 | 8 | 1 |
| | 7 | 4 | 4 | 10 | 6 | 7 | 2 | 9 | 2 | 1 |
| | 4 | 5 | 4 | 1 | 2 | 6 | 9 | 6 | 1 | 6 |
| | 10 | 6 | 3 | 6 | 6 | 6 | 3 | 8 | 2 | 3 |
| Set 2 | 57 | 47 | 44 | 99 | 22 | 89 | 95 | 12 | 10 | 8 |
| | 7 | 42 | 100 | 10 | 31 | 66 | 8 | 52 | 96 | 32 |
| | 81 | 70 | 67 | 9 | 79 | 90 | 12 | 30 | 1 | 13 |
| | 82 | 4 | 11 | 32 | 79 | 17 | 63 | 62 | 9 | 32 |
| | 48 | 71 | 5 | 83 | 45 | 92 | 32 | 2 | 59 | 63 |
| Set 3 | 7 | 2 | 10 | 6 | 3 | 8 | 7 | 8 | 3 | 2 |
| | 5 | 4 | 2 | 10 | 1 | 4 | 7 | 8 | 5 | 8 |
| | 4 | 4 | 5 | 9 | 9 | 3 | 10 | 7 | 10 | 5 |
| | 5 | 7 | 2 | 9 | 6 | 10 | 5 | 3 | 6 | 5 |
| | 8 | 2 | 9 | 3 | 10 | 9 | 9 | 5 | 2 | 8 |
| | 6 | 6 | 6 | 1 | 4 | 5 | 8 | 3 | 1 | 10 |
| | 3 | 1 | 6 | 7 | 7 | 7 | 3 | 3 | 1 | 4 |
| | 4 | 7 | 6 | 7 | 6 | 5 | 8 | 2 | 10 | 9 |
| | 5 | 2 | 9 | 10 | 1 | 5 | 1 | 2 | 5 | 1 |
| | 6 | 4 | 1 | 10 | 5 | 3 | 7 | 9 | 1 | 9 |
| Set 4 | 58 | 51 | 56 | 11 | 70 | 15 | 96 | 54 | 65 | 62 |
| | 12 | 78 | 99 | 38 | 83 | 65 | 71 | 58 | 41 | 30 |
| | 35 | 100 | 87 | 51 | 59 | 99 | 69 | 65 | 39 | 96 |
| | 35 | 25 | 87 | 14 | 84 | 14 | 84 | 14 | 7 | 67 |
| | 46 | 47 | 90 | 8 | 16 | 98 | 39 | 98 | 60 | 60 |
| | 2 | 33 | 45 | 27 | 15 | 40 | 61 | 18 | 90 | 36 |
| | 72 | 96 | 76 | 57 | 73 | 90 | 6 | 15 | 43 | 25 |
| | 91 | 51 | 66 | 22 | 60 | 86 | 91 | 61 | 77 | 74 |
| | 6 | 28 | 65 | 27 | 45 | 53 | 38 | 57 | 8 | 98 |
| | 54 | 62 | 17 | 26 | 88 | 84 | 86 | 74 | 5 | 73 |
Lisa Hellerstein Department of Computer Science and Engineering, NYU Tandon School of Engineering, 6 Metrotech Center, Brooklyn, NY 11201, USA
Thomas Lidbetter Department of Management Science and Information Systems, Rutgers Business School, Newark, NJ, USA
Daniel Pirutinsky
1 Introduction
Consider a zero-sum game with positive integer payoffs in which Player I (the maximizer) has $n$ pure strategies and Player II (the minimizer) has an arbitrary number of pure strategies. The payoff function is denoted by $C$, so that if Players I and II play pure strategies $i$ and $j$, respectively, then the payoff is given by $C(i,j)$. We also allow the arguments of $C$ to be probability vectors expressing mixed strategies, in which case $C$ denotes expected payoffs. Whenever we refer to games in this paper we assume they are of this form. Optimal mixed strategies and the value of the game can be found by solving a linear program, but if Player II’s strategy set is very large compared to $n$, then the computational time required to do this may not be polynomial in $n$.
Examples of this nature frequently arise in the study of finite search games for an immobile Hider. In such games, which are played between a Hider and a Searcher, the Hider typically has a strategy set corresponding to $n$ locations in which to hide, and each pure strategy of the Searcher is a permutation of those locations (or of a subset thereof), giving the order in which she searches them. The order of search may be restricted due to a network structure of the search space or other restrictions on the mode of search. Notwithstanding this, the size of the Searcher’s strategy set is usually exponential in $n$. The payoff of the game is generally either some cost incurred by the Searcher in searching for the Hider, which she wishes to minimize, or the probability of detection, which she wishes to maximize.
Search games are motivated by wide-ranging problems in national security, counter-terrorism, search-and-rescue, biology and others. Summaries of the literature on search games for an immobile Hider can be found in Alpern and Gal (2003), Gal (2011) and Hohzaki (2016).
In many such games, it is possible to exploit the structure of the game in order to solve the game (that is, find optimal strategies and the value) or find bounds on the value. An example of this is the classic search game for an immobile Hider on a network studied in Gal (1979) and Gal (2000). More recent examples can be found in Alpern (2016), Alpern and Lidbetter (2015, 2013a, 2013b), Angelopoulos et al. (2016), Baston and Kikuta (2013, 2015), Fokkink et al. (2016a) and Lin and Singham (2016). Many of these games are infinite, in the sense that one or both of the players has a strategy set of infinite cardinality, but in this paper we restrict ourselves to finite games, which lend themselves better to an algorithmic approach.
Exploiting the structure of the game may not be possible in some games, or it may limit what can be achieved. However, in some games it may be possible to efficiently solve the one-sided problem of finding an optimal best response of Player II for any given mixed strategy of Player I: that is, finding a pure strategy for Player II which has minimal expected payoff against the given mixed strategy of Player I. We will refer to this problem as the best response problem. In this paper we study games for which there exists an oracle that can efficiently solve the best response problem. More generally, we consider games such that there is an oracle that, for any mixed strategy of Player I, can find a pure strategy for Player II that ensures the payoff is no more than $\alpha$ times the best response payoff. We call such an oracle an $\alpha$-approximate best response oracle. So if $\alpha = 1$, then an $\alpha$-approximate best response oracle finds best responses; in this case we simply call it a best response oracle.
We show that for games with an $\alpha$-approximate best response oracle, there are efficient algorithms for finding $\alpha$-optimal strategies for both players. By this we mean a mixed strategy for Player I that guarantees a payoff of at least $V/\alpha$ and a mixed strategy for Player II that guarantees a payoff of at most $\alpha V$, where $V$ is the value of the game.
Some of our analysis relies on previous work, as explained in Subsection 1.2, and our main contribution is the wide range of applications to the field of search games, detailed in Section 5. The examples we give include known search games with known solutions, known search games with unknown solutions and new search games, all of which can be efficiently solved, either exactly or approximately, using our algorithms. We hope that by giving these techniques more prominence, they might be used by others for the purpose of solving search games, or indeed games in other fields.
We demonstrate the efficacy of our approach experimentally by applying our main algorithm for computing optimal strategies (presented in Section 4) to the game described in Subsection 1.1. We show it performs favorably relative to a comparable algorithm of Freund and Schapire (1999, 1996).
1.1 Example: Searching in Boxes
Before describing our algorithms in detail, we illustrate the type of game we will consider with an example of a search game that is easy to state, and that has the characteristics of the type of games to which our results can be applied. Although its solution is already known, it is a fundamental game and provides a good introduction to the more complex games we will discuss in Section 5.
An item (for example an explosive device) is hidden in one of $n$ boxes with costs $c_1, \dots, c_n$, so that the Hider’s strategy set is the set of boxes, $[n] = \{1, \dots, n\}$. A pure strategy for the Searcher is simply an ordering of the boxes, so the Searcher’s strategy set is the set of all permutations $\sigma : [n] \to [n]$. In other words, $\sigma(j)$ is the $j$th location to be inspected. Clearly the Hider’s strategy set has size $n$ and the Searcher’s strategy set has size $n!$. For given strategies $i$ of the Hider and $\sigma$ of the Searcher, the payoff of the game, which we call the search cost, is the sum of the costs of all the boxes in the ordering up to and including the box that contains the hidden object. That is,
$$C(i, \sigma) = \sum_{j=1}^{\sigma^{-1}(i)} c_{\sigma(j)}.$$
We call this game BOX. The solution of BOX is already known in closed form. The Hider’s optimal strategy is to choose box $i$ with probability proportional to $c_i$, as shown in Alpern and Lidbetter (2013b), where it was also proved that this is the unique optimal strategy. The Searcher does not have a unique optimal strategy, but three different optimal strategies can be found in Alpern and Lidbetter (2013b), Lidbetter (2013) and (implicitly) in Condon et al. (2009). The value of the game is also given in closed form (in two different ways) in Alpern and Lidbetter (2013b) and Lidbetter (2013). It is somewhat surprising, given the simplicity with which the game can be described, that prior to these publications the game had not been solved, to the best of our knowledge.
We now show that one could alternatively find optimal strategies in this game by applying our algorithms. To do this, we must show that there is a best response oracle. This is the problem of finding a permutation $\sigma$ that minimizes $C(x, \sigma)$ for a given Hider mixed strategy $x$. It is a classic search problem solved by Blackwell, reported in Matula (1964). In fact, Blackwell solved a more general problem, which we describe later in Subsection 5.7. For this version of the problem, the solution is to search the boxes in non-increasing order of the index $x_i / c_i$.
We can arrive at the solution in another way, taken from the scheduling literature. Consider the problem of scheduling $n$ jobs with processing times $p_1, \dots, p_n$ and weights $w_1, \dots, w_n$ that correspond to the relative importance of the jobs. The objective is to determine the schedule that orders the jobs in such a way as to minimize the weighted sum of the completion times. This problem is usually denoted $1 \,||\, \sum w_j C_j$ in the scheduling literature, and has the well known solution, given by Smith (1956), of scheduling the jobs in non-decreasing order of the ratio $p_j / w_j$. Of course, viewing the costs $c_i$ in the best response problem for BOX as processing times, and viewing the probabilities $x_i$ as weights, these two problems and their solutions are equivalent.
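The BOX best response is short enough to sketch directly. The Python below is a minimal illustration. It assumes the box costs are given as a list `costs` and the Hider's mixed strategy as a list `x` of exact `Fraction` probabilities. The function names are hypothetical and not taken from the paper. The sketch orders the boxes by the index $x_i/c_i$ (Smith's rule) and compares the result with a brute-force search over all $n!$ orderings, which is only feasible for tiny $n$.

```python
from fractions import Fraction
from itertools import permutations


def search_cost(costs, order, box):
    """Search cost if the object is in `box`: sum of costs up to and including `box`."""
    total = 0
    for b in order:
        total += costs[b]
        if b == box:
            return total
    raise ValueError("order does not contain box")


def expected_cost(costs, x, order):
    """Expected search cost of `order` against the Hider's mixed strategy x."""
    return sum(x[i] * search_cost(costs, order, i) for i in range(len(costs)))


def best_response(costs, x):
    """Blackwell/Smith rule: open boxes in non-increasing order of x_i / c_i."""
    return sorted(range(len(costs)), key=lambda i: Fraction(x[i]) / costs[i], reverse=True)


costs = [1, 2, 3]
x = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]
order = best_response(costs, x)
brute = min(expected_cost(costs, x, p) for p in permutations(range(len(costs))))
print(order, expected_cost(costs, x, order), brute)  # prints: [0, 1, 2] 11/4 11/4
```

Against the Hider strategy $x_i \propto c_i$, every ratio $x_i/c_i$ is the same, so every ordering is a best response. For `costs = [1, 2, 3]`, each of the six orderings gives expected cost 25/6.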
We could also consider the more general box searching game, introduced and solved by Lidbetter (2013), of searching for $k$ objects hidden in $n$ boxes. However, in this case our approach cannot be used to solve the game, since the solution of the best response problem is not known. In fact, the computational complexity of this problem is unknown, even for $k = 2$, and we view this as an interesting open problem. More precisely, suppose two items are hidden in $n$ boxes with search costs $c_1, \dots, c_n$ according to a known distribution, so that the probability they are in boxes $i$ and $j$ is some $p_{ij}$, for $1 \le i < j \le n$. What ordering of the boxes minimizes the total expected cost of finding both objects?
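The brute-force sketch below makes the objective of this open problem concrete by evaluating every ordering. It enumerates all $n!$ permutations, so it has no bearing on the complexity question. The function names are hypothetical. The dictionary keyed by `frozenset({i, j})` is an assumed encoding of the pair probabilities $p_{ij}$.

```python
from fractions import Fraction
from itertools import permutations


def cost_to_find_both(costs, order, pair):
    """Cost of opening boxes in `order` until both boxes in `pair` have been opened."""
    remaining = set(pair)
    total = 0
    for b in order:
        total += costs[b]
        remaining.discard(b)
        if not remaining:
            return total
    raise ValueError("order does not contain both boxes")


def brute_force_two_items(costs, p):
    """Minimize the expected cost of finding both items over all n! orderings.

    p maps frozenset({i, j}) to the probability that the items are in boxes i and j.
    Returns (minimum expected cost, first ordering attaining it).
    """
    best = None
    for order in permutations(range(len(costs))):
        value = sum(prob * cost_to_find_both(costs, order, pair) for pair, prob in p.items())
        if best is None or value < best[0]:
            best = (value, order)
    return best


p = {frozenset({0, 2}): Fraction(1, 2), frozenset({0, 1}): Fraction(1, 2)}
print(brute_force_two_items([1, 5, 1], p))  # prints: (Fraction(9, 2), (0, 2, 1))
```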
1.2 Overview of Paper
We take two approaches to using an $\alpha$-approximate best response oracle to find approximate solutions to games. In Section 3, we present an algorithm using an existing ellipsoid approach from the approximation algorithms literature. This approach uses the ellipsoid algorithm with an approximate separation oracle to achieve approximately optimal solutions to an LP and to its dual. The approach was previously used in work on solving generic LPs, and was used to develop approximation algorithms for a number of different combinatorial optimization problems, including routing, scheduling, and graph coloring (Jansen (2003), Carr and Vempala (2002), Friggstad and Swamy (2012) and Feldman et al. (2012)). However, while the ellipsoid algorithm is theoretically important as the first polynomial time algorithm for solving LPs, it is extremely slow and is not used in practice. Thus we include our algorithm from Section 3 for theoretical interest only.
In Section 4 we use a multiplicative weights update approach: a good general survey of this method can be found in Arora et al. (2012). Approximately 20 years ago, Freund and Schapire (1999, 1996) gave a multiplicative update algorithm that computes near-optimal strategies for two-player zero-sum games. Their algorithm yields strategies with payoffs that are within an additive $\epsilon$ of the game value $V$. Specifically, the strategy for Player I guarantees an expected payoff of at least $V - \epsilon$, and the strategy for Player II guarantees an expected payoff of at most $V + \epsilon$. The algorithm assumes that the payoffs are in the interval $[0, 1]$, so $\epsilon$ represents a fraction of the range of payoffs.
Surprisingly, the approach of Freund and Schapire does not seem to have been extended to yield multiplicative approximations: that is, strategies that guarantee an expected payoff within a multiplicative factor of the value. Multiplicative approximations are more common in the literature on approximation algorithms. It is also easy to show that for small $\epsilon$, our notion of $(1+\epsilon)$-optimal is equivalent to the notion of a relative $\epsilon$-approximate Nash equilibrium from Daskalakis (2013), in the case of zero-sum games (see the Appendix).
In Section 4, we obtain $(1+\epsilon)\alpha$-optimal strategies. We use a multiplicative update rule of Garg and Könemann (2007), and we rely somewhat on their analysis, though their work does not automatically imply the existence of approximately optimal strategies for Player I. We therefore combine this with a variation of the analysis of Freund and Schapire (1999) as presented by Arora et al. (2012). We make some changes, due to the form of our multiplicative approximation factor, and the fact that our algorithm is designed to accommodate an approximate best response oracle. However, our primary contribution is a bringing together of ideas resulting in what we believe is the first self-contained exposition of an algorithm and its analysis for computing multiplicative approximations of optimal strategies in a zero-sum game.
We then apply our algorithms to a variety of games in Section 5.
In Section 6 we describe the results of experiments we performed, in which we compare our multiplicative weights algorithm and the multiplicative weights algorithm of Freund and Schapire (1999) on the game BOX.
2 Preliminaries
In what follows, we consider a zero-sum game between a maximizing Player I with $n$ pure strategies and a minimizing Player II with a number of strategies that could be as large as $2^{p(n)}$, for some polynomial $p$. The payoff function is $C$ and all the payoffs are positive. We assume there is an $\alpha$-approximate best response oracle, for some constant $\alpha \ge 1$. That is, there is a polynomial time algorithm (polynomial in $n$) which, for any given mixed strategy $x$ of Player I, computes a pure strategy $j^*$ such that $C(x, j^*) \le \alpha \min_j C(x, j)$. In other words, the oracle computes a pure strategy for Player II whose payoff against $x$ is within a factor $\alpha$ of the payoff given by Player II’s best response.
Let $V$ be the value of the game. We say that a strategy $x$ for Player I is $\alpha$-optimal if $C(x, j) \ge V/\alpha$ for any strategy $j$ of Player II. We say a strategy $y$ for Player II is $\alpha$-optimal if $C(i, y) \le \alpha V$ for any strategy $i$ of Player I.
We assume that mixed strategies $x$ and $y$ for the two players are represented in sparse form, as a list containing the index of each non-zero entry, together with its value. This is especially important for $y$, as the number of pure strategies for Player II may not be polynomial in $n$, and we want the running time to depend polynomially on $n$.
Let $A$ denote the game matrix, so that $A_{ij} = C(i, j)$. We assume an oracle for computing $A_{ij}$. That is, we assume there is a polynomial-time algorithm (polynomial in $n$) that takes as input (representations of) pure strategies $i$ and $j$ of Players I and II, and outputs $A_{ij}$.
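Both definitions can be checked mechanically on a small explicit matrix. The sketch below assumes the payoffs are given as a list of rows `A`, so `A[i][j]` is $C(i,j)$, and that the value $V$ is known. The helper names are hypothetical. It tests only pure replies. That suffices because the expected payoff is linear in the opponent's mixed strategy, so the worst reply is always attained at a pure strategy.

```python
from fractions import Fraction


def row_payoff(A, i, y):
    """C(i, y): expected payoff of row i against Player II's mixed strategy y."""
    return sum(yj * A[i][j] for j, yj in enumerate(y))


def col_payoff(A, x, j):
    """C(x, j): expected payoff of Player I's mixed strategy x against column j."""
    return sum(xi * A[i][j] for i, xi in enumerate(x))


def is_alpha_optimal_I(A, x, V, alpha):
    """x is alpha-optimal for Player I if C(x, j) >= V / alpha for every column j."""
    return all(col_payoff(A, x, j) >= V / alpha for j in range(len(A[0])))


def is_alpha_optimal_II(A, y, V, alpha):
    """y is alpha-optimal for Player II if C(i, y) <= alpha * V for every row i."""
    return all(row_payoff(A, i, y) <= alpha * V for i in range(len(A)))


A = [[2, 1], [1, 2]]   # this game has value V = 3/2
V = Fraction(3, 2)
pure_row = [1, 0]      # Player I always plays row 0
print(is_alpha_optimal_I(A, pure_row, V, Fraction(3, 2)),
      is_alpha_optimal_I(A, pure_row, V, Fraction(4, 3)))  # prints: True False
```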
3 Ellipsoid Approach
Let $M$ be the largest entry in the game matrix. We assume this quantity is given as part of the input, or that it can be computed in time polynomial in $n$. In fact, it is sufficient to compute an upper bound on $M$ that is polynomial in $n$ and $M$. We assume in this section that the payoffs are integers, since the standard bounds on the runtime of the ellipsoid algorithm assume that the input LP has integer coefficients.
Consider the standard LP representing the problem of finding an optimal strategy for Player I.
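LPI: Maximize $z$
such that
(1) $\sum_{i=1}^{n} x_i A_{ij} \ge z$ for all pure strategies $j$ of Player II,
(2) $\sum_{i=1}^{n} x_i = 1$,
(3) $x_i \ge 0$ for $i = 1, \dots, n$.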
A separation oracle for LPI takes as input an assignment to the variables of the LP, and either reports that it is a feasible solution, or returns a hyperplane separating the assignment from the feasible region. If we had a separation oracle for LPI, we could use the ellipsoid algorithm to obtain an optimal strategy for Player I.
Although we do not have such an oracle, we can use the $\alpha$-approximate best response oracle as an approximate separation oracle for LPI. A query to a separation oracle for LPI corresponds to a pair $(x, z)$, where the first element is the assignment to the variables $x_1, \dots, x_n$ and the second is the assignment to $z$. We can answer query $(x, z)$, approximately, using the following procedure:
1. Check if a constraint of type (2) or (3) of LPI is violated by assignment $(x, z)$. If so, return such a violated constraint (separating hyperplane) as the answer to the query and exit.
2. Otherwise, $x$ corresponds to a mixed strategy of Player I. Query the $\alpha$-approximate best response oracle on $x$. Let $j^*$ be the returned strategy for Player II.
3. Check whether $\sum_{i=1}^{n} x_i A_{ij^*} \ge z$ is satisfied by the queried assignment $(x, z)$.
4. If not, return the violated constraint $\sum_{i=1}^{n} x_i A_{ij^*} \ge z$ as the answer to the query. Otherwise, answer the query by reporting (possibly incorrectly) that $(x, z)$ is a feasible solution.
Note that in the last step $(x, z)$ may be reported as feasible even though it is not necessarily feasible. However, in this case $(x, z/\alpha)$ is feasible, because $j^*$ was returned by the $\alpha$-approximate best response oracle: for every $j$, $C(x, j) \ge C(x, j^*)/\alpha \ge z/\alpha$.
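A minimal Python sketch of the four-step approximate separation oracle follows. It assumes the caller supplies `payoff(x, j)` for $C(x,j)$ and `approx_best_response(x)` for the $\alpha$-approximate oracle; both names are hypothetical. It uses exact `Fraction` arithmetic so that the test $\sum_i x_i = 1$ is reliable, and it encodes a returned constraint as a tagged tuple rather than an explicit hyperplane. The toy oracle deliberately returns a non-optimal column, to show the case in which an infeasible query is reported as feasible.

```python
from fractions import Fraction


def approx_separation_oracle(x, z, payoff, approx_best_response):
    """Answer an LPI query (x, z) approximately.

    payoff(x, j) returns C(x, j); approx_best_response(x) is the alpha-approximate
    best response oracle. Returns ("feasible", None) or ("violated", constraint).
    """
    # Step 1: constraints (2) and (3) require x to be a probability vector.
    for i, xi in enumerate(x):
        if xi < 0:
            return ("violated", ("nonnegativity", i))
    if sum(x) != 1:
        return ("violated", ("sum_to_one",))
    # Step 2: x is a mixed strategy, so query the approximate best response oracle.
    j = approx_best_response(x)
    # Steps 3 and 4: check the type (1) constraint for j and report it if violated.
    if payoff(x, j) < z:
        return ("violated", ("payoff", j))
    return ("feasible", None)  # possibly wrong, but (x, z / alpha) is feasible


# Toy game: column 0 always pays 2 and column 1 always pays 3, so an oracle that
# always answers column 1 is a 1.5-approximate best response oracle.
A = [[2, 3], [2, 3]]


def pay(x, j):
    return sum(x[i] * A[i][j] for i in range(len(x)))


def lazy_oracle(x):
    return 1


h = [Fraction(1, 2), Fraction(1, 2)]
print(approx_separation_oracle(h, 3, pay, lazy_oracle))  # prints: ('feasible', None)
```

Here $(x, 3)$ is reported as feasible although $C(x, 0) = 2 < 3$. With $\alpha = 1.5$, the pair $(x, 3/\alpha) = (x, 2)$ is indeed feasible.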
With this approximate separation oracle, we can apply the ellipsoid approach from the approximation algorithms literature, referenced above in Section 1.2. (In fact, Feldman et al. (2012) uses a minor variant of the approach that yields a somewhat smaller restricted dual LP. For simplicity of presentation, we do not discuss this variant here.)
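It yields an algorithm that computes $\alpha$-optimal strategies for Players I and II. The algorithm makes use of the dual of LPI, which computes an optimal strategy for Player II. We present the dual LP as LPII.
LPII: Minimize $u$
such that
(1) $\sum_{j} A_{ij} y_j \le u$ for $i = 1, \dots, n$,
(2) $\sum_{j} y_j = 1$,
(3) $y_j \ge 0$ for all pure strategies $j$ of Player II.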
The algorithm is as follows. To compute an $\alpha$-optimal strategy for Player I, run the ellipsoid algorithm using the approximate separation oracle for LPI described above. Let $S$ be the set of violated constraints returned by the oracle. Let $x^*$ denote the strategy that is returned. Return it as the $\alpha$-optimal strategy for Player I.
Let $Y$ be the set of variables $y_j$ in LPII corresponding to the constraints in $S$. Let LPIIRestrict denote the restriction of LPII that is produced by setting all $y_j \notin Y$ to 0. Generate LPIIRestrict explicitly, computing $A_{ij}$ for all $i \in \{1, \dots, n\}$ and for all $y_j \in Y$, to obtain the constraints. Solve this LP using a standard polynomial-time LP algorithm, such as the ellipsoid algorithm, to obtain an optimal solution $(y^*, u^*)$ for LPIIRestrict. Return $y^*$ as the strategy for Player II.
This algorithm is the basis for the following theorem. In the proof, we provide the analysis of the algorithm for the benefit of the reader; the analysis is essentially the same as that presented by Jansen (2003) for solving general LPs and their duals.
Theorem 1
Let $V$ be the value of the game. There exists an algorithm that computes strategies $x^*$ and $y^*$ for Players I and II such that
$$\min_{j} C(x^*, j) \ge \frac{V}{\alpha} \quad \text{and} \quad \max_{i} C(i, y^*) \le \alpha V.$$
This algorithm, which uses the ellipsoid algorithm, runs in time polynomial in $n$ and $\log M$, where $M$ is the largest entry in the game matrix.
Proof:
The algorithm presented above computes the strategies $x^*$ and $y^*$. We now prove that $x^*$ and $y^*$ are $\alpha$-optimal strategies for Players I and II respectively. To show that $x^*$ is an $\alpha$-optimal strategy for Player I, let LPI($z$) denote the feasibility problem for the set of constraints of LPI, with the value of $z$ fixed. Let $F_z = \{x : (x, z) \text{ is a feasible solution to LPI}(z)\}$. First consider what happens if you run the ellipsoid algorithm using an exact separation oracle for LPI. It performs a binary search on the values of $z$ in an interval $[L, U]$, where $L = 0$ and $U$ is polynomial in $n$ and $M$. The optimal value of LPI is guaranteed to be a value in $[L, U]$. During the binary search, for each tested value $z$ in $[L, U]$, the ellipsoid algorithm makes a sequence of queries $(x', z')$ to the separation oracle, where in each query, $z' = z$. The querying continues until either (a) the separation oracle returns the queried assignment as feasible (indicating that $F_z \neq \emptyset$) or (b) the set of violated constraints returned by the separation oracle, in answer to the queries for $z$, implies that $F_z = \emptyset$ (because there is no assignment satisfying all constraints in that set). The binary search finds the largest value of $z$ for which $F_z \neq \emptyset$. The ellipsoid algorithm then returns this largest value $z$, along with the associated $x \in F_z$, as the optimal solution to LPI.
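Now consider what happens when the ellipsoid algorithm is instead run with the approximate separation oracle described above. Recall that $V$ is the optimal value of LPI. The binary search is run, and ends by returning a final value of $z$, with an associated assignment to the variables $x_i$. Let $\hat z$ denote this final value, and let $x^*$ be the associated assignment. By the properties of the approximate separation oracle, and the behavior of binary search, every constraint reported as violated is a genuine constraint of LPI, and $(x^*, \hat z/\alpha)$ is a feasible solution to LPI. Because $F_V \neq \emptyset$, the reported constraints can never certify that $F_z = \emptyset$ for any $z \le V$, so
$$\hat z \ge V. \tag{1}$$
Further, because $(x^*, \hat z/\alpha)$ is feasible, it follows that for all strategies $j$ of Player II,
$$C(x^*, j) \ge \frac{\hat z}{\alpha}, \tag{2}$$
and hence from (1),
$$C(x^*, j) \ge \frac{V}{\alpha} \tag{3}$$
for all strategies $j$ of Player II. Thus $x^*$ is an $\alpha$-optimal strategy for Player I.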
By the well-known results in Grötschel et al. (1988) on the runtime of the ellipsoid algorithm, the above computation of $x^*$ takes time polynomial in $n$ and $\log M$.
The proof that $y^*$ is an $\alpha$-optimal strategy for Player II is as follows. Let $V_R$ denote the optimal value of LPIIRestrict. Since $(y^*, u^*)$ is an optimal solution to LPIIRestrict,
$$C(i, y^*) \le V_R \tag{4}$$
for all strategies $i$ of Player I.
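Because $(x^*, \hat z/\alpha)$ is a feasible solution to LPI, whose optimal value is $V$,
$$\hat z \le \alpha V. \tag{5}$$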
Since LPII is dual to LPI, the optimal value of LPII is also .
Let LPIRelax denote the dual of LPIIRestrict, obtained from LPI by removing the constraints of type (1) for all $j$ with $y_j \notin Y$. By duality, $V_R$ is also the optimal value for LPIRelax. The violated constraints returned when running the ellipsoid algorithm with the approximate separation oracle for LPI are also violated constraints of LPIRelax. Therefore, the answers given by the approximate separation oracle for LPI could also have been given for LPIRelax. Since the execution of the ellipsoid algorithm depends entirely on the answers to the separation oracle queries, and not on other properties of the LP, it follows by the same argument used to prove (1) that
$$\hat z \ge V_R. \tag{6}$$
Combining that with (5), we get $V_R \le \alpha V$. From (4), it immediately follows that $y^*$ is an $\alpha$-optimal strategy for Player II.
Because the computation of $x^*$ takes time polynomial in $n$ and $\log M$, the number of constraints in $S$, and hence the number of variables of LPIIRestrict, is polynomial in those parameters. It follows that computing $y^*$ takes time polynomial in those quantities.
4 Multiplicative Weights Approach
We now present an alternative approach to solving (or approximately solving) games using an $\alpha$-approximate best response oracle. This approach is more “combinatorial” in nature, in the sense that it does not depend on the values of the payoffs in the same way as the approach of Section 3.
Our algorithm takes the same high level approach as the algorithm of Freund and Schapire (1999), which finds strategies that approximate the optimal strategies within an additive $\epsilon$. Player I begins with a uniformly random strategy. Player I then repeatedly plays mixed strategies, to which Player II replies with best responses. In each round, Player I updates her strategy, using some multiplicative update rule based on the best response of Player II. The more successful the pure strategy is (that is, the higher payoff it achieves), the more weight Player I gives it in the next round.
The algorithm is based on a simple observation: finding optimal strategies for Player II is easily reducible to the problem of solving a Packing LP. An approximately optimal solution to the latter problem is given by a multiplicative weights update algorithm, due to Garg and Könemann (2007). Our algorithm uses essentially the same update rule, and we also rely on part of their analysis to show that this gives an approximately optimal solution for Player II. We show how the multiplicative weights update algorithm also yields an approximately optimal strategy for Player I. We believe the strategy for Player I to be an original contribution, which does not immediately follow from the analysis of Garg and Könemann (2007).
The relation between our games and packing LPs is as follows. A packing LP is a linear program of the form $\max\{c^{\top} y : Ay \le b,\ y \ge 0\}$, where $A$, $b$ and $c$ all have positive entries. If $A$ is the payoff matrix of a two-player zero-sum game, $b = \mathbf{1}$, $c = \mathbf{1}$, and $y^*$ is the optimal solution to the LP, it is easy to show that the strategy that chooses each column $j$ with probability proportional to $y^*_j$ is an optimal strategy for Player II. Thus one can reduce the problem of computing an optimal strategy for Player II to the problem of solving the packing LP. Because the reduction is so simple (just a scaling of the variables), we integrate it into our algorithm, rather than performing it explicitly as a separate step. We note that a special case of this reduction was exploited previously by Condon et al. (2009) in the solution of a variant of the game BOX. However, the resulting specialized packing LP was then solved by different methods.
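The scaling step of this reduction takes only a few lines. The Python sketch below (function name hypothetical) takes any feasible packing solution $y$ for $b = c = \mathbf{1}$, normalizes it, and returns the payoff bound $1/\sum_j y_j$ that the normalized strategy guarantees, since $A(y/s) \le \mathbf{1}/s$ for $s = \sum_j y_j$. For the example matrix, adding the two packing constraints gives $3(y_1 + y_2) \le 2$. Hence $y = (1/3, 1/3)$ is optimal, and the bound $3/2$ equals the value of the game.

```python
from fractions import Fraction


def packing_to_strategy(A, y):
    """Turn a feasible packing solution y (A y <= 1, y >= 0, y != 0) into a Player II strategy.

    Returns (strategy, bound): the strategy y / sum(y) keeps Player I's expected payoff
    from every row at most bound = 1 / sum(y), because A (y / s) <= 1 / s.
    """
    if any(v < 0 for v in y) or sum(y) == 0:
        raise ValueError("y must be non-negative and non-zero")
    if any(sum(a * v for a, v in zip(row, y)) > 1 for row in A):
        raise ValueError("y violates a packing constraint")
    s = sum(y)
    return [v / s for v in y], 1 / s


A = [[2, 1], [1, 2]]
strategy, bound = packing_to_strategy(A, [Fraction(1, 3), Fraction(1, 3)])
print(strategy, bound)  # prints: [Fraction(1, 2), Fraction(1, 2)] 3/2
```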
We first describe our algorithm in detail, then give a bound on the run time, and finally prove that it works.
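Fix $\epsilon > 0$ and $\delta > 0$ (we will specify their values later) and let $x^{(1)}$ be the mixed strategy for Player I given by $x_i^{(1)} = 1/n$ for all $i$.
1. Set $t = 1$.
2. Use the $\alpha$-approximate best response oracle to compute a response $j_t$ to $x^{(t)}$.
3. Set $\gamma_t = 1 / \max_i A_{ij_t}$. (Positive payoffs guarantee that $\gamma_t$ is well defined.)
4. Define $x_i^{(t+1)} = \dfrac{x_i^{(t)} (1 + \epsilon \gamma_t A_{ij_t})}{\sum_{k=1}^{n} x_k^{(t)} (1 + \epsilon \gamma_t A_{kj_t})}$ for all $i$. (The expression in the denominator is a normalizing factor to ensure that $x^{(t+1)}$ is a mixed strategy.)
5. If $n\delta \prod_{s=1}^{t} \left(1 + \epsilon \gamma_s C(x^{(s)}, j_s)\right) \ge 1$, then set $T = t$ and stop. Else increase $t$ by 1 and return to Step 2.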
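Let $\bar y$ be the mixed strategy that chooses pure strategy $j_t$ with probability proportional to $\gamma_t$. More precisely, $\bar y_j$ is proportional to $\sum_{t : j_t = j} \gamma_t$. Let $t^*$ be the value of $t$ that maximizes $C(x^{(t)}, j_t)$ and let $\bar x = x^{(t^*)}$. We will later show that these strategies $\bar x$ and $\bar y$ are approximately optimal.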
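The following Python sketch runs the loop on an explicit payoff matrix. It is an illustration under stated assumptions, not the authors' implementation. It assumes this reading of the steps: $\gamma_t = 1/\max_i A_{ij_t}$, update factors $1 + \epsilon\gamma_t A_{ij_t}$, and stopping once $n\delta\prod_s (1 + \epsilon\gamma_s C(x^{(s)}, j_s)) \ge 1$. It stores the game as a dense list of rows `A`, which is only sensible for small games; the point of the method is that Player II's strategies never need to be enumerated. It also uses floating point. The name `mw_solve` and the callback `best_response` are hypothetical.

```python
import math


def mw_solve(A, eps, delta, best_response):
    """Multiplicative weights loop on an explicit matrix A with positive entries.

    best_response(x) returns an (approximate) best-response column to x.
    Assumes n * delta < 1. Returns (x_bar, y_bar, T); y_bar maps columns to probabilities.
    """
    n = len(A)
    x = [1.0 / n] * n                 # x^(1): the uniform strategy
    log_D = math.log(n * delta)       # log of the potential, which starts at n * delta
    gamma_sum = {}                    # column j -> total gamma_t over rounds with j_t = j
    best_val, x_bar = -math.inf, x
    T = 0
    while True:
        T += 1
        j = best_response(x)                              # Step 2
        col = [A[i][j] for i in range(n)]
        gamma = 1.0 / max(col)                            # Step 3
        val = sum(xi * a for xi, a in zip(x, col))        # C(x^(t), j_t)
        if val > best_val:                                # track t* maximizing C(x^(t), j_t)
            best_val, x_bar = val, x
        gamma_sum[j] = gamma_sum.get(j, 0.0) + gamma
        factors = [1.0 + eps * gamma * a for a in col]
        norm = sum(xi * f for xi, f in zip(x, factors))   # equals 1 + eps * gamma * val
        x = [xi * f / norm for xi, f in zip(x, factors)]  # Step 4
        log_D += math.log(norm)
        if log_D >= 0.0:                                  # Step 5: potential reached 1
            break
    total = sum(gamma_sum.values())
    y_bar = {j: g / total for j, g in gamma_sum.items()}
    return x_bar, y_bar, T


A = [[2.0, 1.0], [1.0, 2.0]]  # value V = 3/2


def exact_br(x):
    """Exact best response (alpha = 1): the column with the smallest expected payoff."""
    return min(range(len(A[0])), key=lambda j: sum(x[i] * A[i][j] for i in range(len(A))))


x_bar, y_bar, T = mw_solve(A, eps=0.1, delta=1e-6, best_response=exact_br)
```

On this $2 \times 2$ game both returned strategies are close to $(1/2, 1/2)$. The number of rounds $T$ stays well below $n(1 + \log_{1+\epsilon}(1/\delta)) \approx 292$.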
The proof of the following lemma is from Garg and Könemann (2007) (following the presentation in Liu et al. (2008)).
Lemma 1
The number $T$ of iterations of the algorithm satisfies
$$T \le n\left(1 + \log_{1+\epsilon} \frac{1}{\delta}\right).$$
Proof:
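For $i = 1, \dots, n$, let $w_i^{(1)} = \delta$ and for $t > 1$ let $w_i^{(t)} = w_i^{(t-1)} (1 + \epsilon \gamma_{t-1} A_{ij_{t-1}})$. Write $D(t) = \sum_{i=1}^{n} w_i^{(t)}$. It follows that
$$x_i^{(t)} = \frac{w_i^{(t)}}{D(t)}.$$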
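Note that the stopping condition is equivalent to the condition $D(T+1) \ge 1$, since
$$D(t+1) = \sum_{i=1}^{n} w_i^{(t)} (1 + \epsilon \gamma_t A_{ij_t}) = D(t)\left(1 + \epsilon \gamma_t C(x^{(t)}, j_t)\right) = n\delta \prod_{s=1}^{t} \left(1 + \epsilon \gamma_s C(x^{(s)}, j_s)\right).$$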
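Therefore, since the algorithm terminates on the $T$th iteration, we must have $D(T) < 1$ and $D(T+1) \ge 1$.
In each iteration $t$ of the algorithm, there is some pure strategy $i$ for Player I such that $\gamma_t A_{ij_t} = 1$. Call such a strategy a bottleneck strategy. If $i$ is a bottleneck strategy in iteration $t$ then $w_i^{(t+1)} = (1+\epsilon) w_i^{(t)}$, so if $i$ is a bottleneck strategy a total of $k_i$ times then $w_i^{(T)} \ge \delta (1+\epsilon)^{k_i - 1}$. But also $w_i^{(T)} < 1$, since otherwise $D(T) \ge 1$, contradicting $D(T) < 1$. Putting together these inequalities,
$$k_i \le 1 + \log_{1+\epsilon} \frac{1}{\delta}, \qquad \text{so} \qquad T \le \sum_{i=1}^{n} k_i \le n\left(1 + \log_{1+\epsilon} \frac{1}{\delta}\right). \tag{7}$$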
Inequality (7) follows from the fact that Player I has pure strategies and in each iteration at least one of them is a bottleneck strategy.
We now prove that for an appropriate choice of the parameters and , the strategies produced by the algorithm are approximately optimal and that the total runtime is not too large.
Theorem 2
Let be the value of the game. For any , there exist algorithms that compute strategies and for Player I and II such that
[TABLE]
The algorithm that computes runs in time polynomial in and ; the algorithm that computes runs in time polynomial in , and .
Proof:
For each , we have
[TABLE]
Since every coordinate of is at most , we have
[TABLE]
The second inequality above follows from the facts that
[TABLE]
and
[TABLE]
Taking natural logarithms in (9), we get
[TABLE]
Now we use the fact that and rearrange:
[TABLE]
By the linearity of the cost function and the right-hand side, the inequality above is still true if we replace the terms by , where is any arbitrary strategy for Player I. Combining this with the fact that , we get
[TABLE]
By definition of ,
[TABLE]
and substituting this into (10) and rearranging gives
[TABLE]
By the stopping condition, , so
[TABLE]
We will see later that , so that is negative. Combining (12) with (11) gives
[TABLE]
Let and let , and use to get
[TABLE]
With these choices of and , by Lemma 1, the number of iterations of the algorithm satisfies
[TABLE]
This is clearly polynomial in and .
We now turn to Player I’s strategy . We will choose different, smaller values for both and (to be specified later), giving a longer runtime for computing that is polynomial in , and . To calculate both strategies, we could either run the algorithm twice using the two different choices of and , or simply run it once using the second choice. The latter gives Player II’s strategy as well as Player I’s, since the right-hand side of (13) is increasing in and .
First let be an optimal strategy of Player I. Then because is optimal. Hence, substituting into (10),
[TABLE]
By definition of , we have for any strategy so
[TABLE]
Again, take and this time take , so that
[TABLE]
For these values of and , the bound on the runtime given by Lemma 1 is
[TABLE]
This bound is polynomial in , and .
We note that our algorithm uses rounds of updates, while the algorithm of Freund and Schapire (1996) (which finds strategies within an additive of optimal) has only a logarithmic dependence on . In fact, a multiplicative approximation factor of cannot be achieved in rounds of a multiplicative update algorithm, for any , as we state in the following proposition. The proof can be found in the appendix.
Proposition 1
Let . There exist zero-sum games with strategies for each player, with positive payoffs, such that Player II has no -optimal strategy with support of size .
5 Applications to Search Games
In this section we describe how our algorithms can be applied to several known and new search games. Each game is played between a Hider, who corresponds to Player I in our theorems, and a Searcher, who corresponds to Player II. We present several theorems concerning the existence of polynomial time algorithms for computing optimal or approximately optimal strategies for these games. These theorems refer to algorithms based on either the ellipsoid approach in Section 3 or the multiplicative weights approach in Section 4.
5.1 Searching in Boxes with Precedence Constraints
We begin with a generalization of the game BOX discussed in Subsection 1.1, in which the Searcher is restricted to orderings that are consistent with some predefined precedence constraints on the boxes. That is, we suppose that there is a partial order on the boxes so that box can be searched before box if and only if . The Hider’s strategy set and the payoff function remain unchanged. We call this game PREC. Although it has not been studied before in the form we define here, a more general version of it was studied in Fokkink et al. (2016b), as we discuss in the next subsection.
In order to apply our algorithms to PREC we must consider the best response problem of choosing the search that minimizes the expected search cost for a given Hider distribution . As for BOX, we can reframe this problem as the scheduling problem of finding a precedence-constrained ordering of a set of jobs with processing times and weights to minimize the sum of the weighted completion times. This problem is usually denoted $1|prec|\sum w_jC_j$ in the scheduling literature. It is well known to be NP-hard, and several 2-approximation algorithms are known, for instance those of Chekuri et al. (1999) and Schulz (1996). These algorithms can be used as a 2-approximate best response oracle, so we obtain our first new result for search games.
Theorem 3
There exist polynomial time algorithms that compute -approximate strategies for both players in PREC.
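To make the scheduling reformulation behind Theorem 3 concrete, here is a minimal brute-force sketch of an exact best response oracle for PREC on very small instances. It enumerates all orderings, keeps those consistent with the precedence constraints, and minimizes the expected search cost. That cost equals the weighted completion time, with processing times given by the box costs and weights given by the Hider's probabilities. The function names and the encoding of the partial order as pairs `(a, b)` (box `a` must be searched before box `b`) are hypothetical. The enumeration takes exponential time, so it is only useful for checking a polynomial-time 2-approximation such as those cited above, not for replacing one.

```python
from itertools import permutations


def completion_time(order, costs, box):
    """Total cost of the boxes opened up to and including `box` (its completion time)."""
    total = 0
    for b in order:
        total += costs[b]
        if b == box:
            return total
    raise ValueError("box does not appear in the order")


def respects(order, prec):
    """True if, for every pair (a, b) in prec, box a is searched before box b."""
    pos = {b: k for k, b in enumerate(order)}
    return all(pos[a] < pos[b] for a, b in prec)


def prec_best_response(costs, p, prec):
    """Exhaustive best response in PREC (exponential time; small instances only).

    Minimises sum_i p[i] * C_i over precedence-feasible orders, i.e. the
    scheduling problem 1|prec|sum w_j C_j with processing times costs[i]
    and weights p[i].
    """
    best_order, best_val = None, float("inf")
    for order in permutations(range(len(costs))):
        if not respects(order, prec):
            continue
        val = sum(p[i] * completion_time(order, costs, i) for i in range(len(costs)))
        if val < best_val:
            best_order, best_val = order, val
    return best_order, best_val
```

For example, `prec_best_response([1, 2, 3], [0.2, 0.5, 0.3], [(2, 0)])` returns the order `(1, 2, 0)` with expected cost 3.7 (up to rounding). Without the constraint, the unconstrained optimum would open box 0 before box 2.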
5.2 The Submodular Search Game
This game was introduced by Fokkink et al. (2016a) and further studied by Fokkink et al. (2016b). As in the game BOX, the Hider’s strategy set is and the Searcher’s strategy set is all permutations of . The difference is in the payoff function. Let be a non-decreasing submodular set function on , which gives the cost of searching subsets of . For a given hiding location and a given permutation , the payoff of the game is defined as the cost of the set of locations searched up to and including . That is,
[TABLE]
We call this game SUB. It is a further generalization of PREC. This can be seen by defining a cost function on subsets of in PREC such that is the cost of all the elements of in the precedence-closure of . Then, under this cost function, optimal strategies for SUB will correspond to optimal strategies in PREC.
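The reduction from PREC to SUB can be checked directly in a small sketch. The hypothetical helper `closure_cost` plays the role of the cost function on subsets described above. It charges for every box in the precedence-closure of a set, which makes it a weighted coverage function and hence non-decreasing and submodular. For an ordering that respects the precedence constraints, every prefix is already closed, so the SUB payoff coincides with the PREC search cost. For an ordering that violates them, the Searcher is charged as if the missing predecessors had also been searched.

```python
def closure(S, prec):
    """Smallest superset of S containing every required predecessor.

    prec is a list of pairs (a, b) meaning box a must be searched before box b.
    """
    result = set(S)
    changed = True
    while changed:
        changed = False
        for a, b in prec:
            if b in result and a not in result:
                result.add(a)
                changed = True
    return result


def closure_cost(S, costs, prec):
    """Hypothetical set function f(S): total cost of the precedence-closure of S."""
    return sum(costs[i] for i in closure(S, prec))


def sub_payoff(order, hide, f):
    """SUB payoff: f of the set of locations searched up to and including `hide`."""
    k = order.index(hide)
    return f(set(order[: k + 1]))
```

With costs `[1, 2, 3]` and the single constraint `(2, 0)`, the feasible order `(1, 2, 0)` gives payoff 6 when the Hider is in box 0, the same as its PREC completion time. The infeasible order `(0, 1, 2)` is charged 4, as though box 2 had been searched before box 0.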
Fokkink et al. (2016b) also consider the best response problem for SUB, and they prove that this problem can be solved approximately, within a factor of , generalizing the analogous result in the scheduling literature. It follows that our algorithms can be used to calculate -approximate strategies for the players in SUB, and so we obtain our next new result for search games.
Theorem 4
There exist polynomial time algorithms that calculate -approximate strategies for both players in SUB.
5.3 Expanding Search
In PREC, the partial order on uniquely defines a directed acyclic graph, and if that graph is a tree then PREC is equivalent to the expanding search game on a tree, introduced in Alpern and Lidbetter (2013b). The game, which we will call EXP, can be played on any undirected graph with a distinguished vertex called the root and edges with costs corresponding to the time taken to traverse them. A pure strategy for the Hider is a vertex at which to hide. A pure strategy for the Searcher is a sequence of edges, the first of which is incident to the root, and each subsequent one of which must be adjacent to some previously chosen edge (not necessarily the most recently chosen one). This type of search on a graph is called an expanding search. The payoff is the search cost, which is the sum of the costs of the edges chosen by the Searcher up to and including the first edge that is incident to the Hider’s chosen vertex. See Alpern and Lidbetter (2013b) for motivations and applications of expanding search.
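The payoff of EXP can be computed by following the Searcher's edge sequence and checking the expanding condition along the way. The sketch below assumes a hypothetical encoding. Vertices are hashable labels, the Searcher's strategy is a list of `(u, v)` edges, and edge costs are stored in a dictionary keyed by `frozenset({u, v})`. It raises an error if an edge is not adjacent to the region already searched; since the region starts as the root, the first edge must touch the root.

```python
def expanding_search_cost(root, edges, cost, hide):
    """Search cost in EXP: sum of edge costs up to and including the first edge
    incident to the Hider's vertex.

    edges: the Searcher's sequence of (u, v) pairs.
    cost:  dict mapping frozenset({u, v}) to the cost of that edge.
    """
    reached = {root}  # vertices touched so far; the search starts at the root
    total = 0
    for u, v in edges:
        # Expanding condition: each edge must share a vertex with the searched region.
        if u not in reached and v not in reached:
            raise ValueError(f"edge {(u, v)} is not adjacent to the searched region")
        total += cost[frozenset((u, v))]
        reached.update((u, v))
        if hide in (u, v):
            return total
    raise ValueError("the search never reaches the Hider's vertex")
```

On a tree with edges r–a (cost 2), a–b (cost 1) and r–c (cost 4), the expanding search r–a, r–c, a–b finds a Hider at b at cost 7, even though b is only at distance 3 from the root.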
For EXP played on a tree, the best response problem is what Alpern and Lidbetter (2013b) call the Bayesian problem of finding a search that minimizes the expected payoff for a known hiding distribution on the vertices of the tree, and the authors give a solution to this problem. The equivalent scheduling problem with tree-like precedence constraints can also be solved, using the so-called Sidney decomposition introduced by Sidney (1975). Therefore, our algorithms can be applied to find optimal strategies for the players in the game. However, optimal strategies have already been found in closed form for this game in Alpern and Lidbetter (2013b).
If we consider EXP on a more general graph, then the best response problem can be shown to be NP-hard (Dürr (2016)). Whether it can be efficiently approximated is an open question. If it can, then our results would imply that approximately optimal strategies for EXP can also be found.
5.4 Expanding Search Ratio
This game was introduced by Angelopoulos et al. (2016). The game is played on a graph with vertices plus a root vertex and edges with costs. The Hider chooses one of the non-root vertices of the graph and the Searcher chooses an expanding search, as in the expanding search game considered in Alpern and Lidbetter (2013b). However, for a given vertex and a given expanding search, the payoff is not the search cost, but the ratio of the search cost to the shortest path distance from the root to . This payoff is called the search ratio. It is a measure of the “regret” incurred by the Searcher in finding the Hider, and so we call the game EXPr.
Angelopoulos et al. (2016) find a -optimal strategy for the Searcher in EXPr for star graphs, which is generalized in Angelopoulos et al. (2017) to general unweighted graphs (where all the edges have equal cost) and tree graphs. Independently, Condon et al. (2009) have studied the same game for star graphs in the context of throughput maximization, and they give a full solution. But the solution of EXPr for any other class of graphs is unknown.
Consider EXPr played on a tree. We show that in this case the best response problem can be solved in polynomial time. Indeed, suppose the Hider is located at vertex with probability . Let and let be the normalization of , so that is a probability vector. Then the best response problem is the problem of finding the expanding search that minimizes the expected search ratio with respect to . This is equivalent to finding the expanding search that minimizes the expected search time with respect to , because for every search the former objective equals the latter multiplied by the same positive normalizing constant. The latter problem is the best response problem for EXP on a tree, which, as discussed in Subsection 5.3, can be solved in polynomial time. Therefore we obtain the following theorem.
Theorem 5
There exist polynomial time algorithms that calculate optimal strategies for both players in EXPr played on a tree.
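The reweighting argument behind Theorem 5 can be checked numerically. The helper names below are hypothetical. `root_distances` computes the shortest-path distance from the root to every vertex of a weighted tree, which is simply the unique path length. `ratio_to_time_weights` divides each hiding probability by that distance and normalizes. For any fixed expanding search with search times T(v), the expected search ratio under the original distribution equals the normalizing constant times the expected search time under the reweighted distribution. This is why a best response oracle for EXP also serves EXPr.

```python
def root_distances(root, tree_edges):
    """Distance d(v) from the root to every vertex of a weighted tree.

    tree_edges maps (u, v) pairs to edge costs; in a tree the shortest path
    is the unique path, so a plain depth-first traversal suffices.
    """
    adj = {}
    for (u, v), w in tree_edges.items():
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, []).append((u, w))
    dist = {root: 0.0}
    stack = [root]
    while stack:
        u = stack.pop()
        for v, w in adj.get(u, []):
            if v not in dist:
                dist[v] = dist[u] + w
                stack.append(v)
    return dist


def ratio_to_time_weights(p, dist):
    """Divide each hiding probability by d(v) and normalise.

    p must only cover non-root vertices (d(v) > 0). Returns (q_hat, Z) with
    Z = sum_v p[v] / d(v), so E_p[T(v) / d(v)] = Z * E_{q_hat}[T(v)] for any
    search times T.
    """
    q = {v: p[v] / dist[v] for v in p}
    Z = sum(q.values())
    return {v: qv / Z for v, qv in q.items()}, Z
```

Because Z does not depend on the Searcher's choice, minimizing the expected search time under `q_hat` is exactly the EXP best response problem for that distribution.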
For other types of graphs, using our results depends on finding approximations to the best response problem for EXP.
5.5 Search Games with Regret
The payoff matrix of the game EXPr discussed in the previous subsection can be obtained from the payoff matrix of the game EXP by dividing each row of the matrix by a constant, namely the shortest path distance from the root to the vertex corresponding to that row. Given any search game, we can define a “regret version” of that game by dividing the row corresponding to each Hider pure strategy by some constant. There is usually a natural choice of constant. For example, in SUB we might divide the row corresponding to Hider strategy by , the cost that would be incurred by the Searcher if she knew where the Hider was (provided for all ).
If there is an -approximate best response oracle for some search game, then, by the same reasoning as in Subsection 5.4, there is also an -approximate best response oracle for any regret version of that game. By our results, this implies that we can find -optimal strategies in both games. So, for instance, we immediately find that for the regret version of SUB, there is an efficient algorithm that finds 2-approximate strategies.
We sum this up below.
Theorem 6
Suppose there is an -approximate best response oracle for some game we call GAME with payoff function . Let GAMEr be the game defined by the payoff function , where are non-negative constants. Then there is an -approximate best response oracle for GAMEr, and hence there exist polynomial time algorithms for calculating -optimal strategies.
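In code, Theorem 6 amounts to wrapping the existing oracle. The sketch below is a hypothetical wrapper, `regret_oracle`, that turns a best response oracle for GAME into one for GAMEr. It reweights the Hider's distribution in proportion to p_i / c_i, as in Subsection 5.4. It assumes the row divisors c_i are strictly positive so that the division is defined. The reweighted objective differs from the GAMEr objective only by a positive factor that does not depend on the Searcher's choice, so any approximation ratio achieved by the base oracle carries over unchanged.

```python
def regret_oracle(base_oracle, c):
    """Best-response oracle for GAMEr built from one for GAME.

    base_oracle(p) returns a Searcher pure strategy that is an (approximate)
    best response to the Hider distribution p in GAME. GAMEr divides row i of
    the payoff matrix by c[i] > 0.
    """
    def oracle(p):
        # sum_i p[i] * A(i, j) / c[i] = Z * sum_i p_hat[i] * A(i, j) with Z > 0
        # independent of j, so the same j is (approximately) optimal for both.
        w = [pi / ci for pi, ci in zip(p, c)]
        Z = sum(w)
        return base_oracle([wi / Z for wi in w])

    return oracle
```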
5.6 Hide-Seek and Pursuit-Evasion
We now discuss a different type of game, introduced by Gal and Casas (2014) in the context of predator-prey interaction, though the game could equally apply to problems of national security such as the pursuit of a terrorist. As in the games discussed above, a Hider or hidden object is located in one of locations (this time without search costs). The Searcher can search a subset of locations of size at most , for some given , and if the Hider’s location lies in this set, a pursuit ensues and the Searcher captures him with probability . The payoff of the game, which the Hider wishes to minimize and the Searcher to maximize, is the probability that the Hider is captured. Gal and Casas (2014) provide a closed-form solution to this game.
Here we introduce a generalization of this game, in which the locations have search costs . A pure strategy for the Searcher is a subset of locations, the sum of whose search costs does not exceed . The payoff, as before, is the probability the Hider is captured. We call this game HSPE (hide-seek and pursuit-evasion). If all the locations’ costs are equal to 1, then HSPE is equivalent to the original game of Gal and Casas (2014). Consider the best response problem, where the probability that the Hider is in location is known and the problem is to choose a subset of locations that maximizes the probability of capture. Let . Then the problem is to choose a subset of of total cost at most that maximizes . This is the classic knapsack problem, which has a fully polynomial time approximation scheme (see Vazirani (2013)). That is, given , there exists an algorithm that runs in time polynomial in and and approximates the solution of the knapsack problem within a multiplicative factor of . This implies the following.
Theorem 7
The problem of finding optimal strategies in the game HSPE has a fully polynomial time approximation scheme.
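A minimal sketch of the best response oracle behind Theorem 7 follows. It uses the classical profit-scaling FPTAS for 0/1 knapsack, a dynamic program over rounded-down profits as described, for example, in Vazirani's textbook. In HSPE the item values would be the products of hiding and capture probabilities, the item weights the search costs, and the capacity the Searcher's budget. These argument names are assumptions, since the symbols are not fixed here. In the standard form sketched here, the returned set has value at least (1 - eps) times the optimum, and the running time is polynomial in the number of locations and 1/eps.

```python
def knapsack_fptas(values, costs, budget, eps):
    """Profit-scaling FPTAS for 0/1 knapsack.

    Returns a set of indices with total cost <= budget whose total value is at
    least (1 - eps) times the optimum. Runs in time polynomial in n and 1/eps.
    """
    n = len(values)
    # Items that cannot fit on their own, or are worthless, are never useful.
    items = [i for i in range(n) if costs[i] <= budget and values[i] > 0]
    if not items:
        return set()
    K = eps * max(values[i] for i in items) / n  # scaling factor
    scaled = {i: int(values[i] // K) for i in items}
    # best[s] = (minimum cost, item set) achieving scaled profit exactly s
    best = {0: (0, frozenset())}
    for i in items:
        # Iterate over a snapshot so that item i is used at most once.
        for s, (c, chosen) in list(best.items()):
            s2, c2 = s + scaled[i], c + costs[i]
            if c2 <= budget and (s2 not in best or c2 < best[s2][0]):
                best[s2] = (c2, chosen | {i})
    return set(best[max(best)][1])
```

The rounding loses at most K per item, hence at most eps times the largest single value, which is itself at most the optimum. This is where the (1 - eps) guarantee comes from.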
5.7 An Infinite Game
We finish this section by discussing an infinite game studied by Lin and Singham (2015), for which our results do not directly apply because the Searcher has a strategy set of (countably infinite) cardinality. We are optimistic that our results could be extended to such games in future work, which is why we mention it here.
In the game of Lin and Singham (2015), a Hider is located in one of locations with search costs and capture probabilities , as in the game HSPE of the previous subsection. The Searcher chooses an infinite sequence of locations, and each time the Searcher examines the Hider’s location , she finds him independently with probability . The payoff of the game is the Searcher’s expected cost of finding the Hider for the first time.
As Lin and Singham (2015) point out, the solution of the best response problem for this game is well known. Suppose the Hider is hidden in location with probability . Then, as shown by Blackwell (reported in Matula (1964)), the optimal policy for the Searcher is, at any time, to choose a location that maximizes the index , with the hiding probability being updated according to Bayes’ Law after each search. Of course, because the optimal policy is an infinite sequence, it cannot necessarily be concisely expressed. However, an approximately optimal solution may be found, and Lin and Singham (2015) exploit this fact to give an algorithm which they believe converges to an optimal search strategy in the game. We leave it for future work to develop a provably fully polynomial approximation scheme for the problem of determining optimal strategies in the game.
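Blackwell's index policy is easy to simulate for a finite number of steps. The sketch below uses hypothetical argument names: `p` for the hiding probabilities, `q` for the detection probabilities and `c` for the search costs. It returns the first `steps` locations chosen by the policy, together with the posterior hiding distribution given that all of those searches failed. A finite prefix like this is the kind of truncation that an approximate solution of the best response problem relies on. On its own, it does not give an approximation guarantee.

```python
def blackwell_prefix(p, q, c, steps):
    """First `steps` searches of Blackwell's index policy.

    p: prior hiding probabilities, q: detection probabilities, c: search costs.
    Returns the chosen locations and the posterior given that all searches failed.
    """
    p = list(p)
    sequence = []
    for _ in range(steps):
        # Search the location with the largest index p_i * q_i / c_i.
        i = max(range(len(p)), key=lambda k: p[k] * q[k] / c[k])
        sequence.append(i)
        miss = 1.0 - p[i] * q[i]  # probability that searching i fails
        if miss <= 0:
            break  # the Hider is certainly found at i; nothing left to update
        # Bayes' rule after an unsuccessful search of location i.
        p = [(pk * (1.0 - q[k]) if k == i else pk) / miss for k, pk in enumerate(p)]
    return sequence, p
```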
6 Numerical experiments
In this section we test our multiplicative weights algorithm of Section 4 and compare it to the algorithm of Freund and Schapire (1999). The latter algorithm has the same form as ours, but updates the weights in each iteration using a different multiplicative factor. It outputs strategies that ensure an expected payoff of at least for Player I and at most for Player II. The number of iterations is , but it assumes the payoffs have been scaled to lie in .
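For reference, here is a minimal sketch of a multiplicative weights scheme of this kind. It assumes payoffs already scaled to [0, 1] and an exact best response oracle for Player II. It uses the exponential-weights (Hedge) update with the standard step size, which is one common variant and not necessarily the exact factor used by Freund and Schapire or by Arora et al. (2012). Player I keeps weights on its n pure strategies and Player II best responds in each round. The outputs are Player I's average mixed strategy and Player II's empirical mixture of best responses. With T = ⌈ln n / (2δ²)⌉ rounds, the Hedge regret bound implies that the payoffs guaranteed by the two outputs differ by at most δ, so each is within additive δ of the value.

```python
import math


def mw_best_response(payoff, n, best_response, delta):
    """Hedge for Player I (maximiser) against a best-responding Player II.

    payoff(i, j): Player I's payoff, assumed scaled into [0, 1].
    best_response(p): a Player II pure strategy minimising sum_i p[i] * payoff(i, j).
    Returns Player I's averaged mixed strategy and Player II's empirical mixture.
    """
    T = max(1, math.ceil(math.log(n) / (2 * delta ** 2)))
    eta = math.sqrt(8 * math.log(n) / T) if n > 1 else 0.0
    gains = [0.0] * n  # cumulative payoff of each pure strategy of Player I
    p_avg = [0.0] * n
    q_counts = {}
    for _ in range(T):
        top = max(gains)
        # Weights proportional to exp(eta * gain), shifted for numerical stability.
        w = [math.exp(eta * (g - top)) for g in gains]
        total = sum(w)
        p = [wi / total for wi in w]
        j = best_response(p)
        q_counts[j] = q_counts.get(j, 0) + 1
        for i in range(n):
            p_avg[i] += p[i] / T
            gains[i] += payoff(i, j)
    q_avg = {j: k / T for j, k in q_counts.items()}
    return p_avg, q_avg
```

Since each j is a best response to the current p, the averaged Player I strategy guarantees at least the average realized payoff. The regret bound then caps what any pure strategy of Player I earns against the empirical mixture at that average plus δ.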
To compare the Freund and Schapire algorithm with ours, we note that theirs could be applied directly to obtain a multiplicative approximation, but the resulting number of rounds would depend on the payoffs of the game. More precisely, consider using their algorithm to find a strategy for Player II that is within a multiplicative factor of optimal. This requires scaling the payoffs to lie in , which can be done by dividing them by , the maximum entry in the payoff matrix. Achieving a multiplicative approximation of is then equivalent to achieving an additive approximation of , relative to the scaled values. For Player I, achieving a strategy that is within a multiplicative factor of of optimal means guaranteeing a payoff of at least times the value of the game. Thus achieving this approximation for Player I is equivalent to achieving an additive approximation of at least relative to the scaled values, where . However, we do not know the value of , so to calculate the appropriate additive approximation we have to use a lower bound for . Thus, a multiplicative approximation of is best compared to an additive approximation of , where
[TABLE]
So to ensure the same quality of approximation for both players we should use (15) to calculate the appropriate additive approximation in the Freund and Schapire algorithm. The simplest choice for the lower bound would be the minimum value of any payoff in the game. This yields iterations, a quantity that depends on the payoffs. If is constant, this algorithm requires fewer rounds than ours, asymptotically, but if is large compared to it could require an arbitrarily large number of rounds.
Here we test both our multiplicative weights algorithm and that of Freund and Schapire (as presented in Arora et al. (2012)) on instances of the game BOX. We randomly generated 40 different instances of the game, which we grouped into four sets. For Set 1 and Set 2, we took , and for Set 3 and Set 4 we took . The payoffs in Sets 1 and 3 were uniformly chosen at random to be an integer between and , and in Sets 2 and 4 they were chosen uniformly at random between and . The 40 instances of the game BOX can be found in Table 2 in the appendix.
We also considered four different values of . To compare the two algorithms fairly, for each value of that we used for the multiplicative approximation of in our algorithm, we used for the Freund and Schapire algorithm, as given in (15). For the lower bound on the value, we considered it unfair simply to take the smallest payoff in the game, and instead took to be the largest cost of the boxes. This is a better simple lower bound on the value, which the Hider can achieve by hiding the object in the highest cost box.
An even better lower bound can be obtained with the more sophisticated approach of taking the best response payoff , where is the uniform strategy for all and is Player II’s best response to . We therefore also ran the Freund and Schapire algorithm a second time using this choice of (when it was larger).
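Both lower bounds are cheap to compute for BOX. The sketch below assumes the BOX payoff from Subsection 1.1, namely the total cost of the boxes opened up to and including the one containing the object. It uses Smith's rule, which orders boxes by decreasing p_i / c_i, as the exact best response to a Hider distribution p. The function names are hypothetical. `value_lower_bounds` returns the simple bound used for FS and the bound used for FS+, where the latter is the larger of the two.

```python
def box_best_response_cost(costs, p):
    """Expected BOX payoff of the Searcher's best response to Hider distribution p.

    Smith's rule for 1||sum w_j C_j: opening boxes in decreasing order of
    p[i] / costs[i] minimises sum_i p[i] * C_i, where C_i is the total cost
    paid up to and including box i.
    """
    order = sorted(range(len(costs)), key=lambda i: p[i] / costs[i], reverse=True)
    elapsed, expected = 0.0, 0.0
    for i in order:
        elapsed += costs[i]
        expected += p[i] * elapsed
    return expected


def value_lower_bounds(costs):
    """Lower bounds on the value of BOX: (simple bound, bound used for FS+)."""
    simple = max(costs)  # the Hider hides in the most expensive box
    n = len(costs)
    uniform = box_best_response_cost(costs, [1.0 / n] * n)  # best response to uniform
    return simple, max(simple, uniform)
```

For costs `[1, 2, 3]`, the uniform-strategy bound is 10/3, which improves on the simple bound of 3. For costs `[5, 1, 1]`, the uniform bound is only 10/3, so the simple bound of 5 is kept.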
The algorithms were implemented in Python and the experiments were run on a MacBook Pro laptop with a 3.1GHz Intel Core i7 processor and 16GB of RAM. The results are displayed in Table 1. Our algorithm is labeled “HLP”, the Freund and Schapire algorithm with is labeled “FS” and the Freund and Schapire algorithm with the more sophisticated choice of as described above is labeled “FS+”. We ran our algorithm, taking and (as defined in the proof of Theorem 2).
The number of iterations of our algorithm is determined by our theoretical results, which guarantee a certain accuracy after a certain number of iterations. In practice, however, that accuracy may be reached in fewer iterations. So in our experiments we measure both the number of iterations needed for the algorithm to terminate and the number of iterations needed to achieve the desired accuracy.
The columns in Table 1 labeled “Convergence time” display what can be thought of as the average time the algorithms took to converge “in practice”. More precisely, convergence time is the number of iterations until we could be sure that both strategies had converged to the desired accuracy, that is, until the ratio of the upper bound on the expected payoff guaranteed by Player II’s strategy to the lower bound guaranteed by Player I’s strategy was no more than .
The columns in Table 1 labeled “Total time” indicate the average number of iterations until the algorithms terminated. The time to produce each entry in the table (requiring running each algorithm on 5 or 10 different versions of the game) ranged from under one second to over 25 hours, depending on the setting of the parameters and the resulting number of iterations. We did not optimize our implementation for speed. To avoid excessive processing time, for and we did not attempt to run every algorithm to termination. If an algorithm exceeded 3 hours of processing time, we verified that it had converged to the desired accuracy and then terminated it early. If the run for any of the 10 instances of a set fell into this category, the corresponding entries in the table are marked with an asterisk.
In order to calculate the “PI error” and “PII error”, for each experiment, we first calculated the value of each game, using the known closed form solution (see Subsection 1.1). Then, for Player I, the error is the smallest for which the final strategy output by the algorithm guarantees an expected payoff of at least ; for Player II the error is the smallest for which the final strategy guarantees an expected payoff of at most .
The convergence times and total times are written to the nearest integer and the errors are written in percentages to two decimal places.
We can see from Table 1 that HLP ran much faster on average than FS (in the cases where both terminated) and faster than FS+ for and , but not for or . In practice, HLP converged significantly faster on average than even FS+, for all four values of .
Interestingly, by the time the algorithms stopped running, the actual error of the strategies was far lower than guaranteed by the theory, particularly for large . For example, for , the average Player I error after termination was less than for each of the three algorithms, and the average Player II error was less than . Comparing the total run times for with the convergence times for , we see that to ensure an approximation ratio of , it was generally quicker to run the algorithm for .
Comparing the average errors across the three algorithms, we note that the average Player I error for HLP was always at least as great as that of both FS and FS+. A natural explanation is that HLP usually ran for fewer iterations. In separate experiments, we ran HLP for a number of iterations equal to the maximum number of iterations performed by FS and FS+; in this case we found that the Player I error for HLP was comparable to, and usually smaller than, that of FS and FS+.
For Player II, HLP had the smallest average error for both and . For the larger values of , the average Player II error for HLP was comparable to that of FS+, and FS had the smallest average error, which again can be attributed to longer runtimes.
7 Conclusion
We have shown how an algorithmic approach can be used to find solutions or approximate solutions to search games by exploiting oracles that find best responses or approximate best responses. This has applications to the search games discussed in Section 5. We believe it may have further applications in other search games, and indeed in games studied in other fields of operations research, such as security games.
Acknowledgements
L. Hellerstein was partially supported by NSF Award IIS-1217968.
Appendix
Equivalence of -Optimal Strategies and Relative -approximate Nash Equilibria for Zero-Sum Games
We show here that for small enough , our notion of -optimal strategies is in a natural sense equivalent to the notion of relative -approximate Nash equilibria for zero-sum games with positive payoffs, from Daskalakis (2013).
Consider a (possibly non-zero-sum) two-player game, and for mixed strategies and of Players I and II, denote the payoffs to the two players by and , respectively. A relative -approximate Nash equilibrium is a pair of strategies such that
[TABLE]
for all strategies of Player I and of Player II.
Now suppose the game is zero-sum, so that , and suppose some strategies define a relative -approximate Nash equilibrium. Let be the value of the game and let be any optimal strategies. Then for any Player II strategy ,
[TABLE]
for all . Inequalities (16) and (17) follow from the definition of a relative -approximate Nash equilibrium, inequality (18) follows from the optimality of and equation (19) follows from the definition of the value.
It can be similarly shown that for any Player I strategy . So the strategies are -optimal.
Now suppose the strategies are -optimal. Then for any Player I strategy ,
[TABLE]
for . Inequality (20) follows from the -optimality of and inequality (21) follows from the -optimality of .
Similarly, it can be shown that for any Player II strategy . So the strategies form a relative -approximate Nash equilibrium for small enough .
Proof of Proposition 1
Proof:
Consider the game whose payoff matrix has the value in the diagonal entries, and the value in all the off-diagonal entries. Then the value of the game is . Suppose Player II has a -optimal strategy with support . Then asymptotically, Player II can ensure the payoff does not exceed . Since is -optimal, it follows that
[TABLE]
But this implies , a contradiction.
Data for Numerical Experiments
References
- Alpern S (2016) Searching a Network Using Combinatorial Paths. Oper. Res. (in press).
- Alpern S, Gal S (2003) The Theory of Search Games and Rendezvous. Kluwer International Series in Operations Research and Management Science (Kluwer, Boston), 319.
- Alpern S, Lidbetter T (2013a) Searching a Variable Speed Network. Math. Oper. Res. 39(3):697–711.
- Alpern S, Lidbetter T (2013b) Mining Coal or Finding Terrorists: The Expanding Search Paradigm. Oper. Res. 61(2):265–279.
- Alpern S, Lidbetter T (2015) Optimal Trade-Off Between Speed and Acuity When Searching for a Small Object. Oper. Res. 63(1):122–133.
- Angelopoulos S, Dürr C, Lidbetter T (2016) The Expanding Search Ratio of a Graph. In 33rd International Symposium on Theoretical Aspects of Computer Science (STACS).
- Angelopoulos S, Dürr C, Lidbetter T (2017) The Expanding Search Ratio of a Graph. arXiv preprint arXiv:1602.06258.
- Arora S, Hazan E, Kale S (2012) The Multiplicative Weights Update Method: a Meta-Algorithm and Applications. Theory of Computing 8(1):121–164.
