Markov perfect equilibria in non-stationary mean-field games
Deepanshu Vasal

University of Texas, Austin
Abstract
We consider both finite and infinite horizon discounted dynamic mean-field games in which a large population of homogeneous players sequentially make strategic decisions, and each player is affected by the other players through an aggregate population state. Each player has a private type that only she observes, and all players commonly observe a mean-field population state, which represents the empirical distribution of the other players' types. Such games have been studied in the literature under the simplifying assumption that the population state dynamics are stationary. In this paper, we consider non-stationary population state dynamics and present a novel backward recursive algorithm to compute Markov perfect equilibria (MPE) that depend on both a player's private type and the current (dynamic) population state. Each step of this algorithm consists of solving a fixed-point equation. We provide conditions on the model parameters under which such an MPE exists. Using this algorithm, we study a security problem in a cyber-physical system where infected nodes impose a negative externality on the system and each node decides whether to get vaccinated. We numerically compute the MPE of the game.
I Introduction
With the increasing integration of technology into our society and recent advances in computation and algorithms, there is an unprecedented scale of interaction among people and devices. With technologies such as ride-sharing platforms and social media apps already fully integrated, and new technologies such as cyber-physical systems, large-scale renewable energy, electric vehicles, cryptocurrencies and the smart grid on the horizon, there is a paramount need to design and understand such large-scale interactions and their impact on society. In this paper, we present a new methodology to analyze such interactions through dynamic mean-field games.
Dynamic games, introduced by Shapley in [1], are a powerful tool to model sequential strategic interaction among selfish players. Discrete-time dynamic games with Markovian structure have been studied extensively in both the engineering and the economics literature to model many practical applications, such as dynamic auctions, security, markets, traffic routing, wireless systems, social learning, and oligopolies, i.e., competition among firms [2, 3].
In dynamic games with perfect and symmetric information, subgame perfect equilibrium (SPE) is an appropriate equilibrium concept, and there exists a backward recursive algorithm to find all the SPEs of these games (refer to [4, 5, 6] for a more elaborate discussion). Maskin and Tirole [7] introduced the concept of Markov perfect equilibrium (MPE), in which players' strategies depend on a coarser Markovian state of the system instead of the whole history of the game, which grows exponentially with time and thus becomes unwieldy. MPE is a refinement of SPE, and in general there exists a backward recursive methodology to compute the MPEs of such games. Prominent examples of applications of MPE include [8, 9, 10]. Ericson and Pakes [8] model industry dynamics of firms' entry, exit and investment through a dynamic game with symmetric information, compute its MPE, and prove ergodicity of the equilibrium process. Bergemann and Välimäki [9] study a learning process in a dynamic oligopoly with strategic sellers and a single buyer, allowing for price competition among sellers; they study the MPE of the game and its convergence behavior. Acemoğlu and Robinson [10] develop a theory of political transitions in a country by modeling it as a repeated game between the elites and the poor, and study its MPE.
However, when the number of players is large, computing MPE becomes intractable. To model the behavior of large-population strategic interactions, mean-field games were introduced independently by Huang, Malhamé, and Caines [11], and by Lasry and Lions [12]. In such games, there is a large number of homogeneous strategic players, where each player has an infinitesimal effect on the system dynamics and is affected by the other players through a mean-field population state. Mean-field games have been applied to economic growth, security in networks, oil production, volatility formation and population dynamics (see [13, 14, 15, 16, 17, 18, 19] and references therein).
To motivate our problem, consider the following application: a dynamic energy market where, in each period, a large number of suppliers bid their estimated power outputs to an independent system operator (ISO), which runs the market mechanism that determines the prices assessed to the different suppliers. Each supplier wants to maximize its overall return, which depends on its cost of producing energy, which is its private information, and on the market-determined prices, which depend on all the bids. Each bidder is thus affected not by other individual bidders, but by an aggregate population state of everybody else.
To model such scenarios, in this paper we consider finite- and infinite-horizon discounted dynamic mean-field games with a large population of homogeneous players, each having a private type. Each player sequentially makes strategic decisions and is affected by the other players through a mean-field population state. Each player's private type evolves through a controlled Markov process and is observed only by her, while all players observe the current population state, which is the distribution of the other players' types. In such games, given a policy of the players, the mean-field state evolves through the McKean-Vlasov forward equation, and the equilibrium policy satisfies the Bellman backward equation given the mean-field states. Thus, to compute an equilibrium, one needs to solve the coupled forward-backward fixed-point equations in the mean field and the equilibrium policy.
In [19], the authors study stationary equilibria of a mean-field game under the simplifying assumption that players are oblivious to the mean-field statistics and play in the limit where the mean-field distribution has converged. This allows them to decouple the mean-field dynamics from the rest of the game.
In this paper, we consider a general model where players are cognizant, i.e., they actively observe the current population state (which need not have converged) and act based on that population state and their own private type. We provide a novel backward recursive algorithm to compute non-stationary, signaling Markov perfect equilibria (MPE) of such games. We also provide sufficient conditions for the existence of an MPE for both finite and infinite horizons. More generally, this algorithm could be used to revisit applications studied with mean-field games and to study equilibria with non-stationary mean-field statistics.
Using this framework, we consider a malware spread problem in a cyber-physical system, where nodes get infected through an independent random process and each node faces a higher risk of infection due to the negative externality imposed by other infected nodes. At each time $t$, each node privately observes its own state and publicly observes the population of infected nodes, based on which it decides whether or not to repair. Using our algorithm, we find equilibrium strategies of the players, which we observe to be non-decreasing in the healthy population state.
Our algorithm is motivated by recent developments in the theory of dynamic games with asymmetric information [20, 21, 22, 23, 24], where different models of such games are considered and a sequential decomposition framework is provided to compute Markovian perfect Bayesian equilibria.
The paper is structured as follows. In Section II, we present the model and background. In Section III, we present our main results: algorithms to compute MPE for both the finite and infinite horizon games, together with their converses. In Section IV, we present existence results. We present a numerical example in Section V and conclude in Section VI.
I-A Notation
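We use uppercase letters for random variables and lowercase letters for their realizations. For any variable, subscripts represent time indices and superscripts represent player identities. We use the notation $-i$ to represent all players other than player $i$, i.e., $-i = \{1, 2, \ldots, i-1, i+1, \ldots, N\}$. We use the notation $a_{t:t'}$ to represent the vector $(a_t, a_{t+1}, \ldots, a_{t'})$ when $t' \ge t$, or an empty vector if $t' < t$. We use $a_t^{-i}$ to mean $(a_t^1, \ldots, a_t^{i-1}, a_t^{i+1}, \ldots, a_t^N)$. We remove superscripts or subscripts if we want to represent the whole vector; for example, $a_t$ represents $(a_t^1, \ldots, a_t^N)$. We denote the indicator function of any set $A$ by $\mathbb{1}_A(\cdot)$. For any finite set $\mathcal{S}$, $\Delta(\mathcal{S})$ represents the space of probability measures on $\mathcal{S}$ and $|\mathcal{S}|$ represents its cardinality. We denote by $P^\sigma$ (or $\mathbb{E}^\sigma$) the probability measure generated by (or the expectation with respect to) strategy profile $\sigma$. We denote the set of real numbers by $\mathbb{R}$. For a probabilistic strategy profile of players $(\sigma_t^i)_{i \in [N]}$ where the probability of action $a_t^i$ conditioned on $(z_{1:t}, x_{1:t}^i)$ is given by $\sigma_t^i(a_t^i \mid z_{1:t}, x_{1:t}^i)$, we use the shorthand notation $\sigma_t^{-i}(a_t^{-i} \mid z_{1:t}, x_{1:t}^{-i})$ to represent $\prod_{j \ne i} \sigma_t^j(a_t^j \mid z_{1:t}, x_{1:t}^j)$. All equalities and inequalities involving random variables are to be interpreted in the a.s. sense.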
II Model and Background
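We consider both finite and infinite-horizon discrete-time large-population sequential games as follows. There are $N$ homogeneous players, where $N$ tends to $\infty$. We denote the set of players by $[N]$ and, with some abuse of notation, the set of time indices by $[T]$ for both the finite and the infinite time horizon. In each period $t \in [T]$, player $i$ observes a private type $x_t^i \in \mathcal{X}$ and a common observation $z_t$, takes action $a_t^i \in \mathcal{A}$, and receives a reward $R(x_t^i, a_t^i, z_t)$, which is a function of its current type $x_t^i$, its action $a_t^i$ and the common observation $z_t$. The common observation $z_t$ is the fraction of the population having each type at time $t$, i.e.,
$$z_t(x) = \frac{1}{N}\sum_{j=1}^{N} \mathbb{1}_{\{x\}}(x_t^j), \quad \forall x \in \mathcal{X}, \tag{1}$$
where $z_t \in \mathcal{Z} := \Delta(\mathcal{X})$. Player $i$'s type evolves as a controlled Markov process,
$$x_{t+1}^i = f_x(x_t^i, a_t^i, z_t, w_t^i). \tag{2}$$
The random variables $(w_t^i)_{i \in [N], t \in [T]}$ are assumed to be mutually independent across players and across time. We also write the above update of $x_t^i$ through a kernel, $x_{t+1}^i \sim Q_x(\cdot \mid x_t^i, a_t^i, z_t)$.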
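In any period $t$, player $i$ observes $h_t^i = (z_{1:t}, x_{1:t}^i)$. She takes action $a_t^i$ according to a behavioral strategy $\sigma^i = (\sigma_t^i)_{t}$, where $\sigma_t^i : \mathcal{Z}^t \times \mathcal{X}^t \to \Delta(\mathcal{A})$. We denote the space of such strategies by $\mathcal{K}^\sigma$. This implies $P^\sigma(a_t^i \mid z_{1:t}, x_{1:t}^i) = \sigma_t^i(a_t^i \mid z_{1:t}, x_{1:t}^i)$. We denote by $\mathcal{Z}^t$ the space of population states up to time $t$, and by $\mathcal{H}_t^i = \mathcal{Z}^t \times \mathcal{X}^t$ the set of observed histories $h_t^i$ of player $i$.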
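For the finite time-horizon game $\mathfrak{G}_T$, each player wants to maximize its total expected discounted reward over a time horizon $T$, discounted by a discount factor $0 < \delta \le 1$,
$$J_T^{i,\sigma} := \mathbb{E}^{\sigma}\Big[\sum_{t=1}^{T} \delta^{t-1} R(X_t^i, A_t^i, Z_t)\Big]. \tag{3}$$
For the infinite time-horizon game $\mathfrak{G}_\infty$, each player wants to maximize its total expected discounted reward over an infinite time horizon, discounted by a discount factor $0 < \delta < 1$,
$$J_\infty^{i,\sigma} := \mathbb{E}^{\sigma}\Big[\sum_{t=1}^{\infty} \delta^{t-1} R(X_t^i, A_t^i, Z_t)\Big]. \tag{4}$$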
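To make the model concrete, the following Python sketch simulates a finite population of $N$ players under a symmetric Markov strategy and returns one Monte Carlo sample of player 0's finite-horizon discounted reward. It recomputes the empirical population state $z_t$ at every period and samples next types independently across players, as assumed for the noise $w_t^i$. The kernel `Q`, reward `R` and strategy `sigma` are hypothetical user-supplied callables (the paper does not fix them); types and actions are encoded as integers.

```python
import numpy as np

def empirical_mean_field(x, n_types):
    """Empirical population state: z_t(x) = (1/N) * #{j : x_t^j = x}."""
    return np.bincount(x, minlength=n_types) / len(x)

def sample_rows(probs, rng):
    """Draw one index per row of a row-stochastic matrix by inverse-CDF sampling."""
    u = rng.random((probs.shape[0], 1))
    idx = (u >= np.cumsum(probs, axis=1)).sum(axis=1)
    return np.minimum(idx, probs.shape[1] - 1)  # guard against round-off in the CDF

def discounted_reward_sample(x0, T, delta, Q, R, sigma, n_types, rng):
    """One sample of sum_{t=1}^T delta^(t-1) R(x_t^0, a_t^0, z_t) for player 0.

    Q(z)     -> array (nX, nA, nX) with Q[x, a, y] = Q_x(y | x, a, z)
    R(z)     -> array (nX, nA)     with R[x, a]    = R(x, a, z)
    sigma(z) -> array (nX, nA)     with sigma(z)[x, a] = prob. of action a in type x
    """
    x = np.asarray(x0)
    total = 0.0
    for k in range(T):                        # k = t - 1
        z = empirical_mean_field(x, n_types)  # common observation z_t
        a = sample_rows(sigma(z)[x], rng)     # a_t^i ~ sigma(. | z_t, x_t^i)
        total += delta**k * R(z)[x[0], a[0]]
        x = sample_rows(Q(z)[x, a], rng)      # x_{t+1}^i ~ Q_x(. | x_t^i, a_t^i, z_t)
    return total
```

Averaging `discounted_reward_sample` over many independent seeds gives a Monte Carlo estimate of the finite-horizon objective for a given symmetric strategy.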
II-A Solution concept: MPE
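The Nash equilibrium (NE) of $\mathfrak{G}_T$ is defined as strategies $\tilde\sigma = (\tilde\sigma^i)_{i \in [N]}$ that satisfy, for all $i \in [N]$ and $\sigma^i \in \mathcal{K}^\sigma$,
$$\mathbb{E}^{(\tilde\sigma^i, \tilde\sigma^{-i})}\Big[\sum_{t=1}^{T} \delta^{t-1} R(X_t^i, A_t^i, Z_t)\Big] \ge \mathbb{E}^{(\sigma^i, \tilde\sigma^{-i})}\Big[\sum_{t=1}^{T} \delta^{t-1} R(X_t^i, A_t^i, Z_t)\Big]. \tag{5}$$
For sequential games, however, a more appropriate equilibrium concept is Markov perfect equilibrium (MPE) [7], which we use in this paper. We note that an MPE is also a Nash equilibrium of the game, although not every Nash equilibrium is an MPE. An MPE satisfies sequential rationality: for $\mathfrak{G}_T$, for all $i \in [N]$, $t \in [T]$, $h_t^i \in \mathcal{H}_t^i$ and $\sigma^i \in \mathcal{K}^\sigma$,
$$\mathbb{E}^{(\tilde\sigma^i, \tilde\sigma^{-i})}\Big[\sum_{n=t}^{T} \delta^{n-t} R(X_n^i, A_n^i, Z_n) \,\Big|\, h_t^i\Big] \ge \mathbb{E}^{(\sigma^i, \tilde\sigma^{-i})}\Big[\sum_{n=t}^{T} \delta^{n-t} R(X_n^i, A_n^i, Z_n) \,\Big|\, h_t^i\Big]. \tag{6}$$
NE and MPE for $\mathfrak{G}_\infty$ are defined similarly, with $T$ replaced by $\infty$ in the summations above.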
III A methodology to compute MPE
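In this section, we provide a backward recursive methodology to compute MPE for both $\mathfrak{G}_T$ and $\mathfrak{G}_\infty$. We consider Markovian equilibrium strategies of player $i$ that depend on the common information at time $t$, $z_t$, and on its current type $x_t^i$ (note, however, that unilateral deviations of the player are considered in the space of all strategies). Equivalently, player $i$ takes actions of the form $A_t^i \sim \sigma_t^i(\cdot \mid z_t, x_t^i)$. Similar to the common agent approach in [25], an alternate and equivalent way of defining the strategies of the players is as follows. We first generate a partial function $\gamma_t^i : \mathcal{X} \to \Delta(\mathcal{A})$ as a function of $z_t$ through an equilibrium generating function $\theta_t^i : \mathcal{Z} \to (\mathcal{X} \to \Delta(\mathcal{A}))$, such that $\gamma_t^i = \theta_t^i[z_t]$. Then action $A_t^i$ is generated by applying this prescription function to player $i$'s current private information $x_t^i$, i.e., $A_t^i \sim \gamma_t^i(\cdot \mid x_t^i)$. Thus $A_t^i \sim \sigma_t^i(\cdot \mid z_t, x_t^i) = \theta_t^i[z_t](\cdot \mid x_t^i)$.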
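We are only interested in symmetric equilibria of such games, i.e., equilibria with $\gamma_t^i = \gamma_t$ (equivalently, $\theta_t^i = \theta_t$) for all $i \in [N]$, so that the players' strategies do not depend on their identities.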
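For a given symmetric prescription function $\gamma_t$, the statistical mean field $z_t$ evolves according to the discrete-time McKean-Vlasov equation, $\forall t \in [T]$:
$$z_{t+1}(y) = \sum_{x \in \mathcal{X}} \sum_{a \in \mathcal{A}} z_t(x)\, \gamma_t(a \mid x)\, Q_x(y \mid x, a, z_t), \quad \forall y \in \mathcal{X}, \tag{7}$$
which implies
$$z_{t+1} = \phi(z_t, \gamma_t). \tag{8}$$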
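The mean-field map $\phi$ is a single tensor contraction. A minimal NumPy sketch, assuming finitely many integer-coded types and actions and a hypothetical user-supplied kernel `Q(z)` that returns $Q_x(y \mid x, a, z)$ as an array indexed `[x, a, y]`; the two-type kernel and prescription below are illustrative assumptions, not values from the paper:

```python
import numpy as np

def phi(z, gamma, Q):
    """McKean-Vlasov step: z_{t+1}(y) = sum_{x,a} z_t(x) gamma_t(a|x) Q_x(y|x,a,z_t).

    z     : array (nX,), current population state z_t
    gamma : array (nX, nA), symmetric prescription gamma_t(a | x), rows sum to 1
    Q     : callable, Q(z) -> array (nX, nA, nX)
    """
    return np.einsum("x,xa,xay->y", z, gamma, Q(z))

# Hypothetical two-type example (0 = healthy, 1 = infected; action 1 = repair).
q = 0.1  # assumed infection probability of a healthy node that does nothing

def Q_malware(z):
    K = np.zeros((2, 2, 2))
    K[0, 0] = [1 - q, q]   # healthy, do nothing: infected with probability q
    K[1, 0] = [0.0, 1.0]   # infected, do nothing: stays infected
    K[:, 1] = [1.0, 0.0]   # repair: back to healthy
    return K

z_t = np.array([0.8, 0.2])
gamma_t = np.array([[1.0, 0.0],    # healthy nodes never repair
                    [0.5, 0.5]])   # infected nodes repair with probability 1/2
z_next = phi(z_t, gamma_t, Q_malware)  # approximately [0.82, 0.18]
```

Because `Q` is allowed to depend on `z`, $\phi$ is in general nonlinear in $z_t$, which is why the equilibrium prescription must be solved jointly with this forward update.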
III-A Backward recursive algorithm for $\mathfrak{G}_T$
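In this subsection, we provide a methodology to generate symmetric MPE of $\mathfrak{G}_T$ of the form described above. We define an equilibrium generating function $\theta = (\theta_t)_{t \in [T]}$, where $\theta_t : \mathcal{Z} \to (\mathcal{X} \to \Delta(\mathcal{A}))$ and, for each $z_t$, we generate $\tilde\gamma_t = \theta_t[z_t]$. In addition, we generate a reward-to-go function $V = (V_t)_{t \in [T+1]}$, where $V_t : \mathcal{Z} \times \mathcal{X} \to \mathbb{R}$. These quantities are generated through a fixed-point equation as follows.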
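1. Initialize, $\forall z_{T+1} \in \mathcal{Z}$, $x_{T+1}^i \in \mathcal{X}$,
$$V_{T+1}(z_{T+1}, x_{T+1}^i) := 0. \tag{9}$$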
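2. For $t = T, T-1, \ldots, 1$ and $\forall z_t \in \mathcal{Z}$, let $\theta_t[z_t]$ be generated as follows. Set $\tilde\gamma_t = \theta_t[z_t]$, where $\tilde\gamma_t$ is the solution of the following fixed-point equation (we discuss the existence of its solution in Section IV), $\forall x_t^i \in \mathcal{X}$,
$$\tilde\gamma_t(\cdot \mid x_t^i) \in \arg\max_{\gamma_t(\cdot \mid x_t^i)} \mathbb{E}^{\gamma_t(\cdot \mid x_t^i)}\Big[R(x_t^i, A_t^i, z_t) + \delta V_{t+1}\big(\phi(z_t, \tilde\gamma_t), X_{t+1}^i\big) \,\Big|\, z_t, x_t^i\Big], \tag{10}$$
where the expectation in (10) is with respect to the random variables $(A_t^i, X_{t+1}^i)$ through the measure $\gamma_t(a_t^i \mid x_t^i)\, Q_x(x_{t+1}^i \mid x_t^i, a_t^i, z_t)$. We note that the solution of (10), $\tilde\gamma_t$, appears both on the left of (10) and on the right side in the update $z_{t+1} = \phi(z_t, \tilde\gamma_t)$; (10) is thus unlike the fixed-point equation found in Bayesian Nash equilibrium.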
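Furthermore, using the quantity $\tilde\gamma_t$ found above, define
$$V_t(z_t, x_t^i) := \mathbb{E}^{\tilde\gamma_t(\cdot \mid x_t^i)}\Big[R(x_t^i, A_t^i, z_t) + \delta V_{t+1}\big(\phi(z_t, \tilde\gamma_t), X_{t+1}^i\big) \,\Big|\, z_t, x_t^i\Big]. \tag{11}$$
Then, an equilibrium strategy is defined as
$$\tilde\sigma_t^i(a_t^i \mid z_{1:t}, x_{1:t}^i) := \tilde\gamma_t(a_t^i \mid x_t^i), \tag{12}$$
where $\tilde\gamma_t = \theta_t[z_t]$.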
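The backward recursion can be sketched numerically for a game with two types and two actions, where the population state is summarized by $m = z_t(1)$. This is a hedged illustration under stated assumptions, not the paper's implementation: $V_{t+1}$ is stored on a uniform grid of $m$ and linearly interpolated, the fixed point (10) is found by brute-force search over prescriptions whose action-1 probabilities lie on a finite grid `p_grid` (so it is approximate and reports its best-response gap), and the callables `R(z)` (array indexed `[x, a]`) and `Q(z)` (array indexed `[x, a, y]`) are a hypothetical interface.

```python
import itertools
import numpy as np

def phi(z, gamma, Q):
    """McKean-Vlasov step: z'(y) = sum_{x,a} z(x) gamma(a|x) Q(y|x,a,z)."""
    return np.einsum("x,xa,xay->y", z, gamma, Q(z))

def stage_fixed_point(m, V_next, grid, R, Q, delta, p_grid):
    """Approximate solution of (10)-(11) at z = (1 - m, m).

    Each candidate gamma has gamma(1|x) in p_grid. The continuation value uses the
    mean field the candidate itself induces, phi(z, gamma), as in (10); the
    candidate with the smallest best-response gap is returned.
    """
    z = np.array([1.0 - m, m])
    best = None
    for p0, p1 in itertools.product(p_grid, repeat=2):
        gamma = np.array([[1.0 - p0, p0], [1.0 - p1, p1]])
        m_next = phi(z, gamma, Q)[1]
        # V_{t+1}(phi(z, gamma), y) for each next type y, interpolated in m
        v_next = np.array([np.interp(m_next, grid, V_next[:, y]) for y in range(2)])
        q_val = R(z) + delta * Q(z) @ v_next     # q_val[x, a]
        value = (gamma * q_val).sum(axis=1)      # candidate V_t(z, x), as in (11)
        gap = np.max(q_val.max(axis=1) - value)  # zero iff gamma solves (10)
        if best is None or gap < best[0]:
            best = (gap, gamma, value)
    return best

def backward_recursion(T, R, Q, delta, n_grid=21, n_p=11):
    """Grid version of the recursion; theta[t][k] is the 0-indexed stage t at m = grid[k]."""
    grid = np.linspace(0.0, 1.0, n_grid)
    p_grid = np.linspace(0.0, 1.0, n_p)
    V = np.zeros((n_grid, 2))                    # terminal condition V_{T+1} = 0
    theta, gaps = [None] * T, [None] * T
    for t in reversed(range(T)):                 # backward in time
        V_t = np.zeros_like(V)
        theta[t], gaps[t] = [], []
        for k, m in enumerate(grid):
            gap, gamma, value = stage_fixed_point(m, V, grid, R, Q, delta, p_grid)
            theta[t].append(gamma)
            gaps[t].append(gap)
            V_t[k] = value
        V = V_t
    return grid, theta, gaps, V                  # V approximates V_1 on the grid
```

Increasing `n_p` refines the search over prescriptions; a nonzero entry in `gaps` signals that no candidate on the grid solves (10) exactly at that population state.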
In the following theorem, we show that the strategy thus constructed is an MPE of the game.
Theorem 1
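A strategy $\tilde\sigma$ constructed from the above algorithm is an MPE of the game, i.e., $\forall t \in [T]$, $i \in [N]$, $(z_{1:t}, x_{1:t}^i)$, $\sigma^i$,
$$\mathbb{E}^{(\tilde\sigma^i, \tilde\sigma^{-i})}\Big[\sum_{n=t}^{T} \delta^{n-t} R(X_n^i, A_n^i, Z_n) \,\Big|\, z_{1:t}, x_{1:t}^i\Big] \ge \mathbb{E}^{(\sigma^i, \tilde\sigma^{-i})}\Big[\sum_{n=t}^{T} \delta^{n-t} R(X_n^i, A_n^i, Z_n) \,\Big|\, z_{1:t}, x_{1:t}^i\Big]. \tag{13}$$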
Proof 1
Please see Appendix A.
III-B Converse
In the following, we show that every Markovian mean-field equilibrium can be found using the above backward recursion.
Theorem 2 (Converse)
Let $\tilde\sigma$ be a Markovian MPE of the mean-field game. Then there exists an equilibrium generating function $\theta$ that satisfies (10) in the backward recursion such that $\tilde\sigma$ is defined using $\theta$.
Proof 2
Please see Appendix C.
III-C Backward recursive algorithm for $\mathfrak{G}_\infty$
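In this section, we consider the infinite-horizon problem $\mathfrak{G}_\infty$, for which we assume the reward function $R$ to be absolutely bounded.
We define an equilibrium generating function $\theta : \mathcal{Z} \to (\mathcal{X} \to \Delta(\mathcal{A}))$, where for each $z \in \mathcal{Z}$ we generate $\tilde\gamma = \theta[z]$. In addition, we generate a reward-to-go function $V : \mathcal{Z} \times \mathcal{X} \to \mathbb{R}$. These quantities are generated through a fixed-point equation as follows.
For all $z \in \mathcal{Z}$, set $\tilde\gamma = \theta[z]$. Then $(\tilde\gamma, V)$ is the solution of the following fixed-point equation (we discuss the existence of its solution in Section IV), $\forall z \in \mathcal{Z}$, $x \in \mathcal{X}$,
$$\begin{aligned} \tilde\gamma(\cdot \mid x) &\in \arg\max_{\gamma(\cdot \mid x)} \mathbb{E}^{\gamma(\cdot \mid x)}\Big[R(x, A, z) + \delta V\big(\phi(z, \tilde\gamma), X'\big) \,\Big|\, z, x\Big], \\ V(z, x) &= \mathbb{E}^{\tilde\gamma(\cdot \mid x)}\Big[R(x, A, z) + \delta V\big(\phi(z, \tilde\gamma), X'\big) \,\Big|\, z, x\Big], \end{aligned} \tag{14}$$
where the expectation in (14) is with respect to the random variables $(A, X')$ through the measure $\gamma(a \mid x)\, Q_x(x' \mid x, a, z)$.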
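Then an equilibrium strategy is defined as
$$\tilde\sigma_t^i(a_t^i \mid z_{1:t}, x_{1:t}^i) := \tilde\gamma(a_t^i \mid x_t^i), \tag{15}$$
where $\tilde\gamma = \theta[z_t]$.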
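For $\mathfrak{G}_\infty$, a useful sanity check is to take a candidate stationary generating function $\theta$, compute the $V$ it induces through the second line of (14), and then measure how far $\theta$ is from satisfying the first line. The sketch below does this for a two-type, two-action game on a grid in $m = z(1)$, using hypothetical callables `R(z)` (an array indexed `[x, a]`) and `Q(z)` (an array indexed `[x, a, y]`); it is a verification tool under these assumptions, not a method from the paper. With $\theta$ held fixed, the second line of (14) is a $\delta$-contraction (linear interpolation on the grid does not expand the sup-norm), so plain iteration converges.

```python
import numpy as np

def phi(z, gamma, Q):
    """McKean-Vlasov step: z'(y) = sum_{x,a} z(x) gamma(a|x) Q(y|x,a,z)."""
    return np.einsum("x,xa,xay->y", z, gamma, Q(z))

def check_stationary_theta(theta, R, Q, delta, n_grid=51, tol=1e-10, max_iter=100_000):
    """Evaluate a candidate theta via the second line of (14); return (grid, V, gap).

    gap is the largest best-response gap in the first line of (14) over the grid;
    gap == 0 (up to tolerance) means theta satisfies (14) at every grid point.
    """
    grid = np.linspace(0.0, 1.0, n_grid)
    zs = [np.array([1.0 - m, m]) for m in grid]
    G = np.array([theta(z) for z in zs])                         # (n_grid, x, a)
    m_next = np.array([phi(z, g, Q)[1] for z, g in zip(zs, G)])  # mean field induced by theta

    def q_values(V):
        # V(phi(z, theta[z]), y) for every grid point and next type y
        v_next = np.stack([np.interp(m_next, grid, V[:, y]) for y in range(2)], axis=1)
        return np.array([R(z) + delta * Q(z) @ v for z, v in zip(zs, v_next)])

    V = np.zeros((n_grid, 2))
    for _ in range(max_iter):
        V_new = (G * q_values(V)).sum(axis=2)  # second line of (14) with gamma = theta[z]
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    q_val = q_values(V)
    gap = np.max(q_val.max(axis=2) - (G * q_val).sum(axis=2))
    return grid, V, gap
```

Any candidate `theta` can be passed in; for example, one could test prescriptions read off a long finite-horizon recursion, a heuristic in the spirit of Lemma 5 rather than a guarantee.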
The following theorem shows that the strategy thus constructed is an MPE of the game.
Theorem 3
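A strategy $\tilde\sigma$ constructed from the above algorithm is an MPE of the game, i.e., $\forall t \ge 1$, $i \in [N]$, $(z_{1:t}, x_{1:t}^i)$, $\sigma^i$,
$$\mathbb{E}^{(\tilde\sigma^i, \tilde\sigma^{-i})}\Big[\sum_{n=t}^{\infty} \delta^{n-t} R(X_n^i, A_n^i, Z_n) \,\Big|\, z_{1:t}, x_{1:t}^i\Big] \ge \mathbb{E}^{(\sigma^i, \tilde\sigma^{-i})}\Big[\sum_{n=t}^{\infty} \delta^{n-t} R(X_n^i, A_n^i, Z_n) \,\Big|\, z_{1:t}, x_{1:t}^i\Big]. \tag{16}$$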
Proof 3
Please see Appendix D.
III-D Converse
In the following, we show that every Markovian mean-field equilibrium of $\mathfrak{G}_\infty$ can be found using the above construction.
Theorem 4 (Converse)
Let $\tilde\sigma$ be a Markovian MPE of the mean-field game $\mathfrak{G}_\infty$. Then there exists an equilibrium generating function $\theta$ that satisfies (14) such that $\tilde\sigma$ is defined using $\theta$.
Proof 4
Please see Appendix F.
IV Existence
In this section, we discuss sufficient conditions for the existence of a solution of the fixed-point equations (10) and (14).
Assumption 1 (A1)
Let the reward function $R$ and the state update kernel $Q_x$ be continuous functions of $z$.
Since $\mathcal{X}$ and $\mathcal{A}$ are finite and $\mathcal{Z}$ is compact, this assumption implies that the reward function is bounded.
Theorem 5
Under assumption (A1), there exists a solution of the fixed-point equations (10) and (14) for every $z \in \mathcal{Z}$.
Proof 5
Under assumption (A1), it has been shown in [26] that a Markovian MPE exists for both the finite and the infinite horizon game. Theorems 2 and 4 show that every Markovian MPE can be found using the backward recursion for the finite and infinite horizon problems, respectively. Hence, under (A1), there exists a solution of (10) and (14) for every $z \in \mathcal{Z}$.
V Numerical Example: Cyber physical security
We consider a security problem in a cyber-physical network with positive externalities. It is a discretized version of the malware problem presented in [16, 17, 18, 27]. Other applications of this model include flu vaccination, entry and exit of firms, investment, and network effects. In this model, suppose there is a large number of cyber-physical nodes, where each node has a private state $x_t^i \in \{0, 1\}$, where $x_t^i = 0$ represents the "healthy" state and $x_t^i = 1$ the infected state. Each node can take action $a_t^i \in \{0, 1\}$, where $a_t^i = 0$ means "do nothing" and $a_t^i = 1$ means repair. The dynamics are given by
$$x_{t+1}^i = \begin{cases} x_t^i + (1 - x_t^i)\, w_t^i, & a_t^i = 0, \\ 0, & a_t^i = 1, \end{cases} \tag{17}$$
where $w_t^i \in \{0, 1\}$ is a binary-valued random variable with $P(w_t^i = 1) = q$, which represents the probability of a node getting infected. Thus, if a node does nothing, it may get infected with probability $q$, whereas if it takes the repair action, it returns to the healthy state. Each node gets a reward
[TABLE]
where the mean-field population state enters through the fraction of nodes in state 1 at time $t$, and the remaining parameters represent the cost of repair and the risk of being infected. We pose this as an infinite-horizon discounted dynamic game. We consider parameters for numerical results presented in Figures 1-4.
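For the malware model, the McKean-Vlasov update reduces to a scalar recursion for the infected fraction. The sketch below assumes that state 0 is healthy and 1 is infected, that action 1 is repair, that a healthy node which does nothing becomes infected with probability $q$ while an infected node which does nothing stays infected, and a symmetric stationary prescription in which a healthy (resp. infected) node repairs with probability $g_0$ (resp. $g_1$). The symbols $g_0, g_1$ and all numerical values are illustrative assumptions, not the paper's parameters. With $m_t = z_t(1)$ these dynamics give
$$m_{t+1} = (1 - m_t)(1 - g_0)\, q + m_t (1 - g_1),$$
whose fixed point is $m^\ast = (1 - g_0) q / \big((1 - g_0) q + g_1\big)$ whenever the denominator is positive.

```python
def infected_fraction_step(m, g0, g1, q):
    """One McKean-Vlasov step for the infected fraction m = z_t(1).

    Healthy nodes that do not repair (prob. 1 - g0) become infected w.p. q;
    infected nodes that do not repair (prob. 1 - g1) stay infected;
    repaired nodes are healthy in the next period.
    """
    return (1 - m) * (1 - g0) * q + m * (1 - g1)

def stationary_infected_fraction(g0, g1, q):
    """Fixed point m* of infected_fraction_step (requires (1 - g0) * q + g1 > 0)."""
    return (1 - g0) * q / ((1 - g0) * q + g1)

# Illustrative values: healthy nodes never repair, infected nodes repair half the time.
g0, g1, q = 0.0, 0.5, 0.1
m = 0.2
trajectory = [m]
for _ in range(50):
    m = infected_fraction_step(m, g0, g1, q)
    trajectory.append(m)
# trajectory[1] is approximately 0.18; the iterates contract toward m* = 1/6
# at rate |1 - g1 - (1 - g0) * q| = 0.4
```

If the prescription depends on the population state, as in a non-stationary MPE, $g_0$ and $g_1$ simply become functions of $m_t$ evaluated at each step.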
VI Conclusion
In this paper, we consider both finite and infinite horizon large-population dynamic games in which each player is affected by the others through a mean-field population state. We present a novel backward recursive algorithm to compute non-stationary, signaling Markov perfect equilibria (MPE) for such games, where each player's strategy depends on its current private type and the current mean-field population state. The main difficulty is that the update of the population state is coupled to the players' strategies; the algorithm handles this through the construction of the fixed-point equations (10) and (14). We proved the existence of such an equilibrium under Assumption (A1). Using this algorithm, we considered a malware propagation problem and numerically computed the equilibrium strategies of the players. More generally, this algorithm could be instrumental in studying non-stationary equilibria in applications such as financial markets, social learning and renewable energy.
Acknowledgments
The author would like to acknowledge the support of Simons Grant #26-7523-99 and Department of Defense grant #W911NF1510225. The author thanks Francois Baccelli and Sriram Vishwanath for encouragement and support.
Appendix A
Proof 6
We prove (13) using induction and the results of Lemmas 1 and 2, proved in Appendix B.
[TABLE]
where (21a) follows from Lemma 2 and (21b) follows from Lemma 1 in Appendix B.
Let the induction hypothesis be that for $t+1$, $\forall i \in [N]$, $z_{1:t+1}$, $x_{1:t+1}^i$, $\sigma^i$,
[TABLE]
[TABLE]
where (23a) follows from Lemma 2, (23b) follows from Lemma 1, (23c) follows from Lemma 2, (23d) follows from induction hypothesis in (22b) and (23e) follows since the random variables involved in the right conditional expectation do not depend on strategies .
Appendix B
Lemma 1
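$\forall t \in [T]$, $i \in [N]$, $z_{1:t}$, $x_{1:t}^i$, $\sigma_t^i$,
$$V_t(z_t, x_t^i) \ge \mathbb{E}^{\sigma_t^i}\Big[R(x_t^i, A_t^i, z_t) + \delta V_{t+1}\big(\phi(z_t, \theta_t[z_t]), X_{t+1}^i\big) \,\Big|\, z_{1:t}, x_{1:t}^i\Big]. \tag{24}$$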
Proof 7
We prove this lemma by contradiction.
Suppose the claim is not true for $t$. This implies that there exist $i$, $\hat\sigma_t^i$, $z_{1:t}$, $x_{1:t}^i$ such that
[TABLE]
We will show that this leads to a contradiction. Construct
[TABLE]
Then for , we have
[TABLE]
Lemma 2
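$\forall i \in [N]$, $t \in [T]$, $z_{1:t}$, $x_{1:t}^i$,
$$V_t(z_t, x_t^i) = \mathbb{E}^{\tilde\sigma}\Big[\sum_{n=t}^{T} \delta^{n-t} R(X_n^i, A_n^i, Z_n) \,\Big|\, z_{1:t}, x_{1:t}^i\Big]. \tag{28}$$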
Proof 8
[TABLE]
where (29b) follows from the definition of $V_T$ in (11). Suppose the claim is true for $t+1$, i.e.,
[TABLE]
Then $\forall i \in [N]$, $z_{1:t}$, $x_{1:t}^i$, we have
[TABLE]
(31c) follows from the induction hypothesis in (30), (31d) follows because the random variables involved in expectation, do not depend on and (31e) follows from the definition of in (11).
Appendix C
Proof 9
We prove this by contradiction. Suppose that for any equilibrium generating function $\theta$ that generates an MPE $\tilde\sigma$, there exists $t \in [T]$ such that (10) is not satisfied for $\theta$, i.e., for $\tilde\gamma_t = \theta_t[z_t]$,
[TABLE]
Let be the first instance in the backward recursion when this happens. This implies such that
[TABLE]
This implies for ,
[TABLE]
where (59) follows from the definitions of and Lemma 2, (60) follows from (56) and the definition of , (61) follows from Lemma 1. However, this leads to a contradiction since is an MPE of the game.
Appendix D
We divide the proof into two parts: first, we show that the value function $V$ is at least as large as any reward-to-go function; second, we show that under the strategy $\tilde\sigma$, the reward-to-go equals $V$.
Part 1
For any , define the following reward-to-go functions
[TABLE]
Since $\mathcal{X}$ and $\mathcal{A}$ are finite sets and the reward is absolutely bounded, the reward-to-go is finite for all $i$, $t$, $\sigma^i$, $z_{1:t}$, $x_{1:t}^i$.
For any , ,
[TABLE]
Combining the results of Lemmas 4 and 5 in Appendix E, the term in the first bracket on the RHS of (41) is non-negative. Using (40), the term in the second bracket is
[TABLE]
The summation in the expression above is bounded by a convergent geometric series. Also, $V$ is bounded. Hence the above quantity can be made arbitrarily small by choosing $T$ appropriately large. Since the LHS of (41) does not depend on $T$, this implies
[TABLE]
Part 2
Since the equilibrium strategy $\tilde\sigma$ generated in (15) is such that $\tilde\sigma_t^i$ depends on $h_t^i$ only through $z_t$ and $x_t^i$, the reward-to-go at strategy $\tilde\sigma$ can be written (with abuse of notation) as
[TABLE]
For any ,
[TABLE]
Repeated application of the above for the first $n$ time periods gives
[TABLE]
Taking differences results in
[TABLE]
Taking the absolute value of both sides, then using Jensen's inequality for $f(\cdot) = |\cdot|$, and finally taking the supremum over $(z_t, x_t^i)$ reduces this to
[TABLE]
Now, using the fact that the value functions are bounded and that we can choose $n$ arbitrarily large, we get that the reward-to-go at $\tilde\sigma$ equals $V$.
Appendix E
In this section, we present three lemmas. Lemma 3 is an intermediate technical result needed in the proof of Lemma 4. The results of Lemmas 4 and 5 are then used in Appendix D for the proof of Theorem 3. The proof of Lemma 3 is omitted, as it is analogous to the proof of Lemma 1 from Appendix B, used in the proof of Theorem 1 (the only difference being a non-zero terminal reward in the finite-horizon model).
Define the reward-to-go for any agent and strategy as
[TABLE]
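Here agent $i$'s strategy is $\sigma^i$, whereas all other agents use the strategy $\tilde\sigma^{-i}$ defined above. Since $\mathcal{X}$ and $\mathcal{A}$ are assumed to be finite and $R$ absolutely bounded, the reward-to-go is finite for all $i$, $t$, $\sigma^i$, $z_{1:t}$, $x_{1:t}^i$. In the following, any quantity with a $T$ in the superscript refers to the finite-horizon model with terminal reward $V$.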
Lemma 3
For any , , and ,
[TABLE]
The result below shows that the value function from the backward recursive algorithm is higher than any reward-to-go.
Lemma 4
For any , , and ,
[TABLE]
Proof 10
We use backward induction. At time $T$, using the maximization property from (10) (modified with terminal reward $V$),
[TABLE]
Here the second inequality follows from (10) and (11) and the final equality is by definition in (49).
Assume that the result holds for all $n \in \{t+1, \ldots, T\}$; then at time $t$ we have
[TABLE]
Here the first inequality follows from Lemma 3, the second inequality from the induction hypothesis, the third equality follows since the random variables on the right hand side do not depend on , and the final equality by definition (49).
The following result highlights the similarities between the fixed-point equation in infinite-horizon and the backwards recursion in the finite-horizon.
Lemma 5
Consider the finite-horizon game with terminal reward $V$. Then $V_t^T = V$, $\forall t \in \{1, \ldots, T\}$, satisfies the backward recursive construction stated above (adapted from (10) and (11)).
Proof 11
We use backward induction. Consider the finite-horizon algorithm at time $t = T$, noting that $V_{T+1}^T = V$,
[TABLE]
Comparing the above set of equations with (14), we can see that the pair $(\tilde\gamma, V)$ arising out of (14) satisfies the above. Now assume that $V_n^T = V$ for all $n \in \{t+1, \ldots, T\}$. At time $t$, in the finite-horizon construction from (10) and (11), substituting $V$ in place of $V_{t+1}^T$ from the induction hypothesis, we get the same set of equations as (54). Thus $V_t^T = V$ satisfies it.
Appendix F
Proof 12
We prove this by contradiction. Suppose that for the equilibrium generating function $\theta$ that generates the MPE $\tilde\sigma$, there exists $t$ such that (10) is not satisfied for $\theta$, i.e., for $\tilde\gamma = \theta[z_t]$,
[TABLE]
Let be the first instance in the backward recursion when this happens. This implies such that
[TABLE]
This implies for ,
[TABLE]
where (59) follows from the definitions of and Appendix D, (60) follows from (56) and the definition of , (61) follows from Appendix D. However, this leads to a contradiction since is an MPE of the game.
[1] L. S. Shapley, "Stochastic games," Proceedings of the National Academy of Sciences, vol. 39, no. 10, pp. 1095–1100, 1953.
[2] T. Başar and G. Olsder, Dynamic Noncooperative Game Theory, 2nd ed. Society for Industrial and Applied Mathematics, 1998.
[3] J. Filar and K. Vrieze, Competitive Markov Decision Processes. Springer Science & Business Media, 2012.
[4] M. J. Osborne and A. Rubinstein, A Course in Game Theory, ser. MIT Press Books. The MIT Press, 1994, vol. 1.
[5] D. Fudenberg and J. Tirole, Game Theory. Cambridge, MA: MIT Press, 1991.
[6] G. J. Mailath and L. Samuelson, Repeated Games and Reputations: Long-Run Relationships. Oxford University Press, 2006.
[7] E. Maskin and J. Tirole, "Markov perfect equilibrium: I. Observable actions," Journal of Economic Theory, vol. 100, no. 2, pp. 191–219, 2001.
[8] R. Ericson and A. Pakes, "Markov-perfect industry dynamics: A framework for empirical work," The Review of Economic Studies, vol. 62, no. 1, pp. 53–82, 1995.
