Stationary Equilibria of Mean Field Games with Finite State and Action Space
Berenice Anne Neumann

TL;DR
This paper studies stationary equilibria in finite state and action mean field games with Markovian dynamics, providing existence results, characterization methods, and computational techniques for these equilibria.
Contribution
It introduces a framework for finite state and action mean field games with Markov dynamics, establishing existence and characterization of stationary equilibria.
Findings
Existence of stationary mean field equilibria under mild conditions.
Characterization of equilibria as fixed points of a transition-based map.
Demonstration of techniques on two example models.
Berenice Anne Neumann, Universität Hamburg, Department of Mathematics, STSP, Bundesstr. 55 (Geomatikum), 20146 Hamburg, Germany
Abstract
Mean field games formalize dynamic games with a continuum of players and explicit interaction where the players can have heterogeneous states. As they additionally yield approximate equilibria of corresponding $N$-player games, they are of great interest for socio-economic applications. However, most techniques used for mean field games rely on assumptions that imply that for each population distribution there is a unique optimizer of the Hamiltonian. For finite action spaces, this will only hold for trivial models. Thus, the techniques used so far are not applicable. We propose a model with finite state and action space, where the dynamics are given by a time-inhomogeneous Markov chain that might depend on the current population distribution. We show existence of stationary mean field equilibria in mixed strategies under mild assumptions and propose techniques to compute all these equilibria. More precisely, given that the generators are irreducible, our results allow us to characterize the set of stationary mean field equilibria as the set of all fixed points of a map completely characterized by the transition rates and rewards for deterministic strategies. Additionally, we propose several partial results for the case of non-irreducible generators and we demonstrate the presented techniques on two examples.
1 Introduction
Mean field games have been introduced independently by Lasry and Lions (2007) and Huang et al. (2006) in order to provide a framework for dynamic stochastic games in continuous time with a continuum of players, whose equilibria furthermore serve as approximate Nash equilibria for corresponding $N$-player games. The main feature of these games is that no player observes the state and action of each other player individually, but only at an aggregated level through the empirical distribution of these characteristics.
From an economic perspective these games are of particular interest as they yield a formal way to describe games with a continuum of rational players that accounts for explicit interaction (in contrast to the classical assumption in general equilibrium theory that “prices mediate all social interaction” (Guéant et al., 2011)) as well as heterogeneity of states (in contrast to representative agent models (see Gomes et al. (2015, p.209))). Therefore a wide range of economic models relying on mean field games has emerged, including growth models, the production of an exhaustible resource by a continuum of producers and opinion dynamics (see Gomes et al. (2015), Guéant et al. (2011), Caines et al. (2017) and the references therein).
In classical mean field game models each individual player’s dynamics are given by a diffusion process whose drift and volatility depend on time, the current state and the current action of the individual player as well as the current distribution of all players. Each player individually solves an optimal control problem given these dynamics, where the costs also depend on the current state and action of the individual player as well as on the distribution of all players. A mean field equilibrium is then given by a flow of population distributions $(m(t))_{t \ge 0}$ such that there is an optimal strategy $\pi$ for the individual control problem given $m$, and the distribution of the individual player given this strategy is in turn $m$. This is a natural analogue of a Nash equilibrium of games with $N$ players: Given that all players play the strategy $\pi$, no player wants to deviate from playing $\pi$, as the population distribution is $m$ and $\pi$ is a best response to it.
To solve this type of mean field game one imposes several assumptions, which usually include the assumption that there is a unique optimizer of the Hamiltonian. Then one can show that finding a mean field equilibrium boils down to solving a Hamilton-Jacobi-Bellman equation coupled with a Fokker-Planck equation or, if one prefers the probabilistic approach, a Forward-Backward Stochastic Differential Equation (FBSDE). The forward-backward structure of these differential equations is non-standard, and a wide range of the literature regarding mean field games covers the analysis of these equations. For more details consider Bensoussan et al. (2013) and Carmona and Delarue (2018a, b) as well as the references therein. We remark that in Lacker (2015) existence of mean field equilibria in mixed strategies is proven under continuity, measurability and boundedness conditions, which, in particular, do not include the assumption that a unique optimizer of the Hamiltonian exists. A similar existence result for mean field games with controlled jump-diffusion dynamics can be found in Benazzoli et al. (2018).
In Gomes et al. (2010, 2013) mean field games with finite state space have been introduced, and thereafter several other (more general) mean field game models with finite state spaces have been considered; we give an overview at the end of the introduction. Applications of mean field games with finite state space have also been considered. However, several applications consider finite action spaces (for example Kolokoltsov and Bensoussan (2016), Kolokoltsov and Malafeyev (2017), Guéant (2009a) and Besancenot and Dogguy (2015)), and for this type of model the literature covers only an existence result in mixed strategies (also called relaxed strategies) (see Cecchin and Fischer (2018)). Indeed, for non-trivial models with finite action spaces there are always population distributions for which more than one optimizer of the Hamiltonian exists, in which case most of the techniques presented in the literature so far are not applicable. For this reason the authors of the previously mentioned examples develop their own tools to solve their particular model: Kolokoltsov and Bensoussan (2016) and Kolokoltsov and Malafeyev (2017) only analyse stationary equilibria in deterministic strategies, while Guéant (2009a) and Besancenot and Dogguy (2015) set up a dynamics equation only after analysing optimal decisions given a certain population distribution. They then also focus on stationary equilibria, as well as the dynamic behaviour close to these stationary equilibria and the effect of shocks.
In this paper, we present general tools to compute stationary equilibria of mean field games with finite state and action space: We introduce the notion of stationary mean field equilibria into the mean field game model with finite state and finite action space presented in Doncel et al. (2016a). We remark that we consider a stationary equilibrium in an infinite horizon mean field game where the expected discounted reward $\mathbb{E}\big[\int_0^\infty e^{-\beta t} r(t)\,\mathrm{d}t\big]$ is maximized (with $r(t)$ the instantaneous reward at time $t$ and discount factor $\beta > 0$) and not, as in many other settings, a stationary (ergodic) equilibrium, where the expected average reward $\liminf_{T \to \infty} \frac{1}{T}\mathbb{E}\big[\int_0^T r(t)\,\mathrm{d}t\big]$ is maximized. Since the analytic formulation of Doncel et al. (2016a) is not suitable for this task, we formulate the model in a probabilistic way: The individual dynamics of a player are given by a time-inhomogeneous continuous time Markov chain with the generator depending on the current action and the current distribution of the population; the rewards depend on the current state as well as the current action of the individual player and the current population distribution. As in Doncel et al. (2016a) in the case of dynamic equilibria, we only prove existence of stationary mean field equilibria in mixed strategies.
We focus on stationary equilibria for several reasons: First, searching for stationary equilibria reduces the complexity of the problem. More precisely, we can utilize the standard theory of Markov decision processes with stationary transition rates and rewards, and we can consider a fixed point problem in the finite-dimensional probability simplex instead of a fixed point problem in some function space. Second, the economic models studied so far also mainly focus on stationary equilibria. Third, and linked to the last reason, one can often establish some kind of convergence or adjustment process towards stationary equilibria. More precisely, Gomes et al. (2013) prove that for mean field games with finite state space under certain assumptions every dynamic mean field equilibrium converges to the stationary (ergodic) equilibrium. In the case of continuous state spaces Cardaliaguet et al. (2012, 2013) prove that under certain assumptions every dynamic mean field equilibrium converges to the stationary (ergodic) equilibrium, and Guéant (2009b) describes a cognitive process that, if started close to the stationary equilibrium, indeed converges to the stationary (ergodic) equilibrium. Moreover, in the examples presented by Guéant (2009a) and Besancenot and Dogguy (2015) it is shown that the trajectories of the dynamic equilibrium converge locally to the stationary (discounted reward) equilibrium.
Relying on our probabilistic formulation of the model, we show existence of stationary equilibria under the same conditions as in the dynamic case. Compared to Gomes et al. (2013) this is a surprising result, as they need several assumptions in addition to those of their dynamic existence result to establish existence of stationary equilibria.
Thereafter, we derive tools to compute all stationary equilibria (including those where the equilibrium strategy randomizes over different actions). As in standard mean field game models, we first have to solve an optimal control problem given a fixed flow of population distributions $m$ and second a fixed point problem, namely searching for flows of population distributions $m$ such that $m$ is the distribution of an individual player playing optimally given $m$. We will see that the first problem is equivalent to solving a standard Markov decision process with expected discounted reward criterion, and we show that the set of all randomized optimal stationary strategies is the convex hull of all deterministic optimal stationary strategies. For general dynamics, we cannot simplify the fixed point problem directly, but we provide a generally applicable reformulation of the necessary and sufficient balance equations inspired by the cut criterion for standard Markov chains, which often proves helpful in examples. Assuming irreducibility of the generators of the individual dynamics given any population distribution and any strategy, we can obtain all distributions of stationary equilibria as the fixed points of a set-valued map with convex values, which can be completely characterized by the transition rates and rewards given deterministic strategies.
1.1 Related Literature
As indicated previously we would like to sketch briefly the research regarding finite state mean field games: The starting point regarding the study of finite state mean field games were the models of Gomes et al. (2010) (in discrete time), Gomes et al. (2013) and Guéant (2011, 2015). We will focus on the continuous time models here: In both models, fully controllable transition rates are considered (Guéant (2011, 2015) additionally assumes that the players might not reach all other states from a given state) such that the player’s dynamics is given by a time-inhomogeneous Markov chain. The costs consist of instantaneous costs depending on the current state and action of the individual player as well as the current distribution of all players together with a terminal cost depending on the current state of the individual player and the current distribution of all players. In both models, assumptions are set up such that there is always a unique optimizer of the Hamiltonian. For both models then a system of forward-backward ODEs is presented, the solution of which yields a mean field equilibrium. Moreover, existence and uniqueness of solutions to these equations is discussed.
Guéant (2011) discusses further sufficient conditions for the existence of mean field equilibria (including a discrete state master equation); Gomes et al. (2013) study stationary equilibria and establish a trend to equilibrium for contractive mean field games. Gomes et al. (2013) furthermore study an $N$-player game and the convergence of this game to the mean field game model. Additionally, as in the diffusion-based models, the class of potential mean field games is introduced, which has a simpler cost structure and thus allows for deeper results. Namely, the costs split into two additive terms: one term depending on the current state and the current population distribution, which is furthermore the gradient of a convex function, and one term depending on the current state and current action.
Several other authors discuss similar questions in models with more general individual dynamics (in particular, the transition rates might not be fully controllable), but again assumptions are imposed which imply that there is a unique optimizer of the Hamiltonian: Carmona and Delarue (2018a, Chapter 7.2) provide a discussion of finite state mean field games, which is closely related to their exposition of standard mean field game models with continuous state space. They consider models in which transition rates depend on the current state, the current action and the population distribution, and discuss existence and uniqueness results as well as a master equation. Basna et al. (2014) discuss mean field games where the dynamics are given by a non-linear Markov process with a generator that might additionally depend on the distribution of all other players, and show under several assumptions that mean field equilibria are $\varepsilon$-Nash equilibria for the corresponding $N$-player games.
Cecchin and Fischer (2018) present a mean field game with the individual dynamics given by a stochastic differential equation driven by a stationary Poisson random measure, where again a dependence on the current population distribution is possible. They discuss existence (also in mixed strategies under mild continuity and boundedness assumptions) and uniqueness of mean field equilibria, and furthermore show that mean field equilibria in open-loop and feedback strategies are $\varepsilon$-Nash equilibria for the corresponding $N$-player game. Carmona and Wang (2018) provide a mean field game model with the dynamics given by a continuous time Markov chain with a generator which might depend on the current distributions of the states as well as the actions of all players. Using the semimartingale representation of continuous time Markov chains they again consider existence, uniqueness and the question of when a mean field equilibrium is an approximate Nash equilibrium for the corresponding $N$-player game.
The model of Doncel et al. (2016a) (which we consider in this paper) does not require a unique optimizer of the Hamiltonian; instead, it is directly assumed that the action space is finite. The dynamics of each individual player are given by a differential equation specifying the transition rates, which in turn depend on the individual’s state and action as well as on the current population distribution. Existence of dynamic mean field equilibria is shown and a discrete time $N$-player game is considered, for which it is shown that mean field equilibria are $\varepsilon$-Nash equilibria. Furthermore, the question of convergence is considered. More precisely, given a sequence of strategies that are equilibria in the $N$-player games, is there a subsequence converging to a mean field equilibrium? The answer to this question is positive if one considers local strategies (which only depend on the current state and time), but negative if one considers Markov strategies (which also depend on the current distribution of all players). The intuitive reason for this is that the “tit-for-tat” principle cannot be applied in the limit (see Doncel et al. (2016b) for more details).
1.2 Organization of the Paper
The structure of the paper is as follows: Section 2 introduces the model in a probabilistic formulation. Section 3 discusses the individual control problem. In Section 4 we show that for all models fitting into our framework a stationary mean field equilibrium in mixed strategies exists. Section 5 first discusses the generally applicable cut criterion for our setting; then, given irreducibility of the generators, we propose a characterization of the set of all distributions of stationary mean field equilibria as the set of all fixed points of a suitable set-valued map which is completely determined by the dynamics and rewards given the deterministic strategies. Section 6 concludes the paper by showing how the presented tools can be applied to find stationary mean field equilibria in two examples.
2 The Model
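Let $\mathcal{S} = \{1, \ldots, S\}$ ($S \in \mathbb{N}$) be the set of possible states of each player and let $\mathcal{A} = \{1, \ldots, A\}$ be the set of possible actions. With $\mathcal{P}(\mathcal{S})$ we denote the probability simplex over $\mathcal{S}$ and similarly $\mathcal{P}(\mathcal{A})$ for $\mathcal{A}$. A (mixed) strategy is a measurable function $\pi: [0,\infty) \times \mathcal{S} \to \mathcal{P}(\mathcal{A})$, $(t, i) \mapsto (\pi_{ia}(t))_{a \in \mathcal{A}}$, with the interpretation that $\pi_{ia}(t)$ is the probability that at time $t$ and in state $i$ the player chooses action $a$. We say that a strategy $\pi$ is deterministic if for all $t \ge 0$ and all $i \in \mathcal{S}$ there is an $a \in \mathcal{A}$ such that $\pi_{ia}(t) = 1$ and $\pi_{ia'}(t) = 0$ for all $a' \neq a$. Throughout the presentation we often use the following equivalent representation of a deterministic strategy as a function $d: [0,\infty) \times \mathcal{S} \to \mathcal{A}$, where $d(t, i) = a$ states that at time $t$ in state $i$ action $a$ is chosen. With $\Pi$ we denote the set of all (mixed) strategies and with $\mathcal{D}$ the set of all deterministic strategies.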
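The individual dynamics of each player given a flow of population distributions $m: [0,\infty) \to \mathcal{P}(\mathcal{S})$ and a strategy $\pi \in \Pi$ are given as a Markov process $X^\pi$ on a given probability space $(\Omega, \mathcal{F}, \mathbb{P})$ with given initial distribution $m_0 \in \mathcal{P}(\mathcal{S})$ and infinitesimal generator given by the $Q$-matrix
$$Q^\pi(m(t), t) = \Big( \sum_{a \in \mathcal{A}} Q_{ija}(m(t))\, \pi_{ia}(t) \Big)_{i, j \in \mathcal{S}},$$
where $Q(m, a) = (Q_{ija}(m))_{i, j \in \mathcal{S}}$ for all $m \in \mathcal{P}(\mathcal{S})$ and $a \in \mathcal{A}$, and the matrices $Q(m, a)$ are conservative generators, that is $Q_{ija}(m) \ge 0$ for all $i, j \in \mathcal{S}$ with $i \neq j$ and $\sum_{j \in \mathcal{S}} Q_{ija}(m) = 0$ for all $i \in \mathcal{S}$.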
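To make the mixing of generators concrete, here is a minimal Python sketch (not taken from the paper) that assembles $Q^\pi$ for a stationary strategy at one fixed population distribution and checks conservativeness. The nested-list layout `Q[a][i][j]` for $Q_{ija}(m)$ and `pi[i][a]` for $\pi_{ia}$, the function names and the toy rates are illustrative assumptions. Since each row of $\pi$ sums to one, a mixture of conservative generators is again conservative, which the check confirms for this example.

```python
# Sketch: assemble Q^pi(m) from per-action generators and a stationary strategy.
# Hypothetical layout: Q[a][i][j] = Q_{ija}(m) for a fixed m, pi[i][a] = pi_{ia}.


def mixed_generator(Q, pi):
    """Return the matrix with entries sum_a Q[a][i][j] * pi[i][a]."""
    n_actions, n_states = len(Q), len(Q[0])
    return [
        [sum(Q[a][i][j] * pi[i][a] for a in range(n_actions)) for j in range(n_states)]
        for i in range(n_states)
    ]


def is_conservative(G, tol=1e-12):
    """Check non-negative off-diagonal entries and zero row sums."""
    n = len(G)
    off_diagonal_ok = all(G[i][j] >= -tol for i in range(n) for j in range(n) if i != j)
    rows_sum_to_zero = all(abs(sum(row)) <= tol for row in G)
    return off_diagonal_ok and rows_sum_to_zero


# Toy model with two states and two actions; action 1 doubles all jump rates.
Q = [
    [[-1.0, 1.0], [0.5, -0.5]],  # action 0
    [[-2.0, 2.0], [1.0, -1.0]],  # action 1
]
pi = [[0.5, 0.5], [1.0, 0.0]]  # randomize in state 0, action 0 in state 1

G = mixed_generator(Q, pi)
print(G)                   # [[-1.5, 1.5], [0.5, -0.5]]
print(is_conservative(G))  # True
```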
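Given the initial condition $m_0$, the goal of each player is to maximize his expected reward, which is given by
$$V_{m_0}(\pi, m) = \int_0^\infty e^{-\beta t} \sum_{i \in \mathcal{S}} \sum_{a \in \mathcal{A}} x^\pi_i(t)\, \pi_{ia}(t)\, r_{ia}(m(t)) \,\mathrm{d}t,$$
where $x^\pi_i(t) = \mathbb{P}(X^\pi(t) = i)$ is the probability that the individual player is in state $i$ at time $t$, $r_{ia}: \mathcal{P}(\mathcal{S}) \to \mathbb{R}$ is a real-valued function and $\beta > 0$ is the discount factor. That is, for a fixed flow of population distributions we face a Markov decision process with expected discounted reward criterion and time-inhomogeneous reward functions and transition rates.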
We work under the following mild continuity assumption, which ensures that there is indeed a Markov process with generator $Q^\pi(m(t), t)$ (see Guo and Hernández-Lerma (2009, Appendix B+C) for details):
Assumption A1.
For all $i, j \in \mathcal{S}$ and all $a \in \mathcal{A}$ the function mapping $m \in \mathcal{P}(\mathcal{S})$ to $Q_{ija}(m)$ is Lipschitz-continuous in $m$. For all $i \in \mathcal{S}$ and all $a \in \mathcal{A}$ the function mapping $m \in \mathcal{P}(\mathcal{S})$ to $r_{ia}(m)$ is continuous in $m$.
With these preparations, we can introduce the concept of dynamic mean field equilibria:
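Definition 2.1.
Given an initial distribution $m_0 \in \mathcal{P}(\mathcal{S})$, a mean field equilibrium is a flow of population distributions $m: [0,\infty) \to \mathcal{P}(\mathcal{S})$ with $m(0) = m_0$ and a strategy $\pi \in \Pi$ such that
- the distribution of the process $X^\pi$ at time $t$ is given by $m(t)$ for all $t \ge 0$,
- $V_{m_0}(\pi, m) \ge V_{m_0}(\pi', m)$ for all $\pi' \in \Pi$.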
As in standard game theory, our concept of mean field equilibrium captures the intuitive idea that no player wants to deviate: Given that all players play according to strategy $\pi$, the population’s distribution will be $m$. If an individual player now evaluates whether he wants to deviate from playing $\pi$, he asks whether there is a strategy $\pi'$ that yields a higher payoff given $m$. Due to the second condition this is not possible. Therefore, we indeed face an equilibrium in the standard economic sense.
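Remark 2.1.
Using the Kolmogorov forward equation (Guo and Hernández-Lerma, 2009, Proposition C.4), we see that the first condition implies the analytic condition used in Doncel et al. (2016a) to characterize mean field equilibria, which states that $m$ is a solution of
$$\frac{\mathrm{d}}{\mathrm{d}t} m_j(t) = \sum_{i \in \mathcal{S}} m_i(t) \sum_{a \in \mathcal{A}} Q_{ija}(m(t))\, \pi_{ia}(t), \qquad j \in \mathcal{S},$$
with initial condition $m(0) = m_0$.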
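The forward equation can be explored numerically. The sketch below uses an explicit Euler scheme, a generic ODE method swapped in for illustration rather than anything prescribed in the paper, on a hypothetical two-state model in which leaving state 0 becomes faster the more players are already in state 1. The callables `rates` and `strategy` and all numerical values are assumptions made up for this example.

```python
# Sketch: explicit Euler scheme for the Kolmogorov forward equation
#   d/dt m_j(t) = sum_i m_i(t) sum_a Q_{ija}(m(t)) pi_{ia}(t).
# `rates` and `strategy` are hypothetical callables, not part of the paper.


def forward_euler(m0, rates, strategy, horizon, dt):
    """Integrate the population flow from m0 up to time `horizon`."""
    m, t = list(m0), 0.0
    n = len(m)
    for _ in range(int(round(horizon / dt))):
        Q, p = rates(m), strategy(t)  # Q[a][i][j] and p[i][a]
        drift = [
            sum(m[i] * sum(Q[a][i][j] * p[i][a] for a in range(len(Q))) for i in range(n))
            for j in range(n)
        ]
        m = [m[j] + dt * drift[j] for j in range(n)]
        t += dt
    return m


def rates(m):
    # Hypothetical mean field interaction: leaving state 0 is faster
    # the more players are already in state 1.
    up = 1.0 + m[1]
    return [
        [[-up, up], [1.0, -1.0]],              # action 0
        [[-0.5 * up, 0.5 * up], [2.0, -2.0]],  # action 1
    ]


def strategy(t):
    return [[1.0, 0.0], [0.0, 1.0]]  # action 0 in state 0, action 1 in state 1


m_T = forward_euler([1.0, 0.0], rates, strategy, horizon=10.0, dt=1e-3)
# Balance (1 - x)(1 + x) = 2x gives the rest point x = sqrt(2) - 1 for m_1.
print(m_T, 2 ** 0.5 - 1)
```

For this toy model the balance condition $(1 - x)(1 + x) = 2x$ for $x = m_1$ has the root $x = \sqrt{2} - 1$, and the computed flow settles there; the Euler step conserves total mass because each mixed generator has zero row sums.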
Remark 2.2.
The definition of strategies we adopt here is unusual in classical game theory. However, in the setting of mean field games it is sensible. Indeed, given the initial state of the system and the strategy of the other players the behaviour of the system is fully determined. Thus, it suffices for the individual agent to know the initial global state (see Caines et al. (2017)).
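In order to define stationary mean field equilibria, we first introduce the notion of stationary strategies: A stationary strategy is a strategy $\pi \in \Pi$ such that $\pi_{ia}(t) = \pi_{ia}(0)$ for all $t \ge 0$, $i \in \mathcal{S}$ and $a \in \mathcal{A}$; we then simply write $\pi_{ia}$ and, for deterministic stationary strategies, $d(i)$. Again we denote by $\Pi^s$ the set of all stationary strategies and by $\mathcal{D}^s$ the set of all deterministic stationary strategies.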
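Definition 2.2.
A stationary mean field equilibrium is given by a stationary strategy $\pi \in \Pi^s$ and a vector $m \in \mathcal{P}(\mathcal{S})$ such that
- the law of $X^\pi$ at any point in time is given by $m$,
- for any initial distribution $m_0 \in \mathcal{P}(\mathcal{S})$ we have $V_{m_0}(\pi, m) \ge V_{m_0}(\pi', m)$ for all $\pi' \in \Pi$, where $m$ is identified with the constant flow $t \mapsto m$.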
This notion is a sensible formalization of stationary equilibria: Given the strategy $\pi$, the population’s distribution will be $m$ at all time points. An individual agent at a given time point can be in any state; however, if he evaluates whether he wants to deviate from playing $\pi$, the second condition ensures that this is not beneficial for him. Thus, he has no incentive to deviate from the equilibrium strategy $\pi$, which means that the population will indeed remain in the stationary equilibrium regime of playing $\pi$.
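Remark 2.3.
We remark that the matrix $Q^\pi(m, t)$ does not depend on $t$ in this context; therefore, we write $Q^\pi(m)$. Using this, we obtain that the first condition is equivalent to
$$m^T Q^\pi(m) = 0, \quad \text{that is,} \quad \sum_{i \in \mathcal{S}} m_i \sum_{a \in \mathcal{A}} Q_{ija}(m)\, \pi_{ia} = 0 \quad \text{for all } j \in \mathcal{S}.$$
Moreover, we remark that the second condition requires $\pi$ to be optimal among all strategies, not only among those that are stationary.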
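Checking the stationarity condition of Remark 2.3 for a fixed $m$ and $\pi$ is a linear algebra problem. The following sketch (hypothetical helper names, plain Python so that it runs without external packages) solves $x^T G = 0$, $\sum_i x_i = 1$ by replacing one balance equation with the normalization, which yields the unique stationary distribution when the generator $G$ is irreducible; for reducible generators the system may be singular. It also reports the residual $\max_j |(m^T G)_j|$, which vanishes exactly when $m$ satisfies the balance equations.

```python
# Sketch: stationary distribution x of a conservative generator G, i.e.
# x^T G = 0 with sum(x) = 1 (unique if G is irreducible).


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def stationary_distribution(G):
    """Replace the last balance equation of G^T x = 0 by sum(x) = 1."""
    n = len(G)
    A = [[G[i][j] for i in range(n)] for j in range(n)]  # rows of G^T
    A[-1] = [1.0] * n
    b = [0.0] * (n - 1) + [1.0]
    return solve(A, b)


def balance_residual(m, G):
    """Largest |(m^T G)_j|; zero iff m is stationary for G."""
    n = len(m)
    return max(abs(sum(m[i] * G[i][j] for i in range(n))) for j in range(n))


G = [[-1.5, 1.5], [0.5, -0.5]]  # e.g. Q^pi(m) for some fixed m and pi
x = stationary_distribution(G)
print(x)                               # approximately [0.25, 0.75]
print(balance_residual(x, G) < 1e-12)  # True
```

In a stationary mean field equilibrium the generator itself depends on $m$, so the residual would be evaluated at $G = Q^\pi(m)$ for the candidate $m$.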
Remark 2.4.
In contrast to the standard models where the assumptions usually imply that a unique optimal best response exists, the mean field equilibria we consider are not fully specified by the distribution, as it might happen that several actions are simultaneously optimal and induce the same distribution. However, the dynamic mean field equilibrium is fully specified by describing the equilibrium strategy, as one can show using standard techniques (Walter, 1998, Theorem 10.XX) that there is at most one solution to the differential equation. For the stationary mean field equilibrium, this again does not hold true, as it might happen that given a strategy there are multiple stationary distributions. For this reason, we define mean field equilibria always as pairs of the equilibrium distribution and the equilibrium strategy.
Remark 2.5.
We remark that for non-trivial models (in the sense that there is not one action that maximizes the Hamiltonian for every population distribution) we always obtain population distributions at which several actions maximize the Hamiltonian: Since $Q_{ija}$ and $r_{ia}$ are continuous, the Hamiltonian is also continuous in $m$. Therefore, if we fix the costate variables, the set of population distributions at which a particular action is a maximizer of the Hamiltonian is closed. Since the action space is finite, these finitely many closed sets cover $\mathcal{P}(\mathcal{S})$, and for a non-trivial model at least two of them are non-empty. As $\mathcal{P}(\mathcal{S})$ is connected, it cannot be split into finitely many pairwise disjoint, non-empty closed sets, so at least two of these sets intersect. Hence the set of population distributions where several actions simultaneously maximize the Hamiltonian is non-empty. This implies that for finite action spaces the assumption that a unique maximizer of the Hamiltonian exists is violated in all interesting cases. Thus, new methods for the analysis of these models are necessary.
3 The Individual Control Problem
This section is devoted to the analysis of the individual control problem: We propose a simple approach to determine which strategies are optimal for a given population distribution and show that optimal stationary strategies are convex combinations of particular deterministic stationary strategies.
We start by showing that given a stationary population distribution the individual player’s control problem is equivalent to a continuous time Markov decision process with expected discounted reward criterion (see Guo and Hernández-Lerma (2009) for a definition and general results).
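Lemma 3.1.
Let $m \in \mathcal{P}(\mathcal{S})$ be a population distribution. A Markovian randomized strategy $\pi$ is optimal in our model given $m$, i.e. achieves for all $m_0 \in \mathcal{P}(\mathcal{S})$ the maximum value of $V_{m_0}(\pi', m)$ among all strategies $\pi' \in \Pi$, if and only if it is discounted reward optimal for the continuous time Markov decision process with transition rates $Q_{ija}(m)$, rewards $r_{ia}(m)$ and discount factor $\beta$. In particular, there is a stationary strategy $\pi \in \Pi^s$ that satisfies $V_{m_0}(\pi, m) \ge V_{m_0}(\pi', m)$ for all $\pi' \in \Pi$ and all $m_0 \in \mathcal{P}(\mathcal{S})$.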
Proof.
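Assumption A1 ensures that the value function is finite for every population distribution and every individual strategy, since $r_{ia}$ is, as a continuous function on a compact space, uniformly bounded. Thus, we can rewrite the value function by conditioning on the initial state, that is, by using the representation $x^\pi_i(t) = \sum_{k \in \mathcal{S}} m_{0,k}\, \mathbb{P}(X^\pi(t) = i \mid X^\pi(0) = k)$:
$$V_{m_0}(\pi, m) = \sum_{k \in \mathcal{S}} m_{0,k} \int_0^\infty e^{-\beta t} \sum_{i \in \mathcal{S}} \sum_{a \in \mathcal{A}} \mathbb{P}(X^\pi(t) = i \mid X^\pi(0) = k)\, \pi_{ia}(t)\, r_{ia}(m)\,\mathrm{d}t = \sum_{k \in \mathcal{S}} m_{0,k}\, V_k(\pi, m),$$
where $V_k(\pi, m)$ is the expected discounted reward of the continuous time Markov decision process with expected discounted reward criterion with the above-mentioned rates and rewards when the initial state is $k$. Since a strategy is optimal for the continuous time Markov decision process if and only if it maximizes all $V_k(\pi, m)$ simultaneously, and since $m_0 \in \mathcal{P}(\mathcal{S})$ is arbitrary (in particular, it can be any unit vector), we obtain the desired equivalence. The last statement directly follows from the classical theory of Markov decision processes, see for example Guo and Hernández-Lerma (2009, Chapter 4). ∎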
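For a constant population distribution $m$ and a stationary strategy $\pi$, the vector $(V_k(\pi, m))_{k \in \mathcal{S}}$ from the proof can be computed in closed form: integrating $e^{-\beta t} e^{t Q^\pi(m)}$ over $[0, \infty)$ gives $V = (\beta I - Q^\pi(m))^{-1} r^\pi(m)$ with $r^\pi_i(m) = \sum_a \pi_{ia} r_{ia}(m)$, and then $V_{m_0}(\pi, m) = \sum_k m_{0,k} V_k(\pi, m)$. The sketch below, with a hypothetical toy generator and reward table, illustrates this standard identity; the helper names are assumptions and the code is not from the paper.

```python
# Sketch: V = (beta I - Q^pi)^{-1} r^pi for a stationary strategy and a
# constant population distribution; V_{m0} = sum_k m0_k V_k.


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def mixed_data(Q, r, pi):
    """Q^pi and r^pi from Q[a][i][j] = Q_{ija}(m), r[i][a] = r_{ia}(m), pi[i][a]."""
    n_actions, n_states = len(Q), len(Q[0])
    G = [[sum(Q[a][i][j] * pi[i][a] for a in range(n_actions)) for j in range(n_states)]
         for i in range(n_states)]
    r_pi = [sum(r[i][a] * pi[i][a] for a in range(n_actions)) for i in range(n_states)]
    return G, r_pi


def value_vector(G, r_pi, beta):
    """Solve (beta I - G) V = r_pi; invertible because beta > 0."""
    n = len(G)
    A = [[(beta if i == j else 0.0) - G[i][j] for j in range(n)] for i in range(n)]
    return solve(A, r_pi)


Q = [[[-1.0, 1.0], [0.5, -0.5]],   # action 0
     [[-2.0, 2.0], [1.0, -1.0]]]   # action 1
r = [[1.0, 1.0], [0.0, 0.0]]       # reward 1 per unit time in state 0
pi = [[0.5, 0.5], [1.0, 0.0]]
G, r_pi = mixed_data(Q, r, pi)     # G = [[-1.5, 1.5], [0.5, -0.5]], r_pi = [1.0, 0.0]
V = value_vector(G, r_pi, beta=0.5)
m0 = [0.5, 0.5]
V_m0 = sum(w * v for w, v in zip(m0, V))
print(V, V_m0)  # approximately [0.8, 0.4] and 0.6
```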
Now, we show that the set of all optimal stationary strategies is the convex hull of all deterministic optimal stationary strategies.
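Theorem 3.2.
Let $m \in \mathcal{P}(\mathcal{S})$. Write
$$\mathcal{D}^{OPT}(m) = \big\{ d \in \mathcal{D}^s : d(i) \in \mathcal{A}_i(m) \text{ for all } i \in \mathcal{S} \big\}$$
with
$$\mathcal{A}_i(m) = \operatorname*{arg\,max}_{a \in \mathcal{A}} \Big( r_{ia}(m) + \sum_{j \in \mathcal{S}} Q_{ija}(m)\, V_j(m) \Big), \qquad i \in \mathcal{S},$$
where $V(m) = (V_j(m))_{j \in \mathcal{S}}$ is the value function of the associated Markov decision process. Then $\mathcal{D}^{OPT}(m)$ is non-empty. Furthermore, a stationary strategy is optimal for our model given $m$ if and only if it is a convex combination of strategies from $\mathcal{D}^{OPT}(m)$ (identifying $d \in \mathcal{D}^s$ with the stationary strategy $\pi_{ia} = \mathbf{1}_{\{d(i) = a\}}$).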
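For small models, $\mathcal{D}^{OPT}(m)$ can be computed by brute force. The following sketch (an illustrative assumption, not the paper's procedure) evaluates every deterministic stationary strategy at a fixed $m$, takes the componentwise maximum as $V(m)$ (valid because an optimal deterministic stationary strategy exists for finite models), and then collects the maximizers $\mathcal{A}_i(m)$ with a small numerical tolerance. The toy model, in which both actions coincide in state 1, is hypothetical and chosen so that $\mathcal{D}^{OPT}(m)$ has two elements.

```python
from itertools import product


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def value_of(d, Q, r, beta):
    """V^d = (beta I - Q^d)^{-1} r^d for a deterministic stationary d (tuple of actions)."""
    n = len(d)
    A = [[(beta if i == j else 0.0) - Q[d[i]][i][j] for j in range(n)] for i in range(n)]
    return solve(A, [r[i][d[i]] for i in range(n)])


def optimal_deterministic(Q, r, beta, tol=1e-9):
    """Return V(m) and D^OPT(m) for Q[a][i][j] = Q_{ija}(m), r[i][a] = r_{ia}(m)."""
    n_actions, n_states = len(Q), len(Q[0])
    values = [value_of(d, Q, r, beta) for d in product(range(n_actions), repeat=n_states)]
    # An optimal deterministic stationary strategy exists, so the componentwise
    # maximum over all of them is the optimal value V(m).
    V = [max(v[i] for v in values) for i in range(n_states)]
    A_opt = []
    for i in range(n_states):
        scores = [r[i][a] + sum(Q[a][i][j] * V[j] for j in range(n_states))
                  for a in range(n_actions)]
        A_opt.append([a for a in range(n_actions) if scores[a] >= max(scores) - tol])
    return V, list(product(*A_opt))


Q = [[[-1.0, 1.0], [1.0, -1.0]],   # action 0
     [[-3.0, 3.0], [1.0, -1.0]]]   # action 1 (identical to action 0 in state 1)
r = [[0.0, 0.0], [1.0, 1.0]]       # reward 1 per unit time in state 1
V, D_opt = optimal_deterministic(Q, r, beta=0.5)
print(V)      # approximately [4/3, 14/9]
print(D_opt)  # [(1, 0), (1, 1)]
```

By Theorem 3.2 every mixture of the two returned strategies is optimal as well. Enumeration needs $A^S$ linear solves, so this is only meant for very small examples.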
Proof.
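By the previous lemma we know that the individual’s control problem is equivalent to the continuous time Markov decision process with expected discounted reward criterion with discount factor $\beta$, transition rates $Q_{ija}(m)$ and reward function $r_{ia}(m)$. Since we consider a finite state space, we obtain, using the uniformization procedure (Guo and Hernández-Lerma (2009, Remark 6.1), Kakumanu (1977)), an equivalent discrete time Markov decision process with expected discounted reward criterion. Writing $c > 0$ for a constant with $c \ge \max_{i \in \mathcal{S}, a \in \mathcal{A}} |Q_{iia}(m)|$, its discount factor is $\lambda = c/(\beta + c)$, its transition probabilities are given by
$$p_{ija}(m) = \delta_{ij} + \frac{Q_{ija}(m)}{c}, \qquad i, j \in \mathcal{S},\ a \in \mathcal{A}$$
(with $\delta_{ij}$ the Kronecker delta), and its reward functions are given by
$$\tilde r_{ia}(m) = \frac{r_{ia}(m)}{\beta + c}, \qquad i \in \mathcal{S},\ a \in \mathcal{A}.$$
Simple computations yield that for all $i \in \mathcal{S}$
$$\operatorname*{arg\,max}_{a \in \mathcal{A}} \Big( \tilde r_{ia}(m) + \lambda \sum_{j \in \mathcal{S}} p_{ija}(m)\, \tilde V_j(m) \Big) = \operatorname*{arg\,max}_{a \in \mathcal{A}} \Big( r_{ia}(m) + \sum_{j \in \mathcal{S}} Q_{ija}(m)\, V_j(m) \Big) = \mathcal{A}_i(m),$$
with $\tilde V(m)$ being the value function of the discrete time Markov decision process, which is indeed equal to the value function $V(m)$ of the continuous time Markov decision process (see Kakumanu (1977) for details; note, however, that he does not adjust the rewards, which yields a proportionality factor between the value functions in his setting).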
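The uniformization step can be checked numerically. The sketch below takes $c = \max_{i,a} |Q_{iia}(m)|$ (assumed positive), builds $p_{ija}(m)$, $\tilde r_{ia}(m)$ and $\lambda$ as above, runs value iteration for the discrete time problem and compares the result with the continuous-time value. The two-state model is hypothetical; its continuous-time optimal value $(4/3, 14/9)$ follows from $(\beta I - Q^d(m))^{-1} r^d(m)$ for the strategy that plays action 1 in state 0.

```python
# Sketch: uniformization of a continuous time discounted MDP at a fixed m,
# followed by value iteration on the resulting discrete time MDP.


def uniformize(Q, r, beta):
    """Discrete time data (p, r_tilde, lam) from rates Q[a][i][j] and rewards r[i][a]."""
    n_actions, n_states = len(Q), len(Q[0])
    c = max(abs(Q[a][i][i]) for a in range(n_actions) for i in range(n_states))
    if c == 0:
        raise ValueError("all jump rates vanish; choose any c > 0 instead")
    lam = c / (beta + c)
    p = [[[(1.0 if i == j else 0.0) + Q[a][i][j] / c for j in range(n_states)]
          for i in range(n_states)] for a in range(n_actions)]
    r_tilde = [[r[i][a] / (beta + c) for a in range(n_actions)] for i in range(n_states)]
    return p, r_tilde, lam


def value_iteration(p, r_tilde, lam, n_iter=2000):
    """Iterate v <- max_a (r_tilde + lam * p v); converges since lam < 1."""
    n_actions, n_states = len(p), len(p[0])
    v = [0.0] * n_states
    for _ in range(n_iter):
        v = [max(r_tilde[i][a] + lam * sum(p[a][i][j] * v[j] for j in range(n_states))
                 for a in range(n_actions))
             for i in range(n_states)]
    return v


Q = [[[-1.0, 1.0], [1.0, -1.0]],   # action 0
     [[-3.0, 3.0], [1.0, -1.0]]]   # action 1: faster move from state 0 to state 1
r = [[0.0, 0.0], [1.0, 1.0]]       # reward 1 per unit time in state 1
beta = 0.5
p, r_tilde, lam = uniformize(Q, r, beta)
v = value_iteration(p, r_tilde, lam)
print(lam, v)  # lam = 6/7; v close to the continuous-time value (4/3, 14/9)
```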
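Now we can prove the statement for discrete time Markov decision processes, relying on the rich theory developed for these problems (see Puterman (1994)): We first note that the set $\mathcal{D}^{OPT}(m)$ is non-empty since $\mathcal{A}$ is finite, so that every $\mathcal{A}_i(m)$ is non-empty. Moreover, by Puterman (1994, Corollary 6.2.8), any deterministic strategy in $\mathcal{D}^{OPT}(m)$ is indeed optimal. Enumerate $\mathcal{D}^{OPT}(m) = \{d_1, \ldots, d_K\}$ and let $\pi$ be a convex combination of strategies in $\mathcal{D}^{OPT}(m)$, that is
$$\pi_{ia} = \sum_{k=1}^{K} \alpha_k \mathbf{1}_{\{d_k(i) = a\}}, \qquad \alpha_k \ge 0, \quad \sum_{k=1}^{K} \alpha_k = 1.$$
By Puterman (1994, Theorem 6.1.1) the value function $\tilde V^\pi$ given a stationary strategy $\pi$ is the unique solution of $v = \tilde r_\pi + \lambda P_\pi v$ with
$$(\tilde r_\pi)_i = \sum_{a \in \mathcal{A}} \pi_{ia}\, \tilde r_{ia}(m), \qquad (P_\pi)_{ij} = \sum_{a \in \mathcal{A}} \pi_{ia}\, p_{ija}(m).$$
Noting that $\tilde r_\pi$ and $P_\pi$ are linear in $\pi$, we can rewrite the policy evaluation equation as
$$v = \sum_{k=1}^{K} \alpha_k \big( \tilde r_{d_k} + \lambda P_{d_k} v \big).$$
Since for all $k$ the deterministic stationary strategy $d_k$ is optimal, that is $\tilde V(m) = \tilde r_{d_k} + \lambda P_{d_k} \tilde V(m)$, it follows that $\tilde V(m)$ is the unique solution of the policy evaluation equation:
$$\sum_{k=1}^{K} \alpha_k \big( \tilde r_{d_k} + \lambda P_{d_k} \tilde V(m) \big) = \sum_{k=1}^{K} \alpha_k \tilde V(m) = \tilde V(m).$$
Hence $\tilde V^\pi = \tilde V(m)$. By Puterman (1994, Theorem 6.2.2 and Theorem 6.2.5), which state that in our setting the unique solution of the optimality equation is $\tilde V(m)$, and by Puterman (1994, Theorem 6.2.6), which states that a strategy is optimal if and only if its value function is a solution of the optimality equation, we obtain that the strategy $\pi$ is optimal.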
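To show the converse implication we assume that $\pi$ is not a convex combination of deterministic strategies from $\mathcal{D}^{OPT}(m)$. One easily sees that there is still a representation of $\pi$ as a convex combination of arbitrary deterministic strategies by considering
$$\pi_{ia} = \sum_{d \in \mathcal{D}^s} \Big( \prod_{k \in \mathcal{S}} \pi_{k d(k)} \Big) \mathbf{1}_{\{d(i) = a\}}, \qquad i \in \mathcal{S},\ a \in \mathcal{A}.$$
Moreover, any convex combination of deterministic strategies representing the strategy $\pi$ has a summand $d \notin \mathcal{D}^{OPT}(m)$ with positive weight. This means that for the strategy $d$ there is a state $i$ and an action $a = d(i)$ such that $a \notin \mathcal{A}_i(m)$. This implies that also the stationary strategy $\pi$ chooses that action in state $i$ with positive probability, that is $\pi_{ia} > 0$. This means for the $i$-th component of the strategy’s expected discounted reward:
$$\begin{aligned} \tilde V^\pi_i &= \sum_{a' \in \mathcal{A}} \pi_{ia'} \Big( \tilde r_{ia'}(m) + \lambda \sum_{j \in \mathcal{S}} p_{ija'}(m)\, \tilde V^\pi_j \Big) \\ &\le \sum_{a' \in \mathcal{A}} \pi_{ia'} \Big( \tilde r_{ia'}(m) + \lambda \sum_{j \in \mathcal{S}} p_{ija'}(m)\, \tilde V_j(m) \Big) \\ &< \max_{a' \in \mathcal{A}} \Big( \tilde r_{ia'}(m) + \lambda \sum_{j \in \mathcal{S}} p_{ija'}(m)\, \tilde V_j(m) \Big) = \tilde V_i(m). \end{aligned}$$
Note that the second line follows from $\tilde V^\pi \le \tilde V(m)$ (together with $p_{ija'}(m) \ge 0$), and the third line follows from the fact that the non-maximizing action $a$ is chosen with positive probability $\pi_{ia} > 0$.
As now $\tilde V^\pi_i < \tilde V_i(m)$ and, by the finiteness of $\mathcal{S}$ and $\mathcal{A}$, an optimal strategy achieving the value $\tilde V(m)$ exists, it follows that $\pi$ is not optimal. ∎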
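The product representation used in the converse direction of the proof can be checked directly. In this sketch (hypothetical helper names, toy strategy chosen for illustration), each deterministic stationary strategy $d$ receives the weight $\prod_k \pi_{k d(k)}$; the weights sum to one and recombine to $\pi$, and every action played with positive probability by $\pi$ appears in some summand with positive weight, which is exactly what the argument uses.

```python
from itertools import product


def product_weights(pi):
    """Weight prod_k pi[k][d[k]] for every deterministic stationary strategy d."""
    n_states, n_actions = len(pi), len(pi[0])
    weights = {}
    for d in product(range(n_actions), repeat=n_states):
        w = 1.0
        for k in range(n_states):
            w *= pi[k][d[k]]
        weights[d] = w
    return weights


def recombine(weights, n_states, n_actions):
    """pi_{ia} = sum_d weight(d) * 1{d(i) = a}."""
    return [[sum(w for d, w in weights.items() if d[i] == a) for a in range(n_actions)]
            for i in range(n_states)]


pi = [[0.25, 0.75], [1.0, 0.0]]  # toy stationary strategy
w = product_weights(pi)
print({d: x for d, x in w.items() if x > 0})  # {(0, 0): 0.25, (1, 0): 0.75}
print(recombine(w, 2, 2))                      # [[0.25, 0.75], [1.0, 0.0]]
```

Enumerating all $A^S$ deterministic strategies is only practical for tiny examples; the representation matters for the proof, not for computation.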
We note that we have reduced the problem of determining which of the infinitely many strategies are optimal for a given $m$ to the problem of determining which of the finitely many deterministic stationary strategies are optimal.
This theorem yields a basic guideline for finding all mean field equilibria: For each point $m \in \mathcal{P}(\mathcal{S})$ determine the set $\mathcal{D}^{OPT}(m)$ of all optimal deterministic stationary strategies. Since there are only finitely many deterministic stationary strategies, this yields a partition of $\mathcal{P}(\mathcal{S})$: For $\mathcal{D}' \subseteq \mathcal{D}^s$ let us write $M_{\mathcal{D}'}$ for the set of all $m \in \mathcal{P}(\mathcal{S})$ such that $\mathcal{D}^{OPT}(m) = \mathcal{D}'$. Thus, we then have to search in each of the sets $M_{\mathcal{D}'}$ for fixed points of the dynamics given all those stationary strategies that are convex combinations of strategies in $\mathcal{D}'$.
Furthermore, the results allow us to prove that a game that is not trivial, in the sense that there are two population distributions $m, m'$ such that no deterministic stationary strategy is optimal for both of them, has a closed, non-empty set of points where infinitely many strategies are optimal. Thus, we indeed have to consider infinitely many (potentially different) fixed point problems in order to compute all stationary mean field equilibria. Indeed, by the classical theory of Markov decision processes we know that those deterministic strategies are optimal that maximize the expected discounted reward componentwise. Noting that the expected discounted reward of $d \in \mathcal{D}^s$ is given by
$$V^d(m) = \big( \beta I - Q^d(m) \big)^{-1} r^d(m), \qquad Q^d_{ij}(m) = Q_{ij d(i)}(m), \quad r^d_i(m) = r_{i d(i)}(m),$$
and that $Q_{ija}$ and $r_{ia}$ are continuous, it follows that $V^d$ is also continuous in $m$ (the matrix $\beta I - Q^d(m)$ is invertible since $\beta > 0$, and matrix inversion is continuous). The set of all points where a certain strategy $d$ is optimal is the preimage of $\{0\} \subset \mathbb{R}^S$ under the continuous map $m \mapsto \max_{d' \in \mathcal{D}^s} V^{d'}(m) - V^d(m)$ (maximum taken componentwise) and thus closed. These finitely many closed sets cover $\mathcal{P}(\mathcal{S})$, and for a non-trivial game at least two of them are non-empty. Since $\mathcal{P}(\mathcal{S})$ is closed and connected, these sets cannot be pairwise disjoint, so there is a non-empty, closed set (a finite union of pairwise intersections) of points for which at least two deterministic stationary strategies, and thus infinitely many stationary strategies, are optimal.
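The closedness argument can be illustrated on a hypothetical two-state congestion model, parametrized by $m = (1 - s, s)$: in state 0, action 0 stays put and action 1 moves to state 1 at rate 3 for a cost of $0.2$; in state 1 players drop back at rate 1, and action 0 earns $1 - 2s$ while action 1 earns $0.1$ less. With $\beta = 0.5$, the strategy MOVE $= (1, 0)$ is optimal for small $s$ and STAY $= (0, 0)$ for large $s$; bisection on the difference of their state-0 values locates the common boundary point of the two closed sets, where both (and hence all their mixtures) are optimal. Model, names and numbers are assumptions for illustration only.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def model(s):
    """Hypothetical congestion model at m = (1 - s, s): returns Q[a][i][j], r[i][a]."""
    reward_1 = 1.0 - 2.0 * s  # state 1 pays less when it is crowded
    Q = [[[0.0, 0.0], [1.0, -1.0]],    # action 0: stay in state 0
         [[-3.0, 3.0], [1.0, -1.0]]]   # action 1: move to state 1 at rate 3
    r = [[0.0, -0.2], [reward_1, reward_1 - 0.1]]
    return Q, r


def value_of(d, Q, r, beta=0.5):
    """V^d = (beta I - Q^d)^{-1} r^d for a deterministic stationary strategy d."""
    n = len(d)
    A = [[(beta if i == j else 0.0) - Q[d[i]][i][j] for j in range(n)] for i in range(n)]
    return solve(A, [r[i][d[i]] for i in range(n)])


MOVE, STAY = (1, 0), (0, 0)  # (action in state 0, action in state 1)


def gap(s):
    """State-0 value of MOVE minus that of STAY at m = (1 - s, s)."""
    Q, r = model(s)
    return value_of(MOVE, Q, r)[0] - value_of(STAY, Q, r)[0]


lo, hi = 0.0, 1.0
assert gap(lo) > 0 > gap(hi)  # MOVE strictly better at s = 0, STAY at s = 1
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if gap(mid) > 0:
        lo = mid
    else:
        hi = mid
print(lo)  # approximately 0.45

Q, r = model(lo)
print(value_of(MOVE, Q, r), value_of(STAY, Q, r))  # (almost) identical value vectors
```

In this toy model the gap equals $(1.8 - 4s)/1.5$, so the boundary is exactly $s = 0.45$; in general the sets where a strategy is optimal need not be intervals, and a coarse grid search would be needed before bisecting.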
4 Existence
In Section 3 we proved that there is a stationary strategy that is optimal among all (also time-dependent) strategies. Moreover, we proved that a stationary strategy is optimal for if and only if it is a convex combination of elements of , the set of all optimal deterministic stationary strategies.
Using this, we will prove that whenever assumption A1 holds, there exists a stationary mean field equilibrium. We adapt the ideas presented in Doncel et al. (2016a) to prove this. More precisely, we show the existence of a fixed point of an associated best response map in the dynamics. This map assigns to each point all stationary points of given that is an optimal strategy for . In contrast to the proof of the existence of dynamic equilibria presented in Doncel et al. (2016a), we do not rely solely on standard calculus arguments; instead, the proof crucially relies on our probabilistic representation of the problem.
We define the best response map , where denotes the power set of , by setting
[TABLE]
We will now show using Kakutani’s fixed point theorem that this map has a fixed point and that each fixed point of this map induces a stationary mean field equilibrium:
Theorem 4.1.
Given assumption A1 there is a stationary mean field equilibrium.
Proof.
We show that has a fixed point using Kakutani’s fixed point theorem (Border, 1985, Corollary 15.3), since any such fixed point yields a stationary mean field equilibrium: Indeed, for any fixed point we find a strategy such that . Since, by Lemma 3.1 and Theorem 3.2, we moreover have for all and all , the pair constitutes a stationary mean field equilibrium.
We first note that is non-empty for all . By Theorem 3.2 the set is non-empty. Since any continuous time Markov chain with finite state space has at least one stationary distribution, there exists an such that , which yields .
Furthermore, for each the set is convex: Let be two distinct points. Then, by definition of , we find two strategies such that
[TABLE]
Define and , which satisfy
[TABLE]
Now let be arbitrary and define . Then satisfies
[TABLE]
which means that
[TABLE]
satisfies . It remains to verify that . For this we note that if and only if , which in turn is equivalent to the requirement that or . This can only happen if or . Thus, since , also .
We now verify that has a closed graph, that is, that for any sequence and for all with and we indeed have : Let be a converging sequence satisfying for all . We denote its limit by . By definition of we find a sequence such that . By compactness of we find a converging subsequence with limit . For any let be such that for all and for all . Since is finite we find a set that occurs infinitely often in the sequence . From this we obtain that for all and . Moreover, since we obtain that for all such that we have for all deterministic strategies satisfying for all . By continuity of and , we obtain that for all these strategies . Thus, . Furthermore, by continuity of , we obtain that
[TABLE]
which shows that .
Using that is a compact metric space and that the graph of is closed, we obtain that the values are compact, as the limit of any convergent sequence lies in .
Now Kakutani’s fixed point theorem (Border, 1985, Corollary 15.3) yields a fixed point , which proves the desired claim. ∎
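The proof uses the fact that every finite continuous time Markov chain has at least one stationary distribution, even when its generator is reducible. One constructive way to see this (a standard argument, not the one cited in the paper) is uniformization followed by Cesàro averaging, sketched below in Python. The function name, the number of steps and the example generator are illustrative assumptions.

```python
import numpy as np


def stationary_distribution_cesaro(Q, n_steps=20_000):
    """Approximate a stationary distribution of a finite CTMC with generator Q.

    With lam >= max_i |Q[i, i]|, P = I + Q / lam is a stochastic matrix and
    mu P = mu holds iff mu Q = 0. The Cesaro averages of mu_0 P^n converge to a
    stationary distribution of P for every finite chain, reducible or not;
    the residual of the average satisfies ||avg Q||_1 <= 2 * lam / n_steps.
    """
    n = Q.shape[0]
    lam = max(1.0, float(np.max(-np.diag(Q))))
    P = np.eye(n) + Q / lam
    mu = np.full(n, 1.0 / n)   # arbitrary initial distribution
    avg = np.zeros(n)
    for _ in range(n_steps):
        avg += mu
        mu = mu @ P
    return avg / n_steps


# Reducible example: states 0 and 2 are absorbing, state 1 is transient.
Q = np.array([[0.0, 0.0, 0.0],
              [1.0, -2.0, 1.0],
              [0.0, 0.0, 0.0]])
pi = stationary_distribution_cesaro(Q)
print(np.round(pi, 3))
```

Starting from the uniform distribution, the transient mass splits evenly between the two absorbing states, so the result should be close to (0.5, 0, 0.5); a different initial distribution may select a different stationary distribution, which is consistent with non-uniqueness in the reducible case.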
5 The Fixed Point Problem
5.1 The Cut Criterion
To solve the fixed point problem, we have to determine the solutions of the equation for all strategies that are optimal for some population distribution, and then check whether is indeed optimal for these solutions. In many settings this task is not simple (see Section 6.2). However, a cut criterion similar to the one used for Markov chains is often useful, although it is merely a reformulation of the balance equations (see Kelly (1979, Lemma 1.4) for a description of the criterion for standard continuous time Markov chains). The criterion states that if we partition the state space of the Markov chain into two sets, then, in stationarity, the probability flow from the first set to the second has to equal the probability flow from the second set back to the first.
The criterion is particularly useful because in most models considered so far there has been a set of states whose inflow and outflow dynamics cannot be influenced by the player's choice of strategy. This means that any mean field equilibrium, irrespective of the chosen strategy, has to satisfy certain equations coming from the cut criterion, which could otherwise be obtained from the standard balance equations only by suitable rearrangements. In Section 6.2 we show that the criterion indeed simplifies the search for fixed points.
Theorem 5.1.
Let be a stationary strategy and let . Then any stationary population distribution satisfies
[TABLE]
Proof.
The stationarity condition reads for all
[TABLE]
furthermore, since is conservative, we have for all
[TABLE]
This yields for all
[TABLE]
Summing this identity over all yields
[TABLE]
Subtracting the identity
[TABLE]
yields the desired result. ∎
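The cut criterion is easy to check numerically. The Python sketch below assumes nothing beyond an irreducible conservative generator: a random one with strictly positive off-diagonal rates stands in for the generator given a particular strategy and population distribution. It computes the stationary distribution and compares the probability flow out of an arbitrary set of states with the flow back into it. All names are hypothetical.

```python
import numpy as np


def stationary_distribution(Q):
    """Unique stationary distribution of an irreducible conservative generator Q.

    Solves x Q = 0 together with sum(x) = 1 (a consistent stacked linear system).
    """
    n = Q.shape[0]
    A = np.vstack([Q.T, np.ones(n)])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return x


def cut_flows(Q, x, subset):
    """Probability flow out of `subset` and back into it under the distribution x."""
    inside = np.zeros(Q.shape[0], dtype=bool)
    inside[list(subset)] = True
    outflow = x[inside] @ Q[np.ix_(inside, ~inside)].sum(axis=1)
    inflow = x[~inside] @ Q[np.ix_(~inside, inside)].sum(axis=1)
    return outflow, inflow


rng = np.random.default_rng(0)
n = 5
Q = rng.uniform(0.1, 2.0, size=(n, n))   # strictly positive rates: irreducible
np.fill_diagonal(Q, 0.0)
np.fill_diagonal(Q, -Q.sum(axis=1))      # conservative: every row sums to zero
x = stationary_distribution(Q)
out_flow, in_flow = cut_flows(Q, x, {0, 2})
print(out_flow, in_flow)
```

The two printed flows agree up to rounding for any choice of subset, which is the content of the theorem for a fixed generator.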
5.2 Mean Field Equilibria are Fixed Points of a Specific Map
This section is devoted to proving an explicit characterization of , which was introduced in Section 4, in terms of the deterministic maps for those strategies that are optimal for . For this we need irreducibility of for all strategies . Note that it suffices to verify irreducibility for all deterministic strategies: any stationary strategy is a convex combination of deterministic strategies, so its generator is the corresponding convex combination of irreducible generators. Since all off-diagonal entries are non-negative, every transition that is possible under a deterministic strategy with positive weight remains possible, and the generator is therefore also irreducible.
The main consequence of being irreducible is that there is a unique stationary distribution of the continuous time Markov chain (CTMC) with generator (see Durrett (1999, Corollary I.4.6 and Theorem I.4.7), Norris (1997, Theorem 3.5.1)). This observation allows us to formulate the main theorem, which we will prove in the rest of the section:
Theorem 5.2.
Let such that is irreducible for all . Let, furthermore, be the set of all deterministic optimal strategies for . Then
[TABLE]
with being the unique solution of .
The proof of this theorem relies on characterizing the stationary distribution of the CTMC with irreducible generator by a closed-form expression and then showing that is a convex combination of . To carry out this programme, we first prove several properties of the generator matrix, starting with the following lemma on the structure of an irreducible, conservative generator , more precisely of the minor , which arises from by deleting the last row and column:
Lemma 5.3.
Let be an irreducible, conservative generator matrix. Then all eigenvalues of the minor have negative real part. Consequently has full rank and
[TABLE]
The technical proof can be found in the appendix.
Noting that the last column is the negative sum of all other columns, while by Lemma 5.3 the minor already has full rank, we obtain the following corollary:
Corollary 5.4.
Let be an irreducible, conservative generator matrix. Then the rank of is .
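Lemma 5.3 and Corollary 5.4 can be sanity-checked numerically. The Python sketch below draws random irreducible conservative generators from a hypothetical test family (strictly positive off-diagonal rates), deletes the last row and column, and checks three things for an n-state generator: the eigenvalues of the resulting minor have negative real part, its determinant has sign (-1)^(n-1) (the sign that the eigenvalue-parity argument in the appendix gives), and the full generator has rank n - 1.

```python
import numpy as np


def random_irreducible_generator(n, rng):
    """Hypothetical test generator with strictly positive off-diagonal rates."""
    Q = rng.uniform(0.1, 2.0, size=(n, n))
    np.fill_diagonal(Q, 0.0)
    np.fill_diagonal(Q, -Q.sum(axis=1))
    return Q


rng = np.random.default_rng(1)
checks = []
for n in range(2, 8):
    Q = random_irreducible_generator(n, rng)
    minor = Q[:-1, :-1]   # delete the last row and column
    eig_ok = bool(np.all(np.linalg.eigvals(minor).real < 0))
    sign_ok = bool(np.sign(np.linalg.det(minor)) == (-1) ** (n - 1))
    rank_ok = bool(np.linalg.matrix_rank(Q) == n - 1)
    checks.append((n, eig_ok, sign_ok, rank_ok))
print(checks)
```

A numerical check of this kind is of course no substitute for the proof; it only guards against misreading which matrix the lemma refers to.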
With these two results, we can now explicitly characterize the stationary distribution given a stationary strategy and a population distribution :
Lemma 5.5.
Let be a stationary strategy and such that is irreducible. Let be the transpose where the last row is replaced by . Then we have that the unique stationary distribution is given by
[TABLE]
Furthermore, we can write
[TABLE]
with
[TABLE]
Proof.
The stationary distribution of our process is uniquely determined by (Asmussen, 2003, Theorem II.4.2)
[TABLE]
Since the last equation of the system is the negative sum of all other equations, we obtain that the system (3) is equivalent to
[TABLE]
which by definition of is .
We now show that the matrix has rank , so that it is invertible. As in Resnick (1992, pp. 137-139), we rely on the existence of the stationary distribution given . In order to show that has full rank, we show that implies . For the stationary distribution given we have
[TABLE]
Thus,
[TABLE]
implies that
[TABLE]
Since by Lemma 5.3 the matrix has full rank we obtain that , which proves that has full rank.
The last part of the statement follows from Cramer’s rule together with the Laplace expansion of along the last row
[TABLE]
and the observation that
[TABLE]
as differs from only in the -th row. ∎
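The following Python sketch illustrates the closed form in Lemma 5.5. It assumes that the matrix there is the transposed generator with its last row replaced by ones and that the right-hand side is the last unit vector (our reading of the construction). It solves the linear system directly and, separately, via Cramer's rule, and checks that both agree and give a stationary distribution. The random generator is a hypothetical test input.

```python
import numpy as np


def replaced_row_matrix(Q):
    """Transpose of Q with its last row replaced by a row of ones."""
    M = Q.T.copy()
    M[-1, :] = 1.0
    return M


def stationary_via_solve(Q):
    n = Q.shape[0]
    e = np.zeros(n)
    e[-1] = 1.0
    return np.linalg.solve(replaced_row_matrix(Q), e)


def stationary_via_cramer(Q):
    """Cramer's rule: x_k = det(M with column k replaced by e_n) / det(M)."""
    M = replaced_row_matrix(Q)
    n = M.shape[0]
    e = np.zeros(n)
    e[-1] = 1.0
    det_M = np.linalg.det(M)
    x = np.empty(n)
    for k in range(n):
        Mk = M.copy()
        Mk[:, k] = e
        x[k] = np.linalg.det(Mk) / det_M
    return x


rng = np.random.default_rng(2)
n = 4
Q = rng.uniform(0.1, 2.0, size=(n, n))
np.fill_diagonal(Q, 0.0)
np.fill_diagonal(Q, -Q.sum(axis=1))
x_solve = stationary_via_solve(Q)
x_cramer = stationary_via_cramer(Q)
print(x_solve, x_cramer)
```

In practice `np.linalg.solve` is preferable; the Cramer variant is included only because the determinant form is what the subsequent argument manipulates.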
In order to establish the desired characterization of the convex set solely in terms of transition rates and rewards for deterministic strategies, one final preparation is needed: we have to show that the determinant of has the same sign for all . To this end, write for the deterministic strategy satisfying . Then it holds that
[TABLE]
Now a simple application of the intermediate value theorem yields the desired result:
Lemma 5.6.
Let be a population distribution such that is irreducible for all . Then has uniform sign over .
Proof.
We note that the map , which ranges from to , is continuous. Since the determinant is also a continuous function, is a continuous function. By Lemma 5.5 we have for all . If there were a strategy and a strategy such that and , then, by the intermediate value theorem, there would be a such that , which would contradict Lemma 5.5. ∎
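Lemma 5.6 can be illustrated along the straight path between two deterministic strategies. The Python sketch below uses hypothetical random transition rates for two actions, mixes the generators of two deterministic strategies with weight t in [0, 1] (mixing per state gives exactly this convex combination of rows), and records the determinant of the matrix from Lemma 5.5 along the path. Since every mixed generator is irreducible here, the determinant never vanishes and keeps one sign.

```python
import numpy as np


def replaced_row_matrix(Q):
    """Transpose of Q with its last row replaced by a row of ones (cf. Lemma 5.5)."""
    M = Q.T.copy()
    M[-1, :] = 1.0
    return M


def generator_of(Q_actions, f):
    """Generator of the deterministic strategy f: row i is taken from action f[i]."""
    n = Q_actions.shape[1]
    return Q_actions[np.asarray(f), np.arange(n), :]


rng = np.random.default_rng(3)
n_states, n_actions = 4, 2
idx = np.arange(n_states)
Q_actions = rng.uniform(0.1, 2.0, size=(n_actions, n_states, n_states))
Q_actions[:, idx, idx] = 0.0
Q_actions[:, idx, idx] = -Q_actions.sum(axis=2)   # every row of every generator sums to zero

Qf = generator_of(Q_actions, [0, 1, 0, 1])
Qg = generator_of(Q_actions, [1, 1, 0, 0])
ts = np.linspace(0.0, 1.0, 201)
dets = np.array([np.linalg.det(replaced_row_matrix((1 - t) * Qf + t * Qg)) for t in ts])
print(dets.min(), dets.max())
```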
With all these preparations, we can now prove the characterization result stated in the beginning of the section:
Proof of Theorem 5.2.
For readability we suppress the dependence of and on .
Let . Note that is zero for all non-optimal strategies by Theorem 3.2. With being the matrix without the last row, we can now write as follows:
[TABLE]
As the determinant is linear in each column and we have for all , we obtain
[TABLE]
Similarly, using that
[TABLE]
holds for all , we obtain that
[TABLE]
for all . This implies
[TABLE]
Thus, we see that is a linear combination of for any stationary strategy and the coefficients are given by
[TABLE]
From (5.2) we obtain that
[TABLE]
By Lemma 5.6 the determinants and have the same sign for any two strategies . Thus, as for all , we obtain that for all . To conclude, writing , every point in is a convex combination of . Moreover, any convex combination of is the stationary point given a strategy , which by Theorem 3.2 implies that these points lie in . ∎
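The determinant expansion behind this proof can be checked numerically. The Python sketch below draws a random stationary strategy `d` (with `d[i, a]` the probability of action a in state i) for hypothetical random rates and verifies that its stationary distribution equals the sum over all deterministic strategies f of lambda_f x^f, with lambda_f = (prod_i d[i, f(i)]) det M^f / det M^d, where M is the matrix of Lemma 5.5. This form of the coefficients is our reading of the multilinear expansion used above; in the theorem only optimal deterministic strategies receive positive weight, whereas here all of them do.

```python
import itertools

import numpy as np


def replaced_row_matrix(Q):
    """Transpose of Q with its last row replaced by a row of ones (cf. Lemma 5.5)."""
    M = Q.T.copy()
    M[-1, :] = 1.0
    return M


def stationary(Q):
    n = Q.shape[0]
    e = np.zeros(n)
    e[-1] = 1.0
    return np.linalg.solve(replaced_row_matrix(Q), e)


rng = np.random.default_rng(4)
n_states, n_actions = 3, 2
idx = np.arange(n_states)
Q_actions = rng.uniform(0.1, 2.0, size=(n_actions, n_states, n_states))
Q_actions[:, idx, idx] = 0.0
Q_actions[:, idx, idx] = -Q_actions.sum(axis=2)

d = rng.dirichlet(np.ones(n_actions), size=n_states)   # random stationary strategy
Q_d = np.einsum("ia,aij->ij", d, Q_actions)           # row i mixes the actions in state i
x_d = stationary(Q_d)
det_d = np.linalg.det(replaced_row_matrix(Q_d))

combination = np.zeros(n_states)
coefficients = []
for f in itertools.product(range(n_actions), repeat=n_states):
    Q_f = Q_actions[list(f), idx, :]
    weight = np.prod([d[i, f[i]] for i in range(n_states)])
    coef = weight * np.linalg.det(replaced_row_matrix(Q_f)) / det_d
    coefficients.append(coef)
    combination += coef * stationary(Q_f)
print(x_d, combination)
```

The coefficients are non-negative because all determinants share one sign (Lemma 5.6), and they sum to one because the same multilinear expansion applies to det M^d itself.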
Thus, in order to find all mean field equilibria, it suffices to follow this programme: First, compute for all sets the set , which collects all points for which . Second, find all fixed points of the map that lie in . Writing for the set of all fixed points of the map , we obtain the following result:
Theorem 5.7.
Assume that there is a set such that for all and all the matrix is irreducible. Then the set of all distributions lying in induced by some stationary mean field equilibrium is given by
[TABLE]
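For population-dependent dynamics, the second step of this programme requires solving genuinely nonlinear fixed point equations. The Python sketch below illustrates it in the simplest case of two states and one fixed deterministic strategy: a distribution (x, 1 - x) is a fixed point exactly when the balance across the single cut holds, x*q12(x) = (1 - x)*q21(x). The rate functions `q12` and `q21` are hypothetical, chosen so that leaving a state is less likely the more crowded it is. The root finder is a plain grid scan refined by bisection. In an actual application, each root would still have to be checked for membership in the corresponding optimality set.

```python
import numpy as np


# Hypothetical population-dependent rates under one fixed deterministic strategy.
def q12(x):   # rate 1 -> 2 when a fraction x of the population is in state 1
    return 0.1 + (1.0 - x) ** 2


def q21(x):   # rate 2 -> 1
    return 0.1 + x ** 2


def balance_residual(x):
    """Zero exactly when (x, 1 - x) is stationary for the generator at (x, 1 - x)."""
    return x * q12(x) - (1.0 - x) * q21(x)


def fixed_points(g, n_grid=999, tol=1e-12):
    """Zeros of g on [0, 1]: grid scan for sign changes, refined by bisection."""
    grid = np.linspace(0.0, 1.0, n_grid + 1)
    values = np.array([g(t) for t in grid])
    roots = []
    for lo, hi, g_lo, g_hi in zip(grid[:-1], grid[1:], values[:-1], values[1:]):
        if g_lo == 0.0:
            roots.append(lo)
        elif g_lo * g_hi < 0.0:
            while hi - lo > tol:          # invariant: g changes sign on [lo, hi]
                mid = 0.5 * (lo + hi)
                g_mid = g(mid)
                if g_lo * g_mid <= 0.0:
                    hi = mid
                else:
                    lo, g_lo = mid, g_mid
            roots.append(0.5 * (lo + hi))
    if values[-1] == 0.0:
        roots.append(grid[-1])
    return roots


roots = fixed_points(balance_residual)
print(roots)
```

For these rates the residual factors as (x - 0.5)(2x^2 - 2x + 0.2), so the scan finds three fixed points, 0.5 - sqrt(0.15), 0.5 and 0.5 + sqrt(0.15). A grid scan can miss pairs of roots that fall into the same grid cell, so the grid resolution is a modelling choice.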
In the case of constant dynamics, that is, for all , the second step of this programme is simple: the maps are constant with value , so the unique fixed point is . Thus, we can characterize the set of all mean field equilibria explicitly, computing only the optimality sets and the stationary points given , as follows:
Corollary 5.8.
Let the dynamics be constant, that is for all , and let the generators given any strategy be irreducible. Then the set of all distributions induced by some stationary MFE is given by
[TABLE]
6 Examples
This section presents two mean field game models, which have been considered in similar versions in the literature, and illustrates the application of the techniques presented above. For both examples we first solve the individual control problem given a fixed population distribution and, based on this, compute the non-empty optimality sets . Thereafter we solve the fixed point problems relying on the approaches introduced above and sketch how the full characterization of the equilibria can be obtained.
6.1 A Consumer Choice Model
The first model has constant dynamics, that is, the dynamics do not depend on the current population distribution. The context and utility function are similar to a model introduced by Gomes et al. (2014) as a toy example on which the authors demonstrated numerical methods for a specific class of finite state mean field game models, namely those yielding systems of hyperbolic partial differential equations. Note, however, that the action spaces and choice options of the players differ systematically.
The model is inspired by consumer choices in the mobile phone sector. The utility of using a certain provider is increasing in the share of customers using it, whereas the costs are constant. We assume that the utility coming from the other customers in service is given by the isoelastic utility function , with . Note that since is always negative for our choice of , one cannot interpret directly as costs; rather, one has to think of as consisting of two components: the costs themselves and some base utility from service provision. In our model, the players can choose whether to stay with their provider or to switch to the other provider; in the latter case, the player additionally faces a switching cost per unit of time. Note that, for technical reasons, it is important that the player always faces a small risk of moving to the other provider, independent of the chosen strategy.
These choice options differ substantially from the model in Gomes et al. (2014), where the players could continuously control the rates at which they switch to the other state and faced costs corresponding to the square of the rate. From an applied point of view, it is questionable whether players, in particular those who are not experts in the game at hand, understand what it means to control the transition rates of a Markov chain. Indeed, economic experiments show that most people cannot understand the true effect of random devices even in simple settings (Walker and Wooders, 2008).
The formal description of the model is as follows: For technical reasons (i.e., to define the rewards properly for the case ), we introduce, for small enough, the function given by
[TABLE]
which is increasing on . Using this we then define the transition rates and reward functions by
[TABLE]
where and .
In order to analyse the model we first solve the optimality equations
[TABLE]
which directly show that it is optimal to change in state if and only if and that it is optimal to change in state if and only if . Thus, changing in both states is never optimal, and we focus on the three potentially optimal strategies , and . The expected discounted reward given these strategies is
[TABLE]
In order to understand which strategy is optimal, we compute the differences :
[TABLE]
Using that is increasing and that , we obtain, for small enough, that
[TABLE]
and analogously we obtain
[TABLE]
as well as
[TABLE]
Next, we compute the fixed points given the three strategies , and , noting that we never need to consider the fixed point given , as it is never optimal to randomize in both states. Since we face standard continuous time Markov chains, this task is simple, and we obtain a unique fixed point for each strategy:
[TABLE]
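For two states, the fixed point under a given deterministic strategy is available in closed form: a chain that jumps from state 1 to state 2 at rate q12 and back at rate q21 has stationary distribution (q21, q12)/(q12 + q21). The Python sketch below evaluates this for the three candidate strategies under hypothetical rates: every customer switches at a small base rate `eps` regardless of the strategy, and choosing to change adds a rate `lam`. These rates are placeholders, not the ones defined for the model above.

```python
def two_state_stationary(q12, q21):
    """Stationary distribution (m1, m2) of a two-state CTMC with rates q12, q21 > 0."""
    total = q12 + q21
    return q21 / total, q12 / total


eps, lam = 0.05, 1.0   # hypothetical base switching rate and extra rate when changing
candidates = {
    "stay in both states": two_state_stationary(eps, eps),
    "change in state 1": two_state_stationary(eps + lam, eps),
    "change in state 2": two_state_stationary(eps, eps + lam),
}
for name, (m1, m2) in candidates.items():
    print(f"{name}: m1 = {m1:.4f}, m2 = {m2:.4f}")
```

Because the dynamics are constant, each of these points is the unique fixed point for its strategy, and the remaining work is to compare them with the optimality regions.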
Depending on the choice of parameters, there are between one and five equilibria. For simplicity, we write
[TABLE]
and note that we always have .
We can now obtain the exact number and position of equilibria by coefficient comparison. The strategies used in mixed strategy equilibria are then obtained by solving the balance equation with being the population distribution of the mixed strategy equilibrium. We omit this last step, which amounts to solving a system of linear equations.
- (i) If and , then together with the deterministic strategy is the unique stationary mean field equilibrium.
- (ii) If and , then together with the deterministic strategy and together with the deterministic strategy are the deterministic equilibria. Furthermore, there is one mixed strategy equilibrium with that randomizes over and in the second state. If , then the mixed strategy equilibrium coincides with the pure strategy equilibrium with the same population distribution.
- (iii) If and , then the only equilibrium is given by together with the deterministic strategy .
- (iv) If and , then together with the deterministic strategy and together with the deterministic strategy are the deterministic equilibria. Furthermore, there is a mixed strategy equilibrium at the point with that randomizes over and in the first state. If , then the mixed strategy equilibrium coincides with the pure strategy equilibrium with the same population distribution.
- (v) If and , then together with the deterministic strategy , together with the deterministic strategy and together with the deterministic strategy are the deterministic equilibria. Furthermore, there is one mixed strategy equilibrium with that randomizes over and in the first state and one mixed strategy equilibrium with that randomizes over and in the second state. If or , then the corresponding mixed strategy equilibrium coincides with the pure strategy equilibrium with the same population distribution.
- (vi) If and , then together with the deterministic strategy and together with the deterministic strategy are the deterministic equilibria. Furthermore, there is one mixed strategy equilibrium with that randomizes over and in the first state. If , then the mixed strategy equilibrium coincides with the pure strategy equilibrium with the same population distribution.
- (vii) If and , then together with the deterministic strategy and together with the deterministic strategy are the deterministic equilibria. Furthermore, there is a mixed strategy equilibrium at the point with that randomizes over and in the second state. If , then the mixed strategy equilibrium coincides with the pure strategy equilibrium with the same population distribution.
- (viii) If and , then together with the deterministic strategy is the unique stationary mean field equilibrium.
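The omitted last step, recovering the randomization from the balance equation, is linear in the mixing probability when a player randomizes between two actions in a single state. The following minimal Python sketch works under that assumption. Starting from a deterministic strategy `f`, the hypothetical helper `mixing_probability` finds the probability p of playing `action` instead of `f[state]` such that the given distribution m satisfies m Q^(d_p) = 0, solving the overdetermined scalar equation in the least-squares sense. The two-state rates (`eps`, `lam`) are again placeholders.

```python
import numpy as np


def mixing_probability(Q_actions, f, state, action, m):
    """Return p such that m is stationary when `action` is played with probability p
    (and f[state] with probability 1 - p) in `state`, plus the balance residual.

    Q^{d_p} = Q^f + p * e_state (Q^action_state - Q^f_state), so m Q^{d_p} = u + p v
    is affine in p and is solved in the least-squares sense.
    """
    n = len(f)
    Q_f = Q_actions[np.asarray(f), np.arange(n), :]
    u = m @ Q_f
    v = m[state] * (Q_actions[action, state, :] - Q_f[state, :])
    p = -(u @ v) / (v @ v)
    residual = float(np.abs(u + p * v).max())
    return p, residual


eps, lam = 0.05, 1.0   # hypothetical rates: action 0 = stay, action 1 = change
stay = np.array([[-eps, eps], [eps, -eps]])
change = np.array([[-(eps + lam), eps + lam], [eps + lam, -(eps + lam)]])
Q_actions = np.stack([stay, change])

# Distribution generated by changing with probability 0.3 in state 0 only.
q12, q21 = eps + 0.3 * lam, eps
m = np.array([q21, q12]) / (q12 + q21)
p, residual = mixing_probability(Q_actions, [0, 0], state=0, action=1, m=m)
print(p, residual)
```

A value of p outside [0, 1], or a large residual, indicates that m cannot be generated by randomizing in that state alone.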
6.2 A Simplified Corruption Model
We now consider a simplified version of the corruption model presented in Kolokoltsov and Malafeyev (2017): In their model a player can be in one of the three states honest (), corrupt () and reserved (). Corrupt players receive a higher wage than honest players, who in turn receive a higher wage than reserved players, that is, players who have been convicted of corruption. For simplicity we set , and and exclude the fine for being convicted, which is an additional feature of their model. Players who are not reserved can choose whether to stay corrupt/honest or to switch behaviour; if they switch, they become honest/corrupt at rate . A reserved player recovers at a fixed rate and is then assumed to be honest. Additionally, the model captures “social pressure” in two ways: First, the more players are corrupt, the higher the (inescapable) pressure to also become corrupt. Second, the more players are honest, the higher the rate at which corrupt players are convicted. In the model of Kolokoltsov and Malafeyev (2017) there is also a principal agent that convicts players; for simplicity, we ignore this feature of the model as well.
The formal characterization is given by together with
[TABLE]
and , where all parameters , , and are strictly positive. A visualization of this model is given in Figure 1.
We start by computing the value function for given as the unique solution of
[TABLE]
We directly see that one should choose to change in state if and to stay in state if , and one should choose to change in state if and to stay in state if . A straightforward calculation shows that the optimality sets are given by
[TABLE]
Note, however, that depending on the choice of parameters, the quantity is greater than one, equal to one or less than one. In the first case only is non-empty, in the second case and are non-empty, and in the third case all optimality sets are non-empty.
As a second step, we apply the cut criterion from Theorem 5.1 to the set , which yields the equation for all stationary strategies. Together with the equation , we obtain
[TABLE]
or, equivalently, . Note that these representations directly imply that . Furthermore, the sum of and is always less than one; thus any mean field equilibrium is uniquely characterized by . More precisely, any stationary mean field equilibrium has a distribution of the form
[TABLE]
It now remains to consider the fixed point problems given the possible optimal strategies. We start by noting that stationary points of the dynamics given exist whenever (5) and
[TABLE]
are satisfied, which is the case if or . Thus, whenever these points lie in or , we have a deterministic mean field equilibrium given the strategy .
Similarly, stationary points of the dynamics given have to satisfy (5) and
[TABLE]
which is the case if either or . Thus, whenever these points lie in or , we have a deterministic mean field equilibrium given the strategy .
When searching for stationary points under mixed strategies that might be equilibria, we can restrict attention to points inside the set . In this set, all equilibria have to satisfy , that is,
[TABLE]
It remains to check whether there is a strategy such that the point
[TABLE]
is indeed a fixed point for the individual dynamics equation given strategy , which means that we have to find constants and that satisfy for this point
[TABLE]
whose solvability depends on the parameters.
As in the previous example, we would now need to perform a case analysis to obtain the exact set of mean field equilibria for all possible equilibrium constellations. Additionally, we would need to solve the balance equations for for any that is a candidate for a randomized equilibrium. Both tasks are simple but tedious, and we omit them here.
Appendix A Appendix
Proof of Lemma 5.3.
Since we face a conservative generator, we see by definition that for all and that . Thus, all off-diagonal entries of are non-negative and every row sums to zero. Furthermore, irreducibility rules out a row of zeros, so the diagonal entries are strictly negative.
The matrix again has non-negative off-diagonal entries and strictly negative diagonal entries, and its row sums are less than or equal to zero. Furthermore, the irreducibility of implies that is non-zero for at least one , and thus the row sum for at least one is strictly negative. We now show that there exists a vector such that and for at least one such that (where the inequalities hold componentwise).
For this we first note that for all
[TABLE]
from which one directly sees that is increasing in and decreasing in , since for all .
Define for all the set as the set of all indices where the -th component of is greater than zero. We now inductively define a sequence of vectors such that for some we have . We start with the vector of all ones and construct in such a way that for all and moreover for all and all .
Starting with being the vector of ones directly implies that , as we have seen above that the row sum, which is , is positive for some index .
In the -th step () we check whether . If this is the case, we have shown that a vector with the desired properties exists; otherwise, there is an index . In this case let . Since our CTMC is irreducible, the underlying transition graph is strongly connected, which implies that there is a path from to . Let be the first node on this path that lies in and let be its predecessor. Note that by definition of the transition graph . Now let be as follows for all and such that
[TABLE]
This is possible as and and moreover is continuous and increasing in .
It is obvious that for all . It remains to check whether implies and to show that , as this proves that .
As by construction it is obvious that . For we see again as is decreasing in and that
[TABLE]
Thus if , then .
For we have a strict inequality as since and . Thus , which directly implies , thus .
Thus, we have indeed obtained a vector such that and for at least one such that . We can now conclude that it is a non-singular M-matrix in the sense of Berman and Plemmons (1979, Chapter 6), and furthermore that all eigenvalues of this matrix have positive real part. Therefore, all eigenvalues of have negative real part.
As the matrix is real, its non-real eigenvalues occur in pairs of complex conjugates, and the product of such a pair is positive, since no eigenvalue is zero. Hence the sign of the determinant, which is the product of all eigenvalues, depends only on the real eigenvalues; as these are all negative, only the parity of their number matters. As the parity of the number of real eigenvalues (counted with multiplicity) always equals the parity of , we conclude that the sign of the determinant is as claimed. ∎
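The appendix builds the required positive vector inductively. As an independent numerical sanity check (not the appendix's construction), the Python sketch below takes a random irreducible conservative generator, forms the negated minor A, and verifies several standard equivalent properties of a non-singular M-matrix in the sense of Berman and Plemmons: A has non-positive off-diagonal entries and a non-negative inverse, the vector v = A^(-1) 1 is positive with A v > 0, and all eigenvalues of A have positive real part. The random test generator is a hypothetical input.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 5
Q = rng.uniform(0.1, 2.0, size=(n, n))   # strictly positive rates: irreducible
np.fill_diagonal(Q, 0.0)
np.fill_diagonal(Q, -Q.sum(axis=1))      # conservative generator

A = -Q[:-1, :-1]                         # negated minor (last row and column deleted)
off_diagonal = A[~np.eye(n - 1, dtype=bool)]
A_inv = np.linalg.inv(A)
v = A_inv @ np.ones(n - 1)               # certificate: v > 0 and A v = 1 > 0
eigenvalues = np.linalg.eigvals(A)
print(v, eigenvalues.real.min())
```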
References
- Asmussen (2003) Søren Asmussen. Applied Probability and Queues, volume 51 of Stochastic Modelling and Applied Probability. Springer-Verlag, New York, 2nd edition, 2003. ISBN 0-387-00211-1.
- Basna et al. (2014) Rani Basna, Astrid Hilbert, and Vassili N. Kolokoltsov. An epsilon-Nash equilibrium for non-linear Markov games of mean-field-type on finite spaces. Commun. Stoch. Anal., 8(4):449–468, 2014. doi:10.31390/cosa.8.4.02.
- Benazzoli et al. (2018) Chiara Benazzoli, Luciano Campi, and Luca Di Persio. Mean field games with controlled jump-diffusion dynamics: Existence results and an illiquid interbank market model, 2018. arXiv preprint arXiv:1703.01919.
- Bensoussan et al. (2013) Alain Bensoussan, Jens Frehse, and Phillip Yam. Mean Field Games and Mean Field Type Control Theory. SpringerBriefs in Mathematics. Springer, New York, Heidelberg, Dordrecht, London, 2013. ISBN 978-1-4614-8507-0.
- Berman and Plemmons (1979) Abraham Berman and Robert J. Plemmons. Nonnegative matrices in the mathematical sciences. Computer science and applied mathematics. Academic Press, Inc., New York, 1979. ISBN 0-12-092250-9.
- Besancenot and Dogguy (2015) Damien Besancenot and Habib Dogguy. Paradigm Shift: A Mean Field Game Approach. Bull. Econ. Res., 67(3):289–302, 2015. doi:10.1111/boer.12024.
- Border (1985) Kim C. Border. Fixed point theorems with applications to economics and game theory. Cambridge University Press, Cambridge, 1985. ISBN 0-521-38808-2.
- Caines et al. (2017) Peter E. Caines, Minyi Huang, and Roland P. Malhamé. Mean Field Games. In Tamer Basar and Georges Zaccour, editors, Handbook of Dynamic Game Theory. Springer, Cham, 2017. ISBN 978-3-319-27335-8. doi:10.1007/978-3-319-27335-8_7-1.
