On non-unique solutions in mean field games

Bruce Hajek; Michael Livesay

arXiv:1903.05788·math.OC·March 19, 2019·CDC


TL;DR

This paper investigates the non-uniqueness of solutions in mean field games, focusing on a simple symmetric two-state model, and explores the relationship between finite-player Nash equilibria and mean field game solutions.

Contribution

It characterizes all equilibria in a symmetric two-state mean field game and links finite-player Nash equilibria to mean field game solutions through fluid limits.

Findings

1. All equilibria in the symmetric two-state game are identified.

2. Finite-player Nash equilibria converge to mean field game equilibria as N increases.

3. Stable fixed points of the mean field best response are likely the limits of finite-player equilibria.



Equations

$$P\big(i(t+h)=1-i \mid i(t)=i\big)=(\alpha_{t}+\eta)\,h+o(h)$$

$$\min_{\alpha}\; E\left[\int_{0}^{T} c(i(t),\theta_{t},\alpha_{t})\,dt+\psi(i(T),\theta_{T})\right],$$

$$\begin{array}{c|l}
\mbox{transition} & \mbox{rate}\\ \hline
(i,n)\rightarrow(1-i,n) & \alpha(i,n,t)+\eta\\
(i,n)\rightarrow(i,n+1) & \gamma^{+}(i,n,t)=(N-n)(\beta(1,n+1-i,t)+\eta)\\
(i,n)\rightarrow(i,n-1) & \gamma^{-}(i,n,t)=n(\beta(0,n-i,t)+\eta)
\end{array}$$

$$\begin{array}{rl}
-\dot{u}(i,n,t) = & f(i,n)-\frac{(\alpha^{*}(i,n,t))^{2}}{2}+\eta\,(u(1-i,n,t)-u(i,n,t))\\
 & +\,\gamma^{+}(i,n,t)\,(u(i,n+1,t)-u(i,n,t))\\
 & +\,\gamma^{-}(i,n,t)\,(u(i,n-1,t)-u(i,n,t)),\\
u(i,n,T) = & \psi(i,n)
\end{array}$$

$$\alpha^{*}(i,n,t)=(u(i,n,t)-u(1-i,n,t))_{+}.$$

$$\dot{\theta}_{t}=(1-\theta_{t})\left((u(1,t)-u(0,t))_{+}+\eta\right)-\theta_{t}\left((u(0,t)-u(1,t))_{+}+\eta\right)$$

$$-\dot{u}(i,t)=f(i,\theta_{t},t)-\eta\,(u(i,t)-u(1-i,t))-\frac{\left((u(i,t)-u(1-i,t))_{+}\right)^{2}}{2}$$

$$\theta_{0}=\overline{\theta},\qquad u(i,T)=\psi(i,\theta_{T}).$$

$$-\dot{u}(i,t)=f(i,\theta_{t})-\frac{\left((u(i,t)-u(1-i,t))_{+}\right)^{2}}{2}-\eta\,(u(i,t)-u(1-i,t))$$

$$u(i,T)=\psi(i,\theta_{T})\qquad\mbox{(boundary condition at $T$)}$$

$$\dot{\widetilde{\theta}}_{t}=(1-\widetilde{\theta}_{t})\left((u(1,t)-u(0,t))_{+}+\eta\right)-\widetilde{\theta}_{t}\left((u(0,t)-u(1,t))_{+}+\eta\right)$$

$$\widetilde{\theta}_{0}=\overline{\theta}\qquad\mbox{(boundary condition at $0$)}$$

$$\lim_{N\to\infty}\mathbb{P}\left[\bigg|\frac{n^{N}(t)}{N+1}-\theta_{t}\bigg|<\epsilon\mbox{ for }0\leq t\leq T\right]=1.$$

$$f(i,\theta)=|1-\theta-i|=\left\{\begin{array}{cl}1-\theta & i=0\\ \theta & i=1\end{array}\right.$$

$$\begin{array}{l}~~\dot{x}=y-x|y|-2\eta x\\ -\dot{y}=x-\frac{1}{2}y|y|-2\eta y\end{array}$$

$$H(x,y)=\frac{x^{2}-4\eta xy+y^{2}-xy|y|}{2}.$$

$$\dot{\phi}=\frac{\dot{y}x-y\dot{x}}{x^{2}+y^{2}}=-1+\frac{\frac{3}{2}xy|y|+4\eta xy}{x^{2}+y^{2}}$$

$$\begin{array}{l}~~\dot{x}=y-2\eta x\\ -\dot{y}=x-2\eta y\end{array}$$

$$\overline{P}=\binom{\overline{x}}{\overline{y}}\triangleq\binom{1-\eta^{2}-\eta\sqrt{2+\eta^{2}}}{\sqrt{2+\eta^{2}}-3\eta}.$$

$$y_{s}=(L_{1}x)_{s}=\int_{s}^{T}e^{-2\eta(u-s)}x_{u}\,du$$

$$x_{t}=(L_{2}y)_{t}=\int_{0}^{t}e^{-2\eta(t-s)}y_{s}\,ds$$

$$x_{t}=\int_{0}^{T}K(t,u)\,x_{u}\,du$$

$$\lim_{N\to\infty}\widetilde{\mathbb{P}}\left[\bigg|\frac{n^{N}(t)}{N+1}-\theta_{t}\bigg|<\epsilon\mbox{ for }0\leq t\leq T\right]=1.$$

$$P(A)\leq c\left(\int_{\Omega}1_{A}^{q}\,dP\right)^{1/q}=c\,P(A)^{1/q}$$

$$\frac{dP}{dP}=\exp\left(\int_{0}^{T}\ln\left(\frac{\beta^{*}+\eta}{\alpha^{*}+\eta}\right)dY_{t}-\int_{0}^{T}(\beta^{*}-\alpha^{*})\,dt\right)$$

$$\left(\frac{dP}{dP}\right)^{p}=\frac{dP}{dP}\,e^{\int_{0}^{T}(\beta^{*}+\eta)^{p}-(\alpha^{*}+\eta)^{p}-p(\beta^{*}-\alpha^{*})\,dt}\leq\frac{dP}{dP}\,\Gamma_{2}$$


Full text

On non-unique solutions in mean field games

Bruce Hajek and Michael Livesay

Department of Electrical and Computer Engineering and Coordinated Science Laboratory

University of Illinois, Urbana, IL 61801, USA

Email: {b-hajek, mlivesa2}@illinois.edu

Abstract

The theory of mean field games is a tool to understand noncooperative dynamic stochastic games with a large number of players. Much of the theory has evolved under conditions ensuring uniqueness of the mean field game Nash equilibrium. However, in some situations, typically involving symmetry breaking, non-uniqueness of solutions is an essential feature. To investigate the nature of non-unique solutions, this paper focuses on the technically simple setting where players have one of two states, with continuous time dynamics, and the game is symmetric in the players, and players are restricted to using Markov strategies. All the mean field game Nash equilibria are identified for a symmetric follow the crowd game. Such equilibria correspond to symmetric $\epsilon$ -Nash Markov equilibria for $N$ players with $\epsilon$ converging to zero as $N$ goes to infinity.

In contrast to the mean field game, there is a unique Nash equilibrium for finite $N.$ It is shown that fluid limits arising from the Nash equilibria for finite $N$ as $N$ goes to infinity are mean field game Nash equilibria, and evidence is given supporting the conjecture that such limits, among all mean field game Nash equilibria, are the ones that are stable fixed points of the mean field best response mapping.

I Introduction and related work

The theory of mean field games was initiated independently by Huang, Caines, and Malhamé [4] and Lasry and Lions [5]. The setting of Huang et al. is linear quadratic Gaussian (LQG) control and the setting of Lasry and Lions is continuous state Markov diffusion processes. The work of Gomes, Mohr, and Souza [3] translates much of the theory of [5] into the context of continuous time finite state Markov processes. The LQG and finite state settings are technically simpler than the setting of continuous state Markov processes. All three of these works impose assumptions implying uniqueness of solutions to the mean field game equations.

The notion of Markov perfect equilibrium was introduced in [6]. It is basically a Nash equilibrium in a controlled Markovian dynamics framework, such that each player can use a strategy that selects control actions based on the current states of all players. In particular, the constraint on strategies for Markov perfect equilibria rules out trigger strategies such that some player can be punished for past actions. Given a game and $\epsilon>0$ , a strategy profile is defined to be an $\epsilon$ -equilibrium (or $\epsilon$ Nash equilibrium) if it is not possible for any player to gain more than $\epsilon$ in expected payoff by unilaterally deviating from his/her strategy.

The paper [4] establishes $\epsilon$ -Nash equilibrium properties for strategy profiles consisting of the decentralized individual control laws that result as responses to the collective mass trajectory. Condition H1 of [4] is key to guaranteeing uniqueness of solutions to the mean field equations. In particular, with the other parameters fixed, the value of $r$ in the control cost term, $ru^{2}$ , should not be too small. In essence, condition H1 restricts the level of coupling among the players. The mean field game (MFG) equations are expressed in [4] as a fixed point of an operator $\cal T$ . Proposition 4.5 of [4] states that the fixed point of $\cal T$ is globally attracting under condition H1. Section VI of [4] illustrates a cost gap between individual and global based controls. This is an example of the fact that the social welfare at a Nash equilibrium need not equal the maximum social welfare achievable if the players were to cooperate.

The paper [3] studies the continuous-time, finite state version of mean field game theory. Assumption 3, p. 110, gives a monotonicity condition that ensures uniqueness of solutions to the mean field game equations. Proposition 4 of [3], on the existence of a mean field game Nash equilibrium, is proved by applying Brouwer’s fixed point theorem to the map $\theta\mapsto\xi(\theta),$ which is analogous to the map $\cal T$ of [4]. The domain of $\xi$ is the set ${\cal F}$ of uniformly Lipschitz continuous functions on the interval $[0,T].$

In contrast, multiple solutions of the mean field equations naturally arise in [10], where synchronization of coupled oscillators requires solutions that depart from the incoherence solution. The setup is similar to the discrete-state setting we consider in that it is in continuous time, the players are coupled through their running costs, and players can take actions depending on their own states and on the states of the other players. But the setup in [10] is different in that the state space is continuous (specifically, it is the unit circle), and the focus is on infinite horizon average cost. The running cost for player $i$ , $c(\theta_{i},\theta_{-i})=\frac{1}{n}\sum_{j}(1/2)\sin^{2}((\theta_{i}-\theta_{j})/2),$ is of join the crowd type; it is smaller if the states are closer together. It is similar to flocking of birds or synchronization of fireflies. The separate Brownian motions of different players tend to make them drift apart, and it costs the players effort to stick together. If the coefficient $R$ for the cost is large enough, it is not worth the players trying to stick close together, and in the MFG limit they stay uniformly distributed over the circle (i.e. the incoherence solution). As $R$ crosses below some critical value $R_{c},$ the incoherence solution still exists but it becomes unstable and additional solutions appear. We find an equivalent phenomenon for the simpler discrete state model in this paper. In addition, our setting is considerably simpler than that of [10], allowing us to examine the stability of the mean field map $\cal T$ for a finite time horizon.

Some related papers with discrete state models. The paper [9] introduces the notion of oblivious equilibrium and compares it to the stronger equilibrium notion of Markov perfect equilibrium. In a Markov perfect equilibrium, the actions of any player can depend on the current states of all players. In contrast, for an oblivious equilibrium, the actions of any player can depend only on the state of the player itself. This limits the abilities of players to react to fluctuations in population dynamics for a finite number of players. However, in the mean field limit, the population dynamics becomes deterministic, so the difference between the two equilibrium concepts diminishes as the number of players grows. That is the notion explored in [9]. An approximation theory of [9] shows that an oblivious equilibrium under certain technical conditions can be approximated by a Markov perfect equilibrium, while the converse direction is not necessarily true. The setting of [9] is discrete time throughout.

Papers [1] and [8] discuss MFG for discrete state Markov processes. Paper [8] considers a so-called Markov decision evolutionary game. It is similar to the classical evolutionary dynamics setting, but in contrast to the classical setting, players have both a type (that doesn’t change) and an internal state (that evolves in a Markov fashion). The number of players involved in an event at a discrete time point is stochastically bounded, so as the number of players converges to infinity, time is sped up and a continuous time limit results. A mean field limit for fixed Markov policies exists by a Kurtz type theorem. The setting of [1] is also a discrete state Markov process for each player. The models of both [1] and [8] assume the players use so-called stationary policies, such that the action of a player depends on the type of the player and internal state of the player, but not on the states of other players. Thus, the equilibrium concept is oblivious equilibrium.

II Problem formulation

The model we adopt is almost a special case of the model of [4]. We consider $N+1$ players with each having state space $\{0,1\}.$ The state $(i(t):0\leq t\leq T)$ of a given player evolves as a controlled Markov process with predictable control $\alpha_{t}$ , such that the jump probabilities of the state process are given by

$$P\big(i(t+h)=1-i \mid i(t)=i\big)=(\alpha_{t}+\eta)\,h+o(h)$$

for $h>0.$ The parameter $\eta\geq 0$ represents a background jump rate, so if $\eta>0$ then the process has minimum jump rate $\eta.$ The background jumping is similar in spirit to the Brownian motions that work against coherence of the coupled oscillators in [10]. The objective function of the reference player is to select $(\alpha_{t})$ to solve

$$\min_{\alpha}\; E\left[\int_{0}^{T} c(i(t),\theta_{t},\alpha_{t})\,dt+\psi(i(T),\theta_{T})\right],$$

where $\theta_{t}$ is the fraction of other players in state 0 at time $t.$ The running costs are assumed to have the form $c(i,\theta,\alpha)=f(i,\theta)+\frac{\alpha^{2}}{2},$ such that the residence costs per unit time, $f(0,\theta)$ and $f(1,\theta)$ , and terminal costs, $\psi(0,\theta),\psi(1,\theta),$ are all bounded, and uniformly Lipschitz continuous in $\theta.$

Hamilton Jacobi Bellman (HJB) equation for $N+1$ player system

A state feedback control for a given player is a nonnegative function $(\alpha(i,n,t))$ such that $i\in\{0,1\}$ represents the current state of the player, $n\in\{0,\ldots,N\}$ represents the number of other players in state 0, and $t\in[0,T].$ Suppose the reference player uses a state feedback control $(\alpha(i,n,t))$ , and the other $N$ players use state feedback control $(\beta(i,n,t)).$ Then $(i(t),n(t))_{0\leq t\leq T}$ forms a controlled Markov process on $\{0,1\}\times\{0,1,\ldots,N\},$ where $i(t)$ represents the state of the reference player and $n(t)$ represents the number of other players in state 0. The transition rates are as follows. (Footnote 1: If $j\neq i$ then $i$ itself is one of the “other players” for player $j.$ )

$$\begin{array}{c|l}
\mbox{transition} & \mbox{rate}\\ \hline
(i,n)\rightarrow(1-i,n) & \alpha(i,n,t)+\eta\\
(i,n)\rightarrow(i,n+1) & \gamma^{+}(i,n,t)=(N-n)(\beta(1,n+1-i,t)+\eta)\\
(i,n)\rightarrow(i,n-1) & \gamma^{-}(i,n,t)=n(\beta(0,n-i,t)+\eta)
\end{array}$$

Denote the cost-to-go function for the reference player by $u(i,n,t).$ The HJB equations for it are:

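$$\begin{array}{rl}
-\dot{u}(i,n,t) = & f(i,n)-\frac{(\alpha^{*}(i,n,t))^{2}}{2}+\eta\,(u(1-i,n,t)-u(i,n,t))\\
 & +\,\gamma^{+}(i,n,t)\,(u(i,n+1,t)-u(i,n,t))\\
 & +\,\gamma^{-}(i,n,t)\,(u(i,n-1,t)-u(i,n,t)), \qquad (1)\\
u(i,n,T) = & \psi(i,n) \qquad (2)
\end{array}$$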

where the corresponding control policy is

$$\alpha^{*}(i,n,t)=(u(i,n,t)-u(1-i,n,t))_{+}. \qquad (3)$$

The HJB equations (1)-(3) can be viewed in two different ways.

• For a fixed policy $\beta$ of the other $N$ players, (1)-(3) determine the best response policy for the reference player, i.e. $\alpha^{*}=BR(\beta).$

• To find a symmetric Nash equilibrium, replace $\alpha(\cdot,\cdot,t)$ and $\beta(\cdot,\cdot,t)$ by $\alpha^{*}(\cdot,\cdot,t)$ in the definition of $\gamma^{\pm}$ and in (1)-(3). This yields a $2(N+1)$ dimensional ODE with terminal boundary condition and Lipschitz continuous right hand side that uniquely determines the functions $(u(i,n,t))$ and, hence, also the feedback control law $\alpha^{*}.$ The strategy profile such that all $N+1$ players use $\alpha^{*}$ is a Markov perfect Nash equilibrium, because $\alpha^{*}$ is determined backwards from the terminal condition, yielding a best response for any interval of the form $[t,T].$ Moreover, the Markov perfect equilibrium is the unique Nash equilibrium among all Markov type (i.e. state feedback) strategy profiles, because the similar HJB equations for a more detailed model description with state space $\{0,1\}^{N+1}$ still have a unique solution and it is necessarily invariant under permutation of the players.
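The second viewpoint above can be tried numerically. The following is a minimal sketch, not the authors' code: it integrates the symmetric version of (1)-(3) backward in time with explicit Euler steps. The function names, the step count, and the finite-$N$ cost $f(i,n)=|1-n/N-i|$ are assumptions made for illustration only.

```python
def crowd_cost(N):
    """Hypothetical finite-N residence cost f(i, n) = |1 - n/N - i| (an assumption)."""
    return lambda i, n: abs(1.0 - n / N - i)


def control(u, N):
    """Feedback law (3): alpha*(i, n) = (u(i, n) - u(1 - i, n))_+ ."""
    return [[max(u[i][n] - u[1 - i][n], 0.0) for n in range(N + 1)] for i in (0, 1)]


def solve_symmetric_hjb(N, T, eta, f, psi, steps=2000):
    """Integrate the symmetric-equilibrium HJB system from t = T back to t = 0.

    Returns u[i][n] and alpha*[i][n] at time 0. Explicit Euler is used, so the
    step T / steps must be small compared with 1 / (N * (largest control + eta)).
    """
    dt = T / steps
    u = [[psi(i, n) for n in range(N + 1)] for i in (0, 1)]  # terminal condition (2)
    for _ in range(steps):
        a = control(u, N)  # all players use the same law, so beta = alpha*
        new = [[0.0] * (N + 1) for _ in (0, 1)]
        for i in (0, 1):
            for n in range(N + 1):
                rhs = f(i, n) - a[i][n] ** 2 / 2 + eta * (u[1 - i][n] - u[i][n])
                if n < N:  # one of the other players moves from state 1 to state 0
                    gamma_plus = (N - n) * (a[1][n + 1 - i] + eta)
                    rhs += gamma_plus * (u[i][n + 1] - u[i][n])
                if n > 0:  # one of the other players moves from state 0 to state 1
                    gamma_minus = n * (a[0][n - i] + eta)
                    rhs += gamma_minus * (u[i][n - 1] - u[i][n])
                new[i][n] = u[i][n] + dt * rhs  # -du/dt = rhs, stepping toward t = 0
        u = new
    return u, control(u, N)


N, T, eta = 10, 2.0, 0.3
u, alpha = solve_symmetric_hjb(N, T, eta, crowd_cost(N), lambda i, n: 0.0)
```

With this assumed cost and zero terminal cost the problem is symmetric under relabeling of the two states, so the computed values should satisfy $u(0,n,0)\approx u(1,N-n,0)$, which gives a quick sanity check on the indexing of $\gamma^{\pm}$.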

Mean field game equilibria and map

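A mean field game Nash equilibrium for the finite horizon problem with initial value $\overline{\theta}$ is any solution $(\theta_{t},u(i,t))$ to the following equations. (Footnote 2: Note the double use of the notation “ $u.$ ” We write $u(i,t)$ for $u$ associated with mean field game solutions and $u(i,n,t)$ for $u$ associated with the $N+1$ player Markov perfect equilibrium.)

$$\dot{\theta}_{t}=(1-\theta_{t})\left((u(1,t)-u(0,t))_{+}+\eta\right)-\theta_{t}\left((u(0,t)-u(1,t))_{+}+\eta\right) \qquad (4)$$

$$-\dot{u}(i,t)=f(i,\theta_{t},t)-\eta\,(u(i,t)-u(1-i,t))-\frac{\left((u(i,t)-u(1-i,t))_{+}\right)^{2}}{2} \qquad (5)$$

$$\theta_{0}=\overline{\theta},\qquad u(i,T)=\psi(i,\theta_{T}). \qquad (6)$$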

Note that the boundary conditions (6) include both initial and terminal values. The mean field equations (4)-(6) can be written as a fixed point equation, $\theta={\cal T}(\theta)$ , where ${\cal T}$ maps a collective mass trajectory $(\theta_{t}:0\leq t\leq T)$ to another trajectory. It is determined by first computing the decentralized individual control laws for the players. Then by the uniform law of large numbers (see Appendix B), if each of the players follows the same decentralized individual control law, their state processes will be independent and the empirical average of such processes will converge to an expected $\widetilde{\theta}$ that is the output collective mass trajectory. More concretely, ${\cal T}(\theta)$ is defined as follows. First, cost-to-go functions $(u(i,t))$ are determined by the HJB terminal value problem for a single player, in response to the collective mass trajectory $\theta.$

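$$-\dot{u}(i,t)=f(i,\theta_{t})-\frac{\left((u(i,t)-u(1-i,t))_{+}\right)^{2}}{2}-\eta\,(u(i,t)-u(1-i,t))$$

$$u(i,T)=\psi(i,\theta_{T})\qquad\mbox{(boundary condition at $T$)}$$

Then $\widetilde{\theta}_{t},$ the probability that a single player using the decentralized state-feedback control $\alpha_{t}(i,t)=(u(i,t)-u(1-i,t))_{+}$ is in state 0 at time $t$ , is determined by the initial value problem (Kolmogorov forward equation):

$$\dot{\widetilde{\theta}}_{t}=(1-\widetilde{\theta}_{t})\left((u(1,t)-u(0,t))_{+}+\eta\right)-\widetilde{\theta}_{t}\left((u(0,t)-u(1,t))_{+}+\eta\right)$$

$$\widetilde{\theta}_{0}=\overline{\theta}\qquad\mbox{(boundary condition at $0$)}$$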

Motivated by the law of large numbers, $\widetilde{\theta}$ is defined to be the new collective mass trajectory, i.e. $\widetilde{\theta}={\cal T}(\theta).$
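The two-pass structure of ${\cal T}$ (a backward HJB pass followed by a forward Kolmogorov pass) can be sketched in a few lines. This is an illustrative discretization under stated assumptions, not the authors' implementation: the uniform time grid, the explicit Euler steps, and the names `mean_field_map`, `follow_the_crowd`, and `zero_terminal` are all introduced here, and the cost $f(i,\theta)=|1-\theta-i|$ is the follow the crowd cost used as a test case.

```python
def follow_the_crowd(i, theta):
    """Residence cost f(i, theta) = |1 - theta - i| (used here as a test case)."""
    return abs(1.0 - theta - i)


def zero_terminal(i, theta):
    """Zero terminal cost psi (an assumption for the example)."""
    return 0.0


def mean_field_map(theta, T, eta, f, psi, theta0):
    """Apply the map T once to a trajectory sampled at K + 1 uniform grid points."""
    K = len(theta) - 1
    dt = T / K
    # Backward pass: single-player HJB terminal value problem in response to theta.
    u0 = [0.0] * (K + 1)
    u1 = [0.0] * (K + 1)
    u0[K], u1[K] = psi(0, theta[K]), psi(1, theta[K])
    for k in range(K, 0, -1):
        d = u0[k] - u1[k]  # u(0,t) - u(1,t)
        u0[k - 1] = u0[k] + dt * (f(0, theta[k]) - eta * d - max(d, 0.0) ** 2 / 2)
        u1[k - 1] = u1[k] + dt * (f(1, theta[k]) + eta * d - max(-d, 0.0) ** 2 / 2)
    # Forward pass: Kolmogorov forward equation for the probability of state 0.
    new = [theta0]
    for k in range(K):
        y = u1[k] - u0[k]
        rate_to_0 = max(y, 0.0) + eta   # jump rate from state 1 to state 0
        rate_to_1 = max(-y, 0.0) + eta  # jump rate from state 0 to state 1
        new.append(new[k] + dt * ((1.0 - new[k]) * rate_to_0 - new[k] * rate_to_1))
    return new


K = 200
flat = mean_field_map([0.5] * (K + 1), 3.0, 0.1, follow_the_crowd, zero_terminal, 0.5)
skew = mean_field_map([0.9] * (K + 1), 3.0, 0.1, follow_the_crowd, zero_terminal, 0.9)
```

In this discretization the constant trajectory $\theta\equiv 1/2$ is returned unchanged, so it is a fixed point of the discretized map. Applying `mean_field_map` repeatedly to a perturbed trajectory is one way to probe whether a fixed point is attracting.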

The mean field game equations (4) and (5), with the addition of an average cost per unit time term $\kappa$ on the right-hand side of (5), correspond to an infinite horizon game for average cost per unit time. (See [3], Section 2.12, p. 117.) In that case the value functions $u(i,t)$ represent relative cost to go. The boundary conditions (6) are replaced by the condition that $\theta$ be constant in time or be periodic.

Fluid limits of Markov perfect equilibrium

As noted in the introduction, there can be multiple mean field game Nash equilibria, even for a finite horizon problem with given boundary conditions. A mean field game Nash equilibrium $(\theta_{t},u(i,t))$ yields a decentralized player strategy $\alpha_{t}(i,t)=(u(i,t)-u(1-i,t))_{+}.$ For finite $N$ , the strategy profile such that every player uses $(\alpha_{t}(i,t))$ is easily seen to be an $\epsilon$ -Nash equilibrium such that $\epsilon\to 0$ as $N\to\infty.$ (See Appendix B.)

However, for finite $N$ there is a unique Markov perfect Nash equilibrium strategy profile, so for a given initial condition, the distribution of the finite $N$ system is uniquely determined. It is natural, therefore, to single out collective mass trajectories that arise as limits of the mass trajectories for Markov perfect equilibria.

Definition II.1.

Let $n^{N}(t)$ denote the number of players in state 0 at time $t$ under the unique symmetric Markov perfect equilibrium for the $N+1$ player game, and for some initial condition depending on $N.$ Then $\theta=(\theta_{t}:0\leq t\leq T)$ is a fluid limit Markov perfect trajectory (FLMP trajectory) if for some sequence of initial states with $\lim_{N\to\infty}\frac{n^{N}(0)}{N}=\theta_{0},$ the following holds for any $\epsilon>0$ :

$$\lim_{N\to\infty}\mathbb{P}\left[\bigg|\frac{n^{N}(t)}{N+1}-\theta_{t}\bigg|<\epsilon\mbox{ for }0\leq t\leq T\right]=1.$$

Proposition 1.

Suppose $\eta>0.$ An FLMP trajectory is a mean field game Nash equilibrium.

See Appendix A for a proof. We conjecture the proposition is also true for $\eta=0,$ but a change of probability measure argument in the proof breaks down if $\eta=0.$ Proposition 1 raises the question of how to identify which mean field Nash equilibria are FLMP trajectories.

Contributions of the paper

Proposition 1 is new and its proof extends to the general setting of [3]. It shows that the search for FLMP trajectories can be limited to the mean field game Nash equilibria. The next contribution of this paper is to identify all of the MFG equilibria for a natural special case of the two state model called follow the crowd. This model is analogous to the model of synchronization of oscillators game [10], but considerably simpler, so we can identify the finite horizon solutions as well as the infinite horizon ones. The third contribution is to offer the following conjecture, and give evidence for it:

Conjecture 1.

The FLMP trajectories are the stable fixed points of the MFG mapping ${\cal T}.$

A similar type of conjecture is implicit in [10] based on a notion of stability for constant, long-term average cost infinite horizon solutions, called linear asymptotic stability. The paper [10] identifies the critical cost threshold at which the incoherence solution becomes unstable. In addition to giving evidence for Conjecture 1 in the setting of finite horizon games, we also show that the results of [10] for constant, long-term average cost infinite horizon solutions carry over to the setting of two state Markov processes. For the infinite horizon framework, we show asymptotic stability of certain fixed points for the nonlinear dynamics in Section III-C, and Appendix D gives an analysis based on the notion of linear asymptotic stability introduced in [10]. Additional results are given in the appendix of this paper, including, for contrast, a similar analysis for an avoid the crowd model with unique mean field game solutions, and a description of a partial differential equation (PDE) (given for a more general model in [3]) that can be considered to be an extension of the notion of mean field game.

III MFG equilibria for follow the crowd

The follow the crowd model corresponds to the following cost per time spent in state $i$ :

$$f(i,\theta)=|1-\theta-i|=\left\{\begin{array}{cl}1-\theta & i=0\\ \theta & i=1\end{array}\right.$$

In particular, if $\theta>1/2$ (more than half of the other players in state 0), then state 0 has smaller cost per unit time than state 1.

Letting $y=u_{1}-u_{0},$ $x=2\theta-1,$ the mean field equations (4)- (6) can be written as:

$$\begin{array}{l}~~\dot{x}=y-x|y|-2\eta x\\ -\dot{y}=x-\frac{1}{2}y|y|-2\eta y\end{array} \qquad (12)$$

with the boundary conditions $x_{0}=2\overline{\theta}-1$ and $y_{T}=\psi\left(1,\frac{1+x_{T}}{2}\right)-\psi\left(0,\frac{1+x_{T}}{2}\right).$ Once a solution $(x,y)$ to (12) is found for the finite horizon problem over $[0,T]$ , a corresponding solution $(u_{0},u_{1},\theta)$ to the mean field game equations can be found by simply integrating (4)- (5) because the righthand sides of (4)- (5) are determined by $(x_{t},y_{t}).$
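The change of variables can be checked term by term. The following worked derivation is a sketch that assumes the follow the crowd cost $f(0,\theta)=1-\theta,$ $f(1,\theta)=\theta,$ and the mean field equations in the form $\dot{\theta}_{t}=(1-\theta_{t})((u(1,t)-u(0,t))_{+}+\eta)-\theta_{t}((u(0,t)-u(1,t))_{+}+\eta)$ and $-\dot{u}(i,t)=f(i,\theta_{t})-\eta(u(i,t)-u(1-i,t))-\frac{1}{2}((u(i,t)-u(1-i,t))_{+})^{2}.$ The shorthand $y_{+}=\max(y,0)$ and $y_{-}=\max(-y,0)$ is introduced here for convenience and is not notation from the paper.

```latex
% Uses theta = (1+x)/2, 1-theta = (1-x)/2, y = y_+ - y_-, |y| = y_+ + y_-.
\begin{align*}
\dot{x} &= 2\dot{\theta}
         = 2(1-\theta)(y_{+}+\eta) - 2\theta(y_{-}+\eta) \\
        &= (1-x)(y_{+}+\eta) - (1+x)(y_{-}+\eta) \\
        &= (y_{+}-y_{-}) - x(y_{+}+y_{-}) - 2\eta x
         = y - x|y| - 2\eta x, \\[4pt]
-\dot{y} &= -\dot{u}_{1} + \dot{u}_{0} \\
        &= \Big[\theta - \eta y - \tfrac{1}{2}y_{+}^{2}\Big]
         - \Big[(1-\theta) + \eta y - \tfrac{1}{2}y_{-}^{2}\Big] \\
        &= (2\theta-1) - 2\eta y - \tfrac{1}{2}\big(y_{+}^{2}-y_{-}^{2}\big)
         = x - \tfrac{1}{2}y|y| - 2\eta y .
\end{align*}
```

The last step in each chain uses $y_{+}y_{-}=0,$ so that $y_{+}^{2}-y_{-}^{2}=y|y|.$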

A useful fact is that the equations (12) form a Hamiltonian system, for the Hamiltonian function $H$ :

$$H(x,y)=\frac{x^{2}-4\eta xy+y^{2}-xy|y|}{2}. \qquad (13)$$

In other words, (12) has the form $\dot{x}=H_{y}$ and $\dot{y}=-H_{x},$ where $H_{x}$ and $H_{y}$ represent partial derivatives of $H.$ Consequently, the value of $H$ is constant along the solutions of (12), because $\frac{dH(x_{t},y_{t})}{dt}=\langle\nabla H,\binom{H_{y}}{-H_{x}}\rangle\equiv 0,$ so the trajectories trace out level contours of $H.$ This model is a special case of potential mean field games defined in [3], Section 5, for which Hamiltonians exist.
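The conservation of $H$ is easy to check numerically. The sketch below assumes the dynamics $\dot{x}=y-x|y|-2\eta x,$ $-\dot{y}=x-\frac{1}{2}y|y|-2\eta y$ and the Hamiltonian $H(x,y)=(x^{2}-4\eta xy+y^{2}-xy|y|)/2,$ and integrates with a classical fourth-order Runge-Kutta scheme. The integrator, step size, starting point, and function names are choices made here for illustration.

```python
def field(x, y, eta):
    """Right-hand side of the (x, y) dynamics, returned as (dx/dt, dy/dt)."""
    return y - x * abs(y) - 2 * eta * x, -(x - 0.5 * y * abs(y) - 2 * eta * y)


def hamiltonian(x, y, eta):
    """H(x, y) = (x^2 - 4 eta x y + y^2 - x y |y|) / 2."""
    return (x * x - 4 * eta * x * y + y * y - x * y * abs(y)) / 2


def rk4_path(x, y, eta, T, steps):
    """Classical RK4 integration over [0, T]; returns the list of visited points."""
    dt = T / steps
    path = [(x, y)]
    for _ in range(steps):
        k1 = field(x, y, eta)
        k2 = field(x + dt / 2 * k1[0], y + dt / 2 * k1[1], eta)
        k3 = field(x + dt / 2 * k2[0], y + dt / 2 * k2[1], eta)
        k4 = field(x + dt * k3[0], y + dt * k3[1], eta)
        x += dt * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6
        y += dt * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6
        path.append((x, y))
    return path


eta = 0.2
path = rk4_path(0.0, 0.3, eta, 5.0, 5000)      # start on the positive y axis
levels = [hamiltonian(x, y, eta) for x, y in path]
drift = max(levels) - min(levels)              # should be close to zero
```

The drift is not exactly zero because of discretization error, and the kink of $|y|$ at $y=0$ lowers the local accuracy of the scheme slightly when the path crosses the $x$ axis.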

Contour maps of $H$ are shown in Fig. 1 for various values of $\eta.$

For small values of $x,y$ the quadratic terms in $H$ dominate the cubic term, and for $\eta<1/2$ , constant $x^{2}-4\eta xy+y^{2}$ gives elliptical orbits of $x,y,$ in the clockwise direction.

III-A Finite time horizon MFG solutions

For the finite horizon mean field game with zero terminal cost (i.e. terminal boundary condition $y_{T}=0$ ) and initial state $x_{0}=0$ , solutions correspond to paths that begin on the $y$ axis (so the initial condition $x_{0}=0$ is satisfied) and end on the $x$ axis (so the terminal condition $y_{T}=0$ is satisfied). One solution is $(x_{t},y_{t})\equiv(0,0)$ for $0\leq t\leq T.$ Let $\phi=\arctan\left(\frac{y}{x}\right)$ denote the angle of $(x,y)$ from the positive $x$ axis. The angular velocity of $(x,y)$ is given by

$$\dot{\phi}=\frac{\dot{y}x-y\dot{x}}{x^{2}+y^{2}}=-1+\frac{\frac{3}{2}xy|y|+4\eta xy}{x^{2}+y^{2}} \qquad (14)$$

It is negative along the $y$ axis, indicating clockwise motion. If $\eta\geq 1/2$ then $\dot{\phi}>0$ along the line $x=y$ , indicating that $y=0$ is never reached. Thus, if $\eta\geq 1/2$ , the trajectory $(0,0)$ is the only MFG equilibrium.

If $\eta<1/2$ then $\dot{\phi}<0$ for $(x,y)$ in a neighborhood of the origin, indicating clockwise movement. Moreover, for $\phi$ fixed, $\dot{\phi}$ is an increasing function of the distance of $(x,y)$ to the origin (decreasing angular speed because angular velocity is negative). Thus, the time for $(x,y)$ to traverse a contour across the first quadrant is increasing in $y_{0}$ for $y_{0}>0.$ As $y_{0}\to 0$ the dynamics is given, to first order, by the MFG linearized about $(0,0)$ , given by

$$\begin{array}{l}~~\dot{x}=y-2\eta x\\ -\dot{y}=x-2\eta y\end{array}$$

with solution of the form (setting $x_{0}=0$ and $y_{0}>0$ ):

[TABLE]

The time it takes the linear system to traverse the first quadrant is $T_{c}(\eta)\triangleq\frac{\pi-\arccos(2\eta)}{\sqrt{1-4\eta^{2}}}.$ Hence, as $y_{0}\to 0,$ the traversal time for the quadrant converges to $T_{c}(\eta).$ Thus, for $\eta<1/2$ and $T\leq T_{c}(\eta),$ $(0,0)$ is the unique solution to the MFG. For $T>T_{c}(\eta)$ there is one more solution that remains in, and traverses, the first quadrant, and the negative of that solution remains in, and traverses, the third quadrant. For $T$ large enough there are solutions that traverse contours of $H$ through three quadrants, five quadrants, and so on. A similar angular velocity analysis for the pair $(y,\dot{y})$ (see Appendix C) establishes that the entire periods of the dynamical system are increasing with amplitude, as illustrated in Fig. 2. Since the dynamics is symmetric under rotation by $\pi,$ we conclude that for any odd number $k$ , starting on the positive $y$ axis, the time required to rotate through $k$ quadrants is increasing in the initial condition $y_{0}.$ Therefore, as $T$ increases from 0, the number of solutions starts at one and jumps up by two when $T$ crosses times of the form $T_{c}+k\pi/(\sqrt{1-4\eta^{2}})$ for $k\geq 0.$ Equivalently, the number of solutions is $1+2\left\lceil\frac{(T-T_{c})\sqrt{1-4\eta^{2}}}{\pi}\right\rceil.$
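The critical horizon and the solution count can be evaluated directly. The sketch below is illustrative only. It assumes the linearized system $\dot{x}=y-2\eta x,$ $-\dot{y}=x-2\eta y$ with $x_{0}=0,$ for which one way to write the solution is $x_{t}=y_{0}\sin(\omega t)/\omega,$ $y_{t}=y_{0}(\cos(\omega t)+2\eta\sin(\omega t)/\omega)$ with $\omega=\sqrt{1-4\eta^{2}}.$ This closed form is derived here rather than quoted from the paper, and all function names are hypothetical.

```python
import math


def omega(eta):
    """Angular frequency of the linearized dynamics, defined for 0 <= eta < 1/2."""
    return math.sqrt(1.0 - 4.0 * eta * eta)


def linear_solution(t, eta, y0=1.0):
    """Solution of dx/dt = y - 2 eta x, -dy/dt = x - 2 eta y with x(0) = 0, y(0) = y0."""
    w = omega(eta)
    x = y0 * math.sin(w * t) / w
    y = y0 * (math.cos(w * t) + 2.0 * eta * math.sin(w * t) / w)
    return x, y


def critical_horizon(eta):
    """T_c(eta) = (pi - arccos(2 eta)) / sqrt(1 - 4 eta^2)."""
    return (math.pi - math.acos(2.0 * eta)) / omega(eta)


def count_solutions(T, eta):
    """Number of MFG solutions for x_0 = 0 and zero terminal cost, per the formula above."""
    if eta >= 0.5:
        return 1  # the (0, 0) trajectory is the only equilibrium
    k = math.ceil((T - critical_horizon(eta)) * omega(eta) / math.pi)
    return 1 + 2 * max(k, 0)


tc = critical_horizon(0.2)
x_end, y_end = linear_solution(tc, 0.2)  # the path reaches the x axis at time T_c
```

At $t=T_{c}(\eta)$ the linear solution has $y_{t}=0$ and $x_{t}>0,$ which is the first arrival at the $x$ axis after leaving the positive $y$ axis.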

III-B Infinite horizon constant or periodic MFG solutions

The equilibrium points of the dynamics (12) are the critical points of the Hamiltonian function (i.e. $\nabla H=0$ ), and are given as follows. If $0\leq\eta<0.5,$ $(0,0)$ is an equilibrium point and there are also exactly two nonzero equilibrium points, given by $\pm\overline{P}$ , where

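$$\overline{P}=\binom{\overline{x}}{\overline{y}}\triangleq\binom{1-\eta^{2}-\eta\sqrt{2+\eta^{2}}}{\sqrt{2+\eta^{2}}-3\eta}.$$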

If $\eta\geq 0.5,$ $(0,0)$ is the unique equilibrium point.
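The nonzero equilibrium points can be checked against the condition $\nabla H=0.$ The sketch below assumes $H(x,y)=(x^{2}-4\eta xy+y^{2}-xy|y|)/2$ and uses the closed form $\overline{x}=1-\eta^{2}-\eta\sqrt{2+\eta^{2}},$ $\overline{y}=\sqrt{2+\eta^{2}}-3\eta,$ which is obtained here by solving $\nabla H=0$ for $y>0.$ The function names are hypothetical.

```python
import math


def grad_H(x, y, eta):
    """Partial derivatives (H_x, H_y) of H(x, y) = (x^2 - 4 eta x y + y^2 - x y |y|) / 2."""
    h_x = x - 2.0 * eta * y - 0.5 * y * abs(y)
    h_y = y - 2.0 * eta * x - x * abs(y)
    return h_x, h_y


def nonzero_equilibrium(eta):
    """Candidate equilibrium in the first quadrant for 0 <= eta <= 1/2 (derived closed form)."""
    s = math.sqrt(2.0 + eta * eta)
    return 1.0 - eta * eta - eta * s, s - 3.0 * eta


# Largest gradient component at +P and -P over a few values of eta.
residuals = []
for eta in (0.0, 0.1, 0.3, 0.49):
    x_bar, y_bar = nonzero_equilibrium(eta)
    for sign in (1.0, -1.0):
        h_x, h_y = grad_H(sign * x_bar, sign * y_bar, eta)
        residuals.append(max(abs(h_x), abs(h_y)))
```

At $\eta=0$ the formula gives $(1,\sqrt{2}),$ and at $\eta=1/2$ it collapses to $(0,0),$ in line with the origin being the unique equilibrium point once $\eta\geq 0.5.$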

Regarding infinite horizon periodic solutions, examination of $H$ and the equations for angular velocity, (14) and the similar equation for the angle of $(y,\dot{y})$ , leads to the following conclusions. If $0\leq\eta<0.5,$ there is a two-dimensional family of periodic solutions that can be indexed by the peak amplitude of $x$ (which ranges over $(0,\overline{x})$ ) and phase. The period of the solutions increases continuously over $(2\pi/\sqrt{1-4\eta^{2}},\infty)$ as the peak amplitude of $x$ increases over $(0,\overline{x})$ . If $\eta\geq 0.5,$ there are no periodic solutions of (12).

III-C Infinite horizon convergent transient MFG solutions, and the asymptotically stable constant solutions

Consider the initial value problem over $t\in[0,\infty)$ with some initial condition $(x_{0},y_{0})$ and dynamics (12). First, suppose $0\leq\eta<0.5.$ For any initial condition $(x_{0},y_{0})$ such that $x_{0}\neq 0$ , one of four cases holds: $(x_{t},y_{t})$ is periodic with a positive period, $(x_{t},y_{t})$ converges to $\overline{P}$ , $(x_{t},y_{t})$ converges to $-\overline{P}$ , or $x_{t}$ exits $[-1,1]$ in finite time. The following items categorize the convergent solutions such that $x_{t}$ remains in $[-1,1].$

• For any initial value of $x_{0}\in(-\overline{x},\overline{x})$ , there exist two corresponding initial values of $y_{0}$ such that the solution of the initial value problem satisfies (i) $x_{t}\in[-1,1]$ for all $t$ and (ii) the solution converges to a limit as $t\to\infty.$ For the smaller value of $y_{0}$ the limit is $-\overline{P}$ and for the larger value of $y_{0}$ the limit is $\overline{P}.$ The larger value of $y_{0}$ , for example, is such that the contour of $H$ through $(x_{0},y_{0})$ contains $\overline{P}.$

• For an initial value $x_{0}\in[-1,-\overline{x}]$ there exists a unique value of $y_{0}$ such that the solution of the initial value problem satisfies $x_{t}\in[-1,1]$ for all $t.$ That solution converges to $-\overline{P}$ as $t\to\infty.$

• Similarly, for an initial value $x_{0}\in[\overline{x},1]$ there exists a unique value of $y_{0}$ such that the solution of the initial value problem satisfies $x_{t}\in[-1,1]$ for all $t.$ That solution converges to $\overline{P}$ as $t\to\infty.$

Second, suppose $\eta\geq 0.5.$ For any $x_{0}\in[-1,1]$ , there is a unique value of $y_{0}$ , such that the solution of the initial value problem for (12) satisfies $x_{t}\in[-1,1]$ for all $t.$ Furthermore, $y_{0}$ has the same sign as $x_{0}$ , and the solution converges to $(0,0)$ as $t\to\infty.$ The value of $y_{0}$ is the root of $H(x_{0},y_{0})=0$ (for $x_{0}$ fixed) that is closer to zero.

The above observations give a sense in which $\pm\overline{P}$ are asymptotically stable equilibrium points of the dynamics (12) if $0\leq\eta<0.5,$ and $(0,0)$ is an asymptotically stable equilibrium point if $\eta\geq 0.5.$ This sense of stability is not the usual definition of (Lyapunov) stability because we ask, for given $x_{0}$ , whether there exists an associated value of $y_{0}$ giving the desired convergence. The asymptotically stable limit points are saddle points of $H.$

As mentioned above, a related definition of stability, called linear asymptotic stability, is formulated in [10]. That definition and the results of [10] for it are translated to the model of this paper in Appendix D.

IV Evidence for Conjecture 1

In order to explore whether Conjecture 1 is true, it is natural to explore two sides of the question. One side is to identify the FLMP trajectories. Numerically that can be done by solving the $2(N+1)$ dimensional HJB equation for the system with $N+1$ players to find the strategy $\alpha^{*}(i,n,t)$ players use for the Markov perfect equilibrium with $N+1$ players, and then either simulating the corresponding occupancy process through Monte Carlo simulation of $N+1$ players independently using that policy, or solving the Kolmogorov forward equations to find the marginal distribution, mean and variance of the number of players in state 0 vs. time.

The other side is to identify the stable fixed points of ${\cal T}.$ Two ways to explore which fixed points of ${\cal T}$ are stable are to numerically investigate the orbit trajectories as ${\cal T}$ is repeatedly applied to some initial trajectory, or to examine the linearization of ${\cal T}$ about a fixed point. This linearization is the Gateaux derivative, and it can be expressed as an integral operator. Its eigenvalues can be computed numerically and, in rare cases, analytically. By abuse of notation, we use ${\cal T}$ to denote the mean field map as a mapping $x\mapsto\widetilde{x}={\cal T}(x)$ obtained by the change of coordinates $x=2\theta-1.$

Numerical identification of FLMP trajectories

For the symmetric follow the crowd model, numerical analysis strongly and consistently indicates which MFG solutions are FLMP trajectories. We find that for $\eta\geq 1/2$ they coincide with the unique MFG equilibrium, namely the $(0,0)$ trajectory over $[0,T].$ For $\eta<1/2$ there are two FLMP trajectories: the one that traverses the first quadrant in the x-y plane once, and its negative, which traverses the third quadrant in the x-y plane once. In particular, the solutions that wind around the origin through three or more quadrants do not appear to be FLMP solutions. See Fig. 3 for illustration.

For less symmetric examples it is less obvious where the bifurcation curve is that separates FLMP solutions that converge to a point closer to 1, or converge to a point closer to 0. The bifurcation curve often coincides with a line or curve of indifference for the $N+1$ player game with a large number of players, corresponding to upcrossings of zero by the mapping $n\mapsto u_{1}(0,n,t)-u_{0}(0,n,t).$ This is illustrated in Fig. 4.

Examination of orbits of ${\cal T}$

Recall that the fixed points of ${\cal T}$ are the collective mass trajectories $(\theta_{t}:0\leq t\leq T)$ of mean field Nash equilibria. To numerically investigate the stability of fixed points of ${\cal T}$ we generated sequences of iterates of trajectories $(\theta^{n})_{n\geq 0}$ defined by $\theta^{n+1}={\cal T}(\theta^{n}),$ where the initial point $\theta^{0}$ is a perturbation of a fixed point. Figure 5 shows such sequences of iterates such that the initial trajectory is a perturbation of one of the two MFG Nash equilibria that cross zero one time, for the follow the crowd game and time horizon $T=20.$ In both instances, the iterates converged to one of the two equilibria with no zero crossings.

However, overall we found it difficult to numerically verify that a given solution is not a stable fixed point. On one hand, some MFG solutions that we don’t expect to be stable, such as the trajectory that crosses zero once, numerically appear to be asymptotically stable for a very small basin of stability. On the other hand, we have found perturbations of MFG solutions that also numerically appear to be asymptotically stable, indicating numerical artifacts are possible.
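The orbit experiment described above can be sketched in a few lines of Python. This is a minimal illustration rather than the code used for Figure 5: it works in the coordinate $x=2\theta-1$, assumes the follow the crowd dynamics take the form $\dot{y}=y|y|/2+2\eta y-x$ and $\dot{x}=y-|y|x-2\eta x$, assumes zero terminal cost (so $y_{T}=0$), and uses plain Euler steps. The names `mean_field_map` and `iterate` are hypothetical.

```python
import numpy as np

def mean_field_map(x, eta, T):
    # x holds samples x[0..m] of a trajectory on a uniform grid over [0, T].
    m = len(x) - 1
    dt = T / m
    # Backward pass for y, assuming the terminal condition y_T = 0.
    y = np.zeros(m + 1)
    for k in range(m, 0, -1):
        dy = 0.5 * y[k] * abs(y[k]) + 2.0 * eta * y[k] - x[k]
        y[k - 1] = y[k] - dt * dy
    # Forward pass for the updated trajectory, keeping the same initial state.
    x_new = np.empty(m + 1)
    x_new[0] = x[0]
    for k in range(m):
        dx = y[k] - abs(y[k]) * x_new[k] - 2.0 * eta * x_new[k]
        x_new[k + 1] = x_new[k] + dt * dx
    return x_new

def iterate(x, eta, T, n_iter):
    # Apply the map repeatedly and return the final iterate.
    for _ in range(n_iter):
        x = mean_field_map(x, eta, T)
    return x

if __name__ == "__main__":
    T, m = 20.0, 2000
    t = np.linspace(0.0, T, m + 1)
    x0 = 0.01 * t / T                 # small perturbation of the zero trajectory
    x1 = mean_field_map(x0, 0.0, T)
    print("max of perturbation:", x0.max(), " max after one application:", x1.max())
```

Calling `iterate(x0, 0.0, T, 30)` and plotting successive iterates gives a quick view of where an orbit settles, subject to the caveat above that small basins of stability and discretization effects can produce misleading results.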

Linearization of ${\cal T}$ about $(0,0)$

Given a fixed point $\bar{x}={\cal T}(\bar{x})$ , the Gateaux derivative $d{\cal T}_{X}(\bar{x},x)$ , or the directional derivative of $\cal T$ at $\bar{x}$ in the direction $x,$ is obtained by linearizing $\cal T$ about $\bar{x}.$ This is particularly simple if $\bar{x}$ is the zero trajectory. (Linearization about a nonzero trajectory is given in Appendix E.) In that case, the linearized MFG equations are:

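$$\dot{y}_{t}=2\eta y_{t}-x_{t},\quad y_{T}=0;\qquad\dot{\widetilde{x}}_{t}=y_{t}-2\eta\widetilde{x}_{t},\quad\widetilde{x}_{0}=0.$$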

Given $(x_{u})$ , $\widetilde{x}=d{\cal T}_{X}(\bar{x},x)={\cal L}_{2}{\cal L}_{1}x$ , where ${\cal L}_{1}$ and ${\cal L}_{2}$ are linear operators defined as:

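$$({\cal L}_{1}x)_{t}=\int_{t}^{T}e^{-2\eta(s-t)}x_{s}\,ds,\qquad({\cal L}_{2}y)_{t}=\int_{0}^{t}e^{-2\eta(t-s)}y_{s}\,ds.$$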

These expressions can be combined to yield

$$\widetilde{x}_{t}=\int_{0}^{T}K(t,u)x_{u}\,du,$$

where $K(t,u)=e^{-2\eta(t\vee u)}\sinh(2\eta(t\wedge u))/2\eta$ for $\eta>0$ and $K(t,u)=t\wedge u$ for $\eta=0.$ In other words, the Gateaux derivative is the integral operator with kernel $K.$
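As a consistency check on this kernel, suppose (an assumption that matches the operator norm $c(\eta,T)$ quoted in the next paragraphs) that $({\cal L}_{1}x)_{t}=\int_{t}^{T}e^{-2\eta(s-t)}x_{s}\,ds$ and $({\cal L}_{2}y)_{t}=\int_{0}^{t}e^{-2\eta(t-s)}y_{s}\,ds.$ Exchanging the order of integration then recovers $K$ for $\eta>0$:

```latex
\begin{align*}
\widetilde{x}_t
  &= \int_0^t e^{-2\eta(t-s)} \int_s^T e^{-2\eta(u-s)} x_u \, du \, ds
   = \int_0^T \left( \int_0^{t\wedge u} e^{-2\eta(t+u-2s)} \, ds \right) x_u \, du, \\
K(t,u)
  &= e^{-2\eta(t+u)} \, \frac{e^{4\eta(t\wedge u)}-1}{4\eta}
   = e^{-2\eta(t+u-t\wedge u)} \, \frac{\sinh\bigl(2\eta(t\wedge u)\bigr)}{2\eta}
   = e^{-2\eta(t\vee u)} \, \frac{\sinh\bigl(2\eta(t\wedge u)\bigr)}{2\eta}.
\end{align*}
```

The last step uses $t+u-t\wedge u=t\vee u.$ Letting $\eta\to 0$ gives $\sinh(2\eta(t\wedge u))/2\eta\to t\wedge u,$ in agreement with the stated kernel for $\eta=0.$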

If $\eta=0$ , $K(t,u)=t\wedge u,$ which is the covariance of Brownian motion, which has a well known Mercer series expansion. The eigenvalues of $K$ are $\lambda_{n}=\left(\frac{2T}{(2n+1)\pi}\right)^{2}$ with corresponding eigenfunctions $h_{n}(t)=\sin\left(\frac{(2n+1)\pi t}{2T}\right)$ for $n\geq 0.$ In particular, the largest eigenvalue is $\lambda_{0}=\left(\frac{2T}{\pi}\right)^{2},$ and $\lambda_{0}\leq 1$ if and only if $T\leq T_{c}(0)=\pi/2,$ where $T_{c}(\eta)$ is the critical time horizon for the appearance of multiple MFG equilibria.

Here is an upper bound on the maximum eigenvalue of $K$ for $\eta>0.$ The mappings ${\cal L}_{1}$ and ${\cal L}_{2}$ are both bounded operators in the supremum norm: $\|y\|_{\infty}\leq c(\eta,T)\|x\|_{\infty},$ with operator norm $c(\eta,T)=\int_{0}^{T}e^{-2\eta t}dt=\frac{1-e^{-2\eta T}}{2\eta}.$ Thus, the Gateaux derivative is also a bounded operator in the supremum norm with operator bound $c^{2}(\eta,T).$ (A somewhat tighter bound is given by $\|\widetilde{x}\|_{\infty}\leq\widetilde{c}(\eta,T)\|x\|_{\infty},$ where $\widetilde{c}(\eta,T)=\max_{t}\int_{0}^{T}K(t,s)ds$ , but the expression for $\widetilde{c}(\eta,T)$ is complicated.) Hence, if $\eta\geq 1/2$ , the linearized mapping is a contraction in the $L^{\infty}$ norm for all $T>0.$ If $\eta<1/2$ it is a contraction if $T$ is small enough that $\frac{1-e^{-2\eta T}}{2\eta}<1.$

For $\eta>0$ we conjecture the largest eigenvalue of $K$ is greater than one precisely when there is a nonzero MFG equilibrium, namely, when $T>T_{c}(\eta)\triangleq\frac{\pi-\arccos(2\eta)}{\sqrt{1-4\eta^{2}}}.$ We numerically found the largest eigenvalue of the matrix approximation of the kernel, $(K(iT/n,jT/n)T/n)_{i,j\in[n]}$ for $n=10^{3}$ for $\eta\in(0,0.499)$ and $T$ near $T_{c},$ and the calculations match the conjecture well.
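The numerical check described above can be reproduced along the following lines. This is a sketch using NumPy, not the authors' code: it builds the matrix $(K(iT/n,jT/n)T/n)_{i,j\in[n]}$ from the kernel formula given earlier and computes its largest eigenvalue. The function names and the default $n=500$ are illustrative choices.

```python
import numpy as np

def kernel_matrix(eta, T, n):
    # Matrix approximation (K(iT/n, jT/n) * T/n) for i, j = 1..n.
    t = np.arange(1, n + 1) * (T / n)
    lo = np.minimum.outer(t, t)      # t ^ u
    hi = np.maximum.outer(t, t)      # t v u
    if eta == 0.0:
        K = lo
    else:
        K = np.exp(-2.0 * eta * hi) * np.sinh(2.0 * eta * lo) / (2.0 * eta)
    return K * (T / n)

def largest_eigenvalue(eta, T, n=500):
    # The matrix is symmetric; eigvalsh returns eigenvalues in ascending order.
    return np.linalg.eigvalsh(kernel_matrix(eta, T, n))[-1]

def critical_horizon(eta):
    # T_c(eta) for 0 <= eta < 1/2, from the formula in the text.
    return (np.pi - np.arccos(2.0 * eta)) / np.sqrt(1.0 - 4.0 * eta ** 2)

if __name__ == "__main__":
    print("eta = 0, T = 1:", largest_eigenvalue(0.0, 1.0), "vs", (2 / np.pi) ** 2)
    for eta in (0.1, 0.3, 0.45):
        Tc = critical_horizon(eta)
        print("eta =", eta, " T_c =", Tc, " largest eigenvalue at T_c:",
              largest_eigenvalue(eta, Tc))
```

The right-endpoint discretization carries an error of order $T/n$, so the computed value at $T=T_{c}(\eta)$ is expected to be close to, not exactly equal to, one.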

Appendix A Proof of Proposition 1

This section proves Proposition 1, that if $\eta>0,$ FLMP trajectories are mean field game equilibria. The proof is given after some initial notation is given and two lemmas are proved. Let $(\theta_{t})_{0\leq t\leq T}$ be an FLMP trajectory and let $(i^{N}(0),n^{N}(0))_{N\geq 1}$ be a corresponding sequence of initial conditions as in the definition of FLMP trajectory. For $N\geq 1$ , let $((i(t),n(t)):0\leq t\leq T)$ denote the controlled Markov process for $N+1$ players resulting from initial state $(i^{N}(0),n^{N}(0)),$ when all players use the unique policy $(\alpha^{*}(i,n,t))$ for the Markov perfect equilibrium for $N+1$ players. Since the functions $f(i,\theta,t)$ and $\psi(i,\theta)$ are bounded, for $T$ fixed, the cost to go functions $u(i,n,t)$ determined by the HJB equations (1)-(34) are uniformly bounded for all $N,i,n,$ and $t\in[0,T].$ Therefore, the policy $\alpha^{*}$ , determined by (3), is also uniformly bounded. Select $\Gamma_{1}$ such that $(\alpha^{*}(i,n,t))\leq\Gamma_{1}$ for all $N,i,n,$ and $t\in[0,T].$ Suppose also that $\Gamma_{1}$ is large enough that $\alpha(i,t)\leq\Gamma_{1}$ for all $i,t$ for any decentralized policy $\alpha(i,t)$ obtained by responding to a deterministic collective mass trajectory.

Consider the following variation of the Markov perfect equilibrium. Suppose the reference player switches from using $\alpha^{*}$ to some other policy, $\beta^{*}(i,n,t),$ such that $\beta^{*}(i,n,t)\leq\Gamma_{1}$ and $t\mapsto\beta^{*}(i,n,t)$ is continuous for all $(i,n).$ Let $P$ denote the original probability distribution for the process $(i(t),n(t))_{0\leq t\leq T}$ and let $\widetilde{P}$ denote the probability distribution of $(i(t),n(t))_{0\leq t\leq T}$ when the reference player switches to policy $\beta^{*}.$

Lemma 1.

(Insensitivity of FLMP trajectory to one player switching policies) The following holds for any $\epsilon>0$ ,

[TABLE]

Lemma 2.

Let $P$ and $\widetilde{P}$ be probability distributions on the same measurable space $(\Omega,{\cal F})$ such that $\widetilde{P}<<P$ (i.e. $\widetilde{P}$ is absolutely continuous with respect to $P$ ) and let $\frac{d\widetilde{P}}{dP}$ denote the Radon-Nikodym derivative. Suppose $E_{P}\left[\left(\frac{d\widetilde{P}}{dP}\right)^{p}\right]^{1/p}\leq c$ for some $p>1$ and $c.$ Let $q>1$ be such that $\frac{1}{p}+\frac{1}{q}=1.$ Then for any event $A$ , $\widetilde{P}(A)\leq cP(A)^{1/q}.$

Proof of Lemma 2.

By Hölder’s inequality,

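$$\widetilde{P}(A)=E_{P}\left[\frac{d\widetilde{P}}{dP}\mathbf{1}_{A}\right]\leq E_{P}\left[\left(\frac{d\widetilde{P}}{dP}\right)^{p}\right]^{1/p}E_{P}\left[\mathbf{1}_{A}^{q}\right]^{1/q}\leq cP(A)^{1/q}.$$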

∎
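For intuition, the bound of Lemma 2 can be checked by brute force on a small finite sample space. The sketch below is only an illustration with made-up probability vectors. It computes $c$ as the $L^{p}(P)$ norm of the Radon-Nikodym derivative and tests $\widetilde{P}(A)\leq cP(A)^{1/q}$ for every event $A$. The function name is hypothetical.

```python
import itertools
import numpy as np

def holder_bound_holds(p_meas, p_tilde, p_exp):
    # p_meas, p_tilde: probability vectors on a finite set, p_meas > 0 everywhere.
    ratio = p_tilde / p_meas                              # dP~/dP
    c = np.sum(p_meas * ratio ** p_exp) ** (1.0 / p_exp)  # L^p(P) norm of dP~/dP
    q_exp = p_exp / (p_exp - 1.0)                         # 1/p + 1/q = 1
    n = len(p_meas)
    # Enumerate all 2^n events A, including the empty set and the whole space.
    for r in range(n + 1):
        for event in itertools.combinations(range(n), r):
            idx = np.array(event, dtype=int)
            lhs = p_tilde[idx].sum()
            rhs = c * p_meas[idx].sum() ** (1.0 / q_exp)
            if lhs > rhs + 1e-12:
                return False
    return True

if __name__ == "__main__":
    P = np.array([0.1, 0.2, 0.3, 0.4])
    P_tilde = np.array([0.4, 0.3, 0.2, 0.1])
    print(holder_bound_holds(P, P_tilde, 2.0))
```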

Proof of Lemma 1 .

Since $P$ and $\widetilde{P}$ only differ by the change in the policy for player 1, the Radon-Nikodym derivative $\frac{d\widetilde{P}}{dP}$ can be written explicitly as follows. Let $(Y_{t})_{0\leq t\leq T}$ denote the number of jumps of the state of the reference player during $[0,t].$ Then, by standard theory of change of probability measure for point processes (Girsanov type result for point processes, see [7], Theorem 4.1 for example), $\widetilde{P}<<P$ and the Radon-Nikodym derivative is given by

[TABLE]

where $\beta^{*}$ is short for $\beta^{*}(i(t-),n(t-),t),$ $\alpha^{*}$ is short for $\alpha^{*}(i(t-),n(t-),t)$ and $\eta$ is the fixed positive background jump rate.

Note that for $p>1$ , the expression for the Radon-Nikodym derivative to the $p^{th}$ power can be written as a product

[TABLE]

where $\widetilde{\widetilde{P}}$ is a probability measure corresponding to a similar Radon-Nikodym derivative with a factor $p$ in front of the log term, and $\Gamma_{2}=\exp\left[T\left((\Gamma_{1}+\eta)^{p}+p\Gamma_{1}\right)\right].$ Thus, $E_{P}\left[\left(\frac{d\widetilde{P}}{dP}\right)^{p}\right]\leq\Gamma_{2}.$ Lemma 1 thus follows from Lemma 2 with $A$ equal to the complement of the event in (22). ∎

Proof of Proposition 1.

Consider the Markov perfect equilibrium for large $N.$ In view of Lemma 1, if the reference player deviates from using $\alpha^{*}$ , the normalized process $n(t)/N$ for the rest of the population still follows $\theta$ arbitrarily closely as $N\to\infty.$ Thus, an asymptotically optimal policy for the reference player to switch to is the optimal response to the deterministic collective mass trajectory $\theta.$ Furthermore, it implies $u(i,n,t)-u(i,t)$ converges to zero uniformly in $n$ and $t\in[0,T]$ , where $u(i,n,t)$ is associated with the $N+1$ player MP equilibrium, and $u(i,t)$ is the cost-to-go for the single reference player responding to the deterministic mass trajectory $\theta.$ It follows that all players in the $N+1$ player game are asymptotically effectively using the same policy as the alternate policy of the reference player (in other words, $(u(i,n,t)-u(1-i,n,t))_{+}\approx(u(i,t)-u(1-i,t))_{+}$ ). Thus, the corresponding fluid limit is the same as the mean limit for the reference player with random initial state equal to 0 with probability $n^{N}(0)/N.$ ∎

Appendix B The uniform law of large numbers

Theorem 7.4 of [2] is repeated here for convenience.

Proposition 2.

Let $(X_{t}:0\leq t\leq T)$ be a centered, stochastically continuous, uniformly bounded random process whose trajectories are right continuous and have left limits. Assume for some $c>0$ , some nondecreasing function $F\in D[0,1]$ and for all $s,t\in[0,1],$ $E[|X_{t}-X_{s}|]\leq|F(t)-F(s)|.$ Then $X\in CLT$ in $(D[0,1],\|\cdot\|_{\infty}).$

An implication of this theorem is that if all players use the same decentralized policy $\alpha(i,t)$ (assumed to be bounded and measurable in $t$ ) and if the initial conditions satisfy $\frac{n(0)}{N}\to\overline{\theta}$ for some $\overline{\theta}\in[0,1],$ then as $N\to\infty,$ the population average converges to $(\theta_{t})$ in probability in the supremum norm, where $(\theta_{t})$ is determined by the Kolmogorov forward equation

[TABLE]

Therefore, the following are equivalent for a trajectory $(\theta_{t})$ :

(a) Let $\alpha^{*}$ denote the optimal response policy for a single player in response to $\theta.$ In other words, $\alpha^{*}(i,t)=(u(i,t)-u(1-i,t))_{+}$ where $(u(i,t))$ is determined by (7)-(8). Then for any $\epsilon>0$ and any sequence of finite player games with $\frac{n(0)}{N}\to\theta_{0}$ , the strategy profile of all players using $\alpha^{*}$ is an $\epsilon$ -Nash equilibrium for sufficiently large $N$ .

(b) $(\theta_{t})$ is the population trajectory of a MFG equilibrium.
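The law of large numbers behind this equivalence can be illustrated with a short simulation. The sketch below makes simplifying assumptions that are not in the text: the decentralized policy is constant in time, so each player leaves state 0 at total rate `a0` $=\alpha(0)+\eta$ and leaves state 1 at total rate `a1` $=\alpha(1)+\eta$, and the forward equation is taken to be $\dot{\theta}_{t}=-\theta_{t}a_{0}+(1-\theta_{t})a_{1}$ with $\theta$ the fraction of players in state 0. Function names are illustrative.

```python
import numpy as np

def fluid_theta(theta0, a0, a1, t):
    # Closed-form solution of d(theta)/dt = -theta*a0 + (1 - theta)*a1.
    theta_inf = a1 / (a0 + a1)
    return theta_inf + (theta0 - theta_inf) * np.exp(-(a0 + a1) * t)

def simulate_fraction(N, theta0, a0, a1, T, steps, rng):
    # Each player is an independent two-state chain. Over a step of length dt
    # the exact transition probabilities of the chain are used.
    dt = T / steps
    decay = 1.0 - np.exp(-(a0 + a1) * dt)
    p01 = a0 / (a0 + a1) * decay                    # P(0 -> 1 over dt)
    p10 = a1 / (a0 + a1) * decay                    # P(1 -> 0 over dt)
    state = (rng.random(N) >= theta0).astype(int)   # state 0 w.p. theta0
    frac = [np.mean(state == 0)]
    for _ in range(steps):
        u = rng.random(N)
        flip = np.where(state == 0, u < p01, u < p10)
        state = np.where(flip, 1 - state, state)
        frac.append(np.mean(state == 0))
    return np.array(frac)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, steps = 5.0, 100
    t = np.linspace(0.0, T, steps + 1)
    for N in (100, 1000, 10000):
        frac = simulate_fraction(N, 0.9, 1.0, 0.5, T, steps, rng)
        dev = np.max(np.abs(frac - fluid_theta(0.9, 1.0, 0.5, t)))
        print("N =", N, " sup-norm deviation on the grid:", dev)
```

The deviation is measured only at grid times, which is enough to see the decay with $N$ that the uniform law of large numbers predicts.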

Appendix C Monotonicity of period with amplitude

Consider the follow the crowd dynamics (12), rewritten here for convenience:

$$\dot{x}=y-|y|x-2\eta x,\qquad\dot{y}=\tfrac{1}{2}y|y|+2\eta y-x.$$

From (12) we find for all $x,y,$

$$\ddot{y}=\tfrac{1}{2}y^{3}+3\eta|y|y+(4\eta^{2}-1)y.$$

Equivalently, writing $v=\dot{y}$ , yields

$$\dot{y}=v,\qquad\dot{v}=\tfrac{1}{2}y^{3}+3\eta|y|y+(4\eta^{2}-1)y.\tag{25}$$

The motion (25) admits the Hamiltonian $H(y,v)=\frac{1}{2}v^{2}-\frac{1}{8}y^{4}-\eta|y|^{3}-\frac{4\eta^{2}-1}{2}y^{2}.$ If $\eta<1/2$ then $H(y,v)$ is convex near the origin. Letting $\varphi=\arctan\left(\frac{y}{v}\right)$ we find

[TABLE]

Note that $\dot{\varphi}$ is decreasing in $|y|$ for any fixed ratio of $v$ to $y$ (decreasing angular speed). Hence, the period of the periodic trajectories increases with amplitude.
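The monotonicity can also be checked numerically from the Hamiltonian $H(y,v)$ above. The following sketch is an illustration, not part of the original argument. It evaluates the period of the orbit through $(y,v)=(a,0)$ by the standard energy integral $4\int_{0}^{a}dy/\sqrt{2(V(a)-V(y))}$, where $V(y)=\frac{1-4\eta^{2}}{2}y^{2}-\eta|y|^{3}-\frac{1}{8}y^{4}$ is the potential part of $H.$ It is valid for $\eta<1/2$ and amplitudes $0<a<\sqrt{2+\eta^{2}}-3\eta.$

```python
import numpy as np

def period(amplitude, eta, m=4000):
    # Period of the orbit of (25) through (y, v) = (amplitude, 0).
    # The substitution y = a*sin(phi) removes the endpoint singularity:
    # period = 4 * integral_0^{pi/2} d(phi) / sqrt(2 g), where
    # g = (V(a) - V(y)) / (a^2 - y^2) is written without the 0/0 at y = a.
    a = amplitude
    phi = (np.arange(m) + 0.5) * (np.pi / 2) / m    # midpoint rule nodes
    y = a * np.sin(phi)
    g = (0.5 * (1 - 4 * eta ** 2)
         - eta * (a * a + a * y + y * y) / (a + y)
         - (a * a + y * y) / 8)
    return 4 * np.sum(1 / np.sqrt(2 * g)) * (np.pi / 2) / m

if __name__ == "__main__":
    eta = 0.2
    for a in (0.1, 0.3, 0.5, 0.7):
        print("amplitude", a, " period", period(a, eta))
```

For small amplitudes the value approaches $2\pi/\sqrt{1-4\eta^{2}},$ the period of the linearized oscillator.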

Appendix D Linear asymptotic stability for symmetric follow the crowd example

A definition of linear asymptotic stability was introduced in ([10], Section IV) for a constant in time solution to the infinite horizon long term average cost mean field game. We translate that definition to our setting. Roughly speaking, linear asymptotic stability is a variation, based on linearization, of the asymptotic stability properties delineated in Section III-C.

Definition 1.

Suppose $(\widehat{x},\widehat{y})$ is an equilibrium point for the ode (12). Seeking solutions of the form $(\widetilde{x},\widetilde{y})=(\widehat{x},\widehat{y})+\epsilon(x,y)+o(\epsilon)$ , we obtain a linear initial value problem for $(x,y)$ by linearizing (12) about $(\widehat{x},\widehat{y}).$ The point $(\widehat{x},\widehat{y})$ is said to be linearly asymptotically stable if for any initial perturbation $x_{0}\in{\mathbb{R}},$ there exists a unique solution $(x_{t},y_{t})_{t\geq 0}$ to the linearized equations (with the given initial condition for $x$ , some initial condition for $y$ , and satisfying the $L^{2}$ constraint $\int_{0}^{\infty}\|x_{s}-\widehat{x}\|^{2}ds<\infty$ ) and, furthermore, $\lim_{t\to\infty}x_{t}=\widehat{x}.$

With the definition in place we prove the following proposition.

Proposition 3.

The equilibrium point $(0,0)$ is linearly asymptotically stable if and only if $\eta>\eta_{c}=1/2.$ The equilibrium points $\pm\widehat{P}$ are linearly asymptotically stable if and only if $0\leq\eta<1/2.$

Proof.

For an equilibrium point $(\widehat{x},\widehat{y})$ , we have $H_{x}(\widehat{x},\widehat{y})=H_{y}(\widehat{x},\widehat{y})=0$ and

[TABLE]

So the linear initial value problem for $(x,y)$ can be written as

[TABLE]

where

[TABLE]

Since $\operatorname{Tr}(A)=0$ (so sum of eigenvalues is zero) and $\det(A)=\det({\cal H}(\widehat{x},\widehat{y}))$ where ${\cal H}$ is the Hessian of $H$ :

[TABLE]

the eigenvalues of $A$ are $\pm\sqrt{-\det({\cal H}(\widehat{x},\widehat{y}))}.$ If $\det({\cal H})<0$ then the eigenvalues are real valued and one is negative. If $\det({\cal H})>0$ the eigenvalues are purely imaginary. For the follow the crowd game with Hamiltonian given in (13),

[TABLE]

Consider first the zero equilibrium, $(\widehat{x},\widehat{y})=(0,0),$ in which case $A=\left(\begin{array}[]{cc}-2\eta&1\\ -1&2\eta\end{array}\right).$ This $A$ has eigenvalues $\pm\sqrt{4\eta^{2}-1}$ and, for $\eta\geq 0.5$ , corresponding eigenvectors $\binom{1}{2\eta\pm\sqrt{4\eta^{2}-1}}.$ If $\eta>0.5$ then the solutions to (26) have the following form, for some constants $a$ and $b$ ,

[TABLE]

The initial condition for $x$ and the $L^{2}$ constraint are satisfied if and only if $a=0$ and $b=x_{0},$ and the resulting solution converges to zero as $t\to\infty.$ Hence, the system is linearly asymptotically stable if $\eta>1/2.$ If $\eta<1/2$ then the two eigenvalues of $A$ are purely imaginary, nonzero, and negatives of each other, so that all nonzero solutions to (26) are periodic. If $\eta=1/2$ all solutions have $x$ of the form $x_{t}=a+bt.$ So, combining the observations for $\eta<1/2$ and $\eta=1/2$ , we conclude that for $\eta\leq 1/2$ there are no nonzero solutions satisfying the $L^{2}$ constraint. So for $\eta\leq 1/2$ the zero equilibrium is not linearly asymptotically stable.

Now consider the equilibrium point $\pm\widehat{P}$ and suppose $\eta<1/2.$ Then, since $|\widehat{y}|=\sqrt{2+\eta^{2}}-3\eta,$ we find that $2\eta+|\widehat{y}|=\sqrt{2+\eta^{2}}-\eta>1.$ Therefore, by the analysis for the zero equilibrium point with $2\eta$ replaced by $2\eta+|\widehat{y}|$ , we see that again $A$ has two real-valued eigenvalues of opposite sign, so the system is linearly asymptotically stable. ∎

Note that the eigenvalues $\pm\sqrt{4\eta^{2}-1}$ for the equilibrium point $(0,0)$ have qualitatively the same graph as in Figure 2(b) of [10], with $R$ and $R_{c}$ replaced by $\eta$ and $\eta_{c}.$
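The eigenvalue statements for the zero equilibrium are easy to confirm numerically. The sketch below uses only the matrix $A$ given in the proof. The helper names are illustrative.

```python
import numpy as np

def A_zero(eta):
    # Linearization matrix at the equilibrium (0, 0), as given in the proof.
    return np.array([[-2.0 * eta, 1.0], [-1.0, 2.0 * eta]])

def eigenvalues(eta):
    return np.linalg.eigvals(A_zero(eta))

def stable_eigenvector(eta):
    # Eigenvector (1, 2*eta - sqrt(4*eta^2 - 1)) for the negative eigenvalue,
    # defined for eta >= 1/2.
    return np.array([1.0, 2.0 * eta - np.sqrt(4.0 * eta ** 2 - 1.0)])

if __name__ == "__main__":
    for eta in (0.25, 0.5, 1.0):
        print("eta =", eta, " eigenvalues of A:", eigenvalues(eta))
```

For $\eta>1/2$ the printed eigenvalues are real with opposite signs. For $\eta<1/2$ they are purely imaginary, matching the case analysis in the proof.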

Remark 1.

Proposition 3 illustrates the notion of linear asymptotic stability for equilibrium points of the infinite horizon average cost mean field game, introduced in [10]. The two state Markov control problem we have considered is considerably simpler than the coupled oscillator problem considered in [10], so, as explained in Section III-C, we could observe asymptotic stability properties of equilibrium points directly, rather than considering the linearized dynamics.

Appendix E Kernel for Gateaux derivative of ${\cal T}$ for nonzero $x.$

We give an expression for the kernel of the Gateaux derivative along a nonzero $x$ trajectory in case $\eta=0$ for follow the crowd cost function. Given $x$ , $\widetilde{x}=T(x)$ is found by first finding $y$ :

[TABLE]

and then $\widetilde{x}:$

[TABLE]

Fix $\widehat{t}\in(0,T)$ and $\epsilon>0$ sufficiently small. Suppose $h(t)=\delta(t-\widehat{t})$ . Let $x_{\epsilon}=x+\epsilon h$ , $y_{\epsilon}=y+\epsilon k+o(\epsilon)$ and $\widetilde{x}_{\epsilon}=\widetilde{x}+\epsilon g+o(\epsilon).$ Let $y$ and $y_{\epsilon}$ be the solutions to (27) for $x$ and $x_{\epsilon}:[0,T]\to[-1,1]$ , respectively. Then, linearizing the equations for $y$ and $\widetilde{x}$ yields

[TABLE]

so that

[TABLE]

and the kernel of the Gateaux derivative of ${\cal T}$ is thus given by

[TABLE]

If $y\geq 0$ over $[0,T]$ then $(1-\mathrm{sgn}(y_{t})x_{t})=e^{-\int_{0}^{t}|y|dr},$ yielding:

[TABLE]

Appendix F Avoid the crowd cost function

In contrast to the follow the crowd game focused on in this paper, the avoid the crowd game of this section has a unique MFG equilibrium. Suppose the cost per time spent in state $i$ is

[TABLE]

where $\theta$ is the fraction of other players in state 0.

The reduced dimension MFG equations become

[TABLE]

with associated Hamiltonian

[TABLE]

Contour maps of $H$ are shown in Fig. 6 for two values of $\eta.$

We observe that $(0,0)$ is the unique critical point of $H,$ and for any $x_{0}\in[-1,1]$ there exists a unique value of $y_{0}$ such that the solution of the initial value problem with dynamics (31) over $[0,\infty)$ is such that $x_{t}\in[-1,1]$ for all $t.$ Furthermore, such solution converges to $(0,0)$ . Also, $\det\mbox{Hess}H(0,0)=-1-4\eta^{2}<0,$ and the unique equilibrium point $(0,0)$ of the infinite horizon average cost MFG is linearly asymptotically stable.

Appendix G On the difference of cost to go for $N+1$ players

Recall that using $y_{t}$ defined by $y_{t}=u(1,t)-u(0,t)$ yielded a reduction from three to two dimensions in the MFG equilibrium equations. Let us see if a similar reduction occurs for the Nash equilibrium equations for the $N+1$ player game. For convenience we restate the HJB cost-to-go equations (1) and (3) for the reference player in the $N+1$ player game:

[TABLE]

where the corresponding control policy is

[TABLE]

Suppose all players use policy $\alpha^{*}$ , so $\beta=\alpha^{*}$ in the definition of $\gamma^{\pm}.$ Let $Y(n,t)=u(1,n,t)-u(0,n,t),$ $\delta f(n)=f(1,n)-f(0,n),$ and $\delta\psi(n)=\psi(1,n)-\psi(0,n).$ Using the facts

[TABLE]

in (33) yields

[TABLE]

The RHS is not a function of $Y$ alone. However, using $Y(n,t)_{+}\approx Y(n+1,t)_{+}$ and $(-Y(n-1,t))_{+}\approx(-Y(n,t))_{+}$ yields the approximation:

[TABLE]

Appendix H The MFG partial differential equation

An interpretation of a mean field game Nash equilibrium $(u(i,t),\theta_{t})$ is that at each time $t$ , $u(i,t)$ is the cost to go for a reference player in state $i$ , given that the fraction of players in state 0 is $\theta_{t}.$ That picture can be embedded into a larger picture. By taking a limit of the HJB equations for $N+1$ players as $N\to\infty,$ we can derive a PDE for $(U(i,\theta,t))$ such that $U(i,\theta,t)$ is the cost-to-go for a reference player in state $i$ given that the fraction of players in state 0 is $\theta$ for any $\theta\in[0,1].$ This idea is described in [3] (see Proposition 8) and is attributed there to P. Lions. For simplicity, we derive the PDE for the avoid the crowd game and use the equations derived in Appendix G. We use notation $Y$ instead of $U$ and $x$ instead of $\theta.$

Equations (36)-(37) suggest the following PDE, where now $n$ is treated as a continuous variable over $[0,N]$ rather than as an integer variable.

[TABLE]

Note that if we let $(\widehat{n}_{t},0\leq t\leq T)$ be defined by the following initial value problem

[TABLE]

then by the chain rule and the PDE (38),

[TABLE]

Note that if we set $y_{t}=Y(\widehat{n}_{t},t)$ and $x_{t}=\frac{2\widehat{n}_{t}}{N}-1$ and consider the follow the crowd cost function (so $f(\widehat{n})=\frac{2\widehat{n}-N}{N}$ ), then (40) and (41) are equivalent to the MFG equations (12). This calculation is an instance of Proposition 8 of [3]. Figure 8 gives numerical evidence that $u(1,n,t)-u(0,n,t)$ , with $n$ normalized to $\theta\in[0,1]$ , converges as $N\to\infty.$ Presumably the limit is the solution of the PDE.

The PDE (38)-(39) is of first order hyperbolic type. Equation (40) defines a characteristic curve for the PDE, which is why the PDE along the curve reduces to an ODE. The fact that there are multiple MFG solutions indicates that solutions of the PDE are also not unique. The problem of identifying which MFG Nash equilibria are FLMP trajectories can therefore be extended to the problem of determining which solutions of the PDE are limits of the scaled cost-to-go functions $(u(i,n,t)).$

Bibliography


[1] E. Altman, K. Avrachenkov, N. Bonneau, M. Debbah, R. El-Azouzi, and D. S. Menasche. Constrained cost-coupled stochastic games with independent state processes. Oper. Res. Lett., 36(2):160–164, 2008.
[2] E. Giné and J. Zinn. Some limit theorems for empirical processes. The Annals of Probability, 12:929–989, 1984.
[3] Diogo A Gomes, Joana Mohr, and Rafael Rigão Souza. Continuous time finite state mean field games. Applied Mathematics & Optimization, 68(1):99–143, 2013.
[4] Minyi Huang, Peter E Caines, and Roland P Malhamé. Large-population cost-coupled LQG problems with nonuniform agents: Individual-mass behavior and decentralized $\varepsilon$-Nash equilibria. IEEE Transactions on Automatic Control, 52(9):1560–1571, 2007.
[5] J.-M. Lasry and P.-L. Lions. Mean field games. Japanese Journal of Mathematics, 2(1):229–260, 2007.
[6] Maskin and Tirole. A theory of dynamic oligopoly, I and II. Econometrica, 56:549–570, 1984.
[7] Jan H. Van Schuppen and Eugene Wong. Transformation of local martingales under a change of law. The Annals of Probability, 2(5):879–888, 1974.
[8] H. Tembine, J.-Y. Le Boudec, R. El-Azouzi, and E. Altman. Mean field asymptotics of markov decision evolutionary games and teams. In Proc. 1st ICST Int. Conf. Game Theory for Netw, pages 140–150, 2009.
