Approximate Nash Equilibria in Partially Observed Stochastic Games with Mean-Field Interactions

Naci Saldi, Tamer Başar, and Maxim Raginsky

arXiv:1705.02036·cs.SY·June 6, 2018

TL;DR

This paper proves the existence of Nash equilibria in partially observed mean-field games and demonstrates that these equilibria serve as good approximations in large-agent settings.

Contribution

It introduces a method to establish Nash equilibria in partially observed mean-field games using belief space transformation and dynamic programming.

Findings

01. Existence of Nash equilibria under mild conditions.

02. Mean-field equilibrium policies are approximate Nash equilibria in large-agent games.

03. Applicable to infinite-horizon discounted cost scenarios.

Equations

e_{t}^{(N)}(\,\cdot\,) \coloneqq \frac{1}{N}\sum_{i=1}^{N}\delta_{x_{i}^{N}(t)}(\,\cdot\,) \in {\mathcal{P}}({\mathsf{X}})

\prod_{i=1}^{N} r\bigl(dy^{N}_{i}(t)\bigm|x^{N}_{i}(t),e^{(N)}_{t}\bigr)

\prod_{i=1}^{N} p\bigl(dx^{N}_{i}(t+1)\bigm|x^{N}_{i}(t),a^{N}_{i}(t),e^{(N)}_{t}\bigr)

\prod_{i=1}^{N} \pi^{i}_{t}\bigl(da^{N}_{i}(t)\bigm|h^{N}_{i}(t)\bigr)

J_{i}^{(N)}({\boldsymbol{\pi}}^{(N)})

J_{i}^{(N)}({\boldsymbol{\pi}}^{(N*)}) = \inf_{\pi^{i}\in\Pi_{i}} J_{i}^{(N)}({\boldsymbol{\pi}}^{(N*)}_{-i},\pi^{i})

J_{i}^{(N)}({\boldsymbol{\pi}}^{(N*)})

\bigl({\mathsf{X}},{\mathsf{A}},{\mathsf{Y}},p,r,c,\mu_{0}\bigr)

x(0)

y(t)

x(t)

a(t)

J_{{\boldsymbol{\mu}}}(\pi^{*}) = \inf_{\pi\in\Pi} J_{{\boldsymbol{\mu}}}(\pi)

J_{{\boldsymbol{\mu}}}(\pi)

\Psi : {\mathcal{M}} \rightarrow 2^{\Pi}

\Lambda : \Pi \rightarrow {\mathcal{M}}

\mu_{t+1}(\,\cdot\,) = \int_{{\mathsf{X}}\times{\mathsf{A}}} p(\,\cdot\,|x(t),a(t),\mu_{t})\,P^{\pi}(da(t)|x(t))\,\mu_{t}(dx(t))

\sup_{(a,\mu)\in{\mathsf{A}}\times{\mathcal{P}}({\mathsf{X}})} \int_{{\mathsf{X}}} w(y)\,p(dy|x,a,\mu) \leq \alpha w(x)

\int_{{\mathsf{X}}} w(x)\,\mu_{0}(dx) =: M < \infty

z(t) \coloneqq {\mathsf{Pr}}\{x(t)\in\,\cdot\,|y(0),\ldots,y(t),a(0),\ldots,a(t-1)\} \in {\mathcal{P}}({\mathsf{X}})

R_{t}(x\in A,y\in B|z,a) \coloneqq \int_{{\mathsf{X}}} \kappa_{t}(A,B|x^{\prime},a)\,z(dx^{\prime})

R_{t}(dx,dy|z,a) = H_{t}(dy|z,a)\otimes F_{t}(dx|z,a,y)

F_{t}(z,a,y)(\,\cdot\,) = F_{t}(\,\cdot\,|z,a,y)

F_{t}(z,a,y)(\,\cdot\,)

H_{t}(\,\cdot\,|z,a)

\eta_{t}(\,\cdot\,|z(t),a(t)) = \int_{{\mathsf{Y}}} \delta_{F_{t}(z(t),a(t),y(t+1))}(\,\cdot\,)\,H_{t}(dy(t+1)|z(t),a(t))

C_{t}(z,a) \coloneqq \int_{{\mathsf{X}}} c(x,a,\mu_{t})\,z(dx)

\bigl({\mathsf{Z}},{\mathsf{A}},\{\eta_{t}\}_{t\geq 0},\{C_{t}\}_{t\geq 0},\delta_{\mu_{0}}\bigr)

\pi_{t}^{\varphi}(\,\cdot\,|g(t)) \coloneqq \varphi_{t}(\,\cdot\,|i(g(t)))

\inf_{\varphi\in\Phi} {\tilde{J}}(\varphi,\mu_{0})

W(z) = \int_{{\mathsf{X}}} w(x)\,z(dx)

\lim_{n\rightarrow\infty} \inf_{z\in K_{n}^{c}} W(z) = \infty

Full text


Naci Saldi, Tamer Başar, and Maxim Raginsky

Coordinated Science Laboratory, University of Illinois, Urbana, IL 61801-2307, USA.

Naci Saldi

Department of Natural and Mathematical Sciences, Ozyegin University, Cekmekoy, Istanbul, Turkey.

Tamer Başar and Maxim Raginsky

Coordinated Science Laboratory, University of Illinois, Urbana, IL 61801-2307, USA.

Abstract

Establishing the existence of Nash equilibria for partially observed stochastic dynamic games is known to be quite challenging, with the difficulties stemming from the noisy nature of the measurements available to individual players (agents) and the decentralized nature of this information. When the number of players is sufficiently large and the interactions among agents are of the mean-field type, one way to overcome this challenge is to investigate the infinite-population limit of the problem, which leads to a mean-field game. In this paper, we consider discrete-time partially observed mean-field games with infinite-horizon discounted-cost criteria. Using the technique of converting the original partially observed stochastic control problem to a fully observed one on the belief space and the dynamic programming principle, we establish the existence of Nash equilibria for these game models under very mild technical conditions. Then, we show that the mean-field equilibrium policy, when adopted by each agent, forms an approximate Nash equilibrium for games with sufficiently many agents.

Keywords

Mean-field games, approximate Nash equilibrium, partially observed stochastic control.

MSC classification: 91A15, 91A13, 90C40, 90C39, 60J05

OR/MS classification: Primary: Games/group decisions, dynamic programming/optimal control, probability; secondary: Stochastic, Markov, Markov processes

1 Introduction

In this paper, we consider discrete-time mean-field games with decentralized partial observation under infinite-horizon discounted-cost optimality criteria. Game models of this type arise as the infinite-population limit of finite-agent dynamic games of the mean-field type; that is, the interactions among agents are modeled by the mean-field term (i.e., the empirical distribution of their states), which affects both the agents’ individual costs and their state and observation transition probabilities. As the number of agents goes to infinity, the mean-field term converges to the distribution of a single generic agent. Hence, in the limiting case, a generic agent is faced with a single-agent stochastic control problem with a constraint on the distribution of the state at each time (i.e., a mean-field game problem). The main goal in this class of problems is to establish the existence of a policy and a state distribution flow such that, when the generic agent applies this policy, the resulting distribution of the agent’s state is the same as the state distribution flow. This last property is called the Nash certainty equivalence (NCE) principle (Huang et al. [28]). The purpose of this paper is to study the existence of such an equilibrium for a general class of mean-field game models with discounted-cost criteria under decentralized partial observation and to establish that the policy in the mean-field equilibrium constitutes an approximate Nash equilibrium for finite-agent games with sufficiently many agents.

Mean-field games were introduced by Huang et al. [28] and Lasry and Lions [32] at around the same time to establish the existence of approximate Nash equilibria for fully observed non-cooperative differential games with a large number of identical agents. The main feature of this approach is to reduce the decentralized game problem to a centralized stochastic decision problem using the NCE principle. The equilibrium solution of this decision problem provides an approximate Nash equilibrium when the number of agents is sufficiently large. Characterization of the solution entails a Fokker-Planck equation evolving forward in time and a Hamilton-Jacobi-Bellman equation evolving backward in time. We refer the reader to Huang et al. [27], Tembine et al. [41], Huang [25], Bensoussan et al. [3], Cardaliaguet [10], Carmona and Delarue [11], Gomes and Saúde [20], Moon and Başar [33] for studies of fully observed continuous-time mean-field games with different models and cost functions, such as games with major-minor players, risk-sensitive games, games with Markov jump parameters, and LQG games.

In the literature relatively few results are available on partially-observed mean-field games. Indeed, our work appears to be the first one that studies discrete-time mean-field games under partial observations. Existing works have mostly studied the continuous-time setup, and analyses of continuous-time and discrete-time setups are quite different. Huang et al. [26] study a partially-observed continuous-time mean-field game with linear individual dynamics. Şen and Caines [13, 14] consider a continuous-time mean-field game with major-minor agents and nonlinear dynamics where the minor agents can partially observe the state of the major agent. Şen and Caines [16, 15] also develop a nonlinear filtering theory for McKean-Vlasov type stochastic differential equations that arise as the infinite population limit of the partially-observed differential game of the mean-field type. The nonlinear filtering equation is derived using Itô’s lemma for Banach space valued stochastic processes. Tang and Meng [40] study a continuous-time partially observed stochastic control problem of the mean-field type and establish a maximum principle to characterize the optimal control. Huang and Wang [24] consider a continuous-time mean-field game with linear individual dynamics where two types of partial information structure are considered: (i) agents cannot observe the white-noise which is common to all agents, (ii) agents can access the additive white-noise version of their own states.

The class of discrete-time mean-field games we consider in this paper is defined on a Polish state space with infinite-horizon discounted-cost optimality criteria for the players, who have access to decentralized partial observations of their individual states. In such games, a generic agent is faced with a partially observed stochastic control problem under the NCE principle, which, as indicated earlier, states that the state distribution flow under an optimal decision rule should be the same as the mean-field term that appears in the state and observation dynamics as well as in the individual cost functions. Consequently, the classical techniques used to study partially observed optimal control problems are not sufficient to analyze mean-field games. To establish the existence of an equilibrium solution, we have to bring in the fixed-point approach that is used to obtain equilibria in classical game problems, along with the technique of converting partially observed optimal control problems to fully observed ones on the belief space and then employing the dynamic programming principle. The precise descriptions of the mean-field game and the finite-agent game problems are given in Sections 3 and 2, respectively. In Section 4 we prove the existence of a mean-field equilibrium. In Section 5 we establish that the mean-field equilibrium policy is approximately Nash for finite-agent games with sufficiently many agents. In Section 6 we illustrate our results by considering an example.

In Saldi et al. [38] (see also the abridged conference version [39]) we solved the fully observed version of this problem under a similar set of assumptions on the system components. The techniques used in this paper to establish the existence of a mean-field equilibrium and of an approximate Nash equilibrium are almost the same as in Saldi et al. [38], modulo some transformations of the original problems into equivalent ones for which the techniques in Saldi et al. [38] still apply. However, as a result of these transformations, there are highly non-trivial differences between the proofs in this paper and those in Saldi et al. [38]. For instance, as a result of the fully observed reduction of the partially observed optimal control problem in the mean-field game, the dependence of the state transition probability of the fully observed model on the mean-field term is not explicit as it is in Saldi et al. [38]. Therefore, it is quite challenging to prove the weak continuity of this transition probability with respect to the mean-field term, which is a crucial result for establishing the existence of a mean-field equilibrium.

Notation. For a metric space ${\mathsf{E}}$ , we let $C_{b}({\mathsf{E}})$ denote the set of all bounded continuous real functions on ${\mathsf{E}}$ , and ${\mathcal{P}}({\mathsf{E}})$ denote the set of all Borel probability measures on ${\mathsf{E}}$ . For any ${\mathsf{E}}$ -valued random element $x$ , ${\cal L}(x)(\,\cdot\,)\in{\mathcal{P}}({\mathsf{E}})$ denotes the distribution of $x$ . A sequence $\{\mu_{n}\}$ of measures on ${\mathsf{E}}$ is said to converge weakly to a measure $\mu$ if $\int_{{\mathsf{E}}}g(e)\mu_{n}(de)\rightarrow\int_{{\mathsf{E}}}g(e)\mu(de)$ for all $g\in C_{b}({\mathsf{E}})$ . For any $\nu\in{\mathcal{P}}({\mathsf{E}})$ and measurable real function $g$ on ${\mathsf{E}}$ , we define $\nu(g)\coloneqq\int gd\nu$ . For any subset $B$ of ${\mathsf{E}}$ , we let $\partial B$ and $B^{c}$ denote the boundary and complement of $B$ , respectively. The notation $v\sim\nu$ means that the random element $v$ has distribution $\nu$ . Unless otherwise specified, the term “measurable” will refer to Borel measurability.

2 Finite Player Game

We consider a discrete-time partially observed $N$ -agent stochastic game of mean-field type with a Polish state space ${\mathsf{X}}$ , a Polish action space ${\mathsf{A}}$ , and a Polish observation space ${\mathsf{Y}}$ . For every $t\in\{0,1,2,\ldots\}$ and every $i\in\{1,2,\ldots,N\}$ , let $x^{N}_{i}(t)\in{\mathsf{X}}$ , $a^{N}_{i}(t)\in{\mathsf{A}}$ , and $y^{N}_{i}(t)\in{\mathsf{Y}}$ denote the state, the action, and the observation of Agent $i$ at time $t$ , and let

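$$e_{t}^{(N)}(\,\cdot\,)\coloneqq\frac{1}{N}\sum_{i=1}^{N}\delta_{x_{i}^{N}(t)}(\,\cdot\,)\in{\mathcal{P}}({\mathsf{X}})$$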

denote the empirical distribution of the states of agents at time $t$ , where $\delta_{x}\in{\mathcal{P}}({\mathsf{X}})$ is the Dirac measure at $x$ . The initial states $x^{N}_{i}(0)$ are independent and identically distributed according to $\mu_{0}$ . For each $t\geq 0$ , the current-observations $(y^{N}_{1}(t),\ldots,y^{N}_{N}(t))$ and the next-states $(x^{N}_{1}(t+1),\ldots,x^{N}_{N}(t+1))$ of agents are obtained randomly according to the conditional probability laws

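$$\prod_{i=1}^{N}r\bigl(dy^{N}_{i}(t)\bigm|x^{N}_{i}(t),e^{(N)}_{t}\bigr)\quad\text{and}\quad\prod_{i=1}^{N}p\bigl(dx^{N}_{i}(t+1)\bigm|x^{N}_{i}(t),a^{N}_{i}(t),e^{(N)}_{t}\bigr),$$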

where the stochastic kernel $p:{\mathsf{X}}\times{\mathsf{A}}\times{\mathcal{P}}({\mathsf{X}})\to{\mathcal{P}}({\mathsf{X}})$ denotes the transition probability law of the next state given the previous state-action pair and the empirical distribution of states, and the stochastic kernel $r:{\mathsf{X}}\times{\mathcal{P}}({\mathsf{X}})\to{\mathcal{P}}({\mathsf{Y}})$ denotes the transition probability law of the current observation given the current state and the empirical distribution of states. The measurable function $c:{\mathsf{X}}\times{\mathsf{A}}\times{\mathcal{P}}({\mathsf{X}})\rightarrow[0,\infty)$ is the one-stage cost function.
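To make the update rules concrete, the following is a minimal sketch of one time step of the $N$-agent dynamics, assuming finite state, action, and observation spaces (the paper itself works with Polish spaces). The kernels `toy_p` and `toy_r` and all function names are hypothetical stand-ins for $p$ and $r$, chosen only for illustration.

```python
import numpy as np

def empirical_distribution(states, n_states):
    """e_t^(N): fraction of the N agents occupying each state."""
    counts = np.bincount(states, minlength=n_states)
    return counts / len(states)

def toy_p(x, a, e):
    # Hypothetical transition kernel on X = {0, 1}: the chance of moving to
    # state 1 grows with the action a in {0, 1} and with the mass e[1].
    q = 0.2 + 0.5 * a + 0.2 * e[1]
    return np.array([1.0 - q, q])

def toy_r(x, e):
    # Hypothetical observation kernel on Y = {0, 1}: the state is read
    # correctly with probability 0.8 (this toy kernel ignores e).
    probs = np.full(2, 0.2)
    probs[x] = 0.8
    return probs

def step(states, actions, rng):
    """Sample y_i(t) and x_i(t+1) independently across agents, given e_t^(N)."""
    e = empirical_distribution(states, n_states=2)
    obs = np.array([rng.choice(2, p=toy_r(x, e)) for x in states])
    nxt = np.array([rng.choice(2, p=toy_p(x, a, e))
                    for x, a in zip(states, actions)])
    return obs, nxt, e

rng = np.random.default_rng(0)
states = np.array([0, 1, 1, 1])
actions = np.array([0, 1, 0, 1])
obs, next_states, e = step(states, actions, rng)
```

Each agent's observation and next state depend on the other agents only through the empirical distribution `e`, which is the mean-field coupling described above.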

Define the history spaces ${\mathsf{H}}_{0}={\mathsf{Y}}$ and ${\mathsf{H}}_{t}=({\mathsf{Y}}\times{\mathsf{A}})^{t}\times{\mathsf{Y}}$ for $t=1,2,\ldots$, all endowed with product Borel $\sigma$-algebras. A policy for a generic agent is a sequence $\pi=\{\pi_{t}\}$ of stochastic kernels on ${\mathsf{A}}$ given ${\mathsf{H}}_{t}$. The set of all policies for Agent $i$ is denoted by $\Pi_{i}$. Let $\tilde{\Pi}_{i}$ be the set of policies in $\Pi_{i}$ which only use the observations; that is, $\pi\in\tilde{\Pi}_{i}$ if $\pi_{t}:\prod_{k=0}^{t}{\mathsf{Y}}\rightarrow{\mathcal{P}}({\mathsf{A}})$ for each $t\geq 0$. Let ${\bf\Pi}^{(N)}=\prod_{i=1}^{N}\Pi_{i}$ and $\tilde{{\bf\Pi}}^{(N)}=\prod_{i=1}^{N}\tilde{\Pi}_{i}$. We let ${\boldsymbol{\pi}}^{(N)}\coloneqq(\pi^{1},\ldots,\pi^{N})$, $\pi^{i}\in\Pi_{i}$, denote the $N$-tuple of policies for all the agents in the game. Under such an $N$-tuple of policies, the actions of agents at each time $t\geq 0$ are obtained randomly according to the conditional probability law

$$\prod_{i=1}^{N}\pi^{i}_{t}\bigl(da^{N}_{i}(t)\bigm|h^{N}_{i}(t)\bigr),$$

where $h^{N}_{i}(t)=(y^{N}_{i}(t),a^{N}_{i}(t-1),y^{N}_{i}(t-1),\ldots,a^{N}_{i}(0),y^{N}_{i}(0))$ is the observation-action history available to Agent $i$ at time $t$. Note that agents can only use their local observations; hence, this is a partially observed game model.

For Agent $i$ , the infinite-horizon discounted cost under the initial distribution $\mu_{0}$ and $N$ -tuple of policies ${\boldsymbol{\pi}}^{(N)}\in{\bf\Pi}^{(N)}$ is given by

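$$J_{i}^{(N)}({\boldsymbol{\pi}}^{(N)})=E^{{\boldsymbol{\pi}}^{(N)}}\biggl[\sum_{t=0}^{\infty}\beta^{t}c\bigl(x^{N}_{i}(t),a^{N}_{i}(t),e^{(N)}_{t}\bigr)\biggr],$$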

where $E^{{\boldsymbol{\pi}}^{(N)}}\big{[}\cdot\big{]}$ denotes expectation with respect to the unique probability law induced by the above stochastic update rules and the initial state distribution $\mu_{0}$ on the infinite product of state, observation, and action spaces of all agents.
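As an illustration of the discounted-cost criterion, the sketch below evaluates the discounted sum along one finite sample path and bounds the error made by truncating the infinite horizon. It assumes a discount factor `beta` in $(0,1)$ and stage costs bounded by a hypothetical constant `c_max` (boundedness of $c$ is imposed later in the paper's assumptions); the function names are not from the paper.

```python
def discounted_cost(stage_costs, beta):
    """Sum of beta^t * c_t over a finite trajectory of stage costs."""
    total = 0.0
    for t, cost in enumerate(stage_costs):
        total += (beta ** t) * cost
    return total

def truncation_error_bound(c_max, beta, horizon):
    """Tail bound: sum over t >= horizon of beta^t * c_max."""
    return c_max * beta ** horizon / (1.0 - beta)

# Three stages with unit cost and beta = 0.5.
partial_sum = discounted_cost([1.0, 1.0, 1.0], beta=0.5)
tail = truncation_error_bound(c_max=1.0, beta=0.5, horizon=3)
```

Averaging `discounted_cost` over many simulated paths would give a Monte Carlo estimate of the expectation, up to the truncation error bounded by `tail`.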

Definition 2.1

A policy ${\boldsymbol{\pi}}^{(N*)}=(\pi^{1*},\ldots,\pi^{N*})$ constitutes a Nash equilibrium if

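$$J_{i}^{(N)}({\boldsymbol{\pi}}^{(N*)})=\inf_{\pi^{i}\in\Pi_{i}}J_{i}^{(N)}({\boldsymbol{\pi}}^{(N*)}_{-i},\pi^{i})$$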

for each $i=1,\ldots,N$ , where ${\boldsymbol{\pi}}^{(N*)}_{-i}\coloneqq(\pi^{j*})_{j\neq i}$ .

Establishing the existence of Nash equilibria for the class of games formulated above is known to be difficult due to the (almost) decentralized and noisy nature of the information structure of the problem. Indeed, even if the number of players is small, it is all but impossible to show even the existence of approximate Nash equilibria for these types of games. However, when the number of players is sufficiently large, one way to overcome this challenge is to introduce the infinite-population limit $N\rightarrow\infty$ of the game described here. In this limiting case, we can model the empirical distributions of the state configurations as an exogenous state-measure flow, which should be consistent with the distribution of a generic agent (i.e., the NCE principle) by the law of large numbers. Hence, in the limiting case, a generic agent is faced with a mean-field game that will be introduced in the next section. One would then expect that if each agent in the $N$-agent game problem adopts the equilibrium policy of the infinite-population limit, the resulting policy will be an approximate Nash equilibrium for all sufficiently large $N$. Therefore, by studying the infinite-population limit, which is easier to handle, one can obtain an approximate Nash equilibrium for the original finite-agent game problem, for which establishing the existence of a true Nash equilibrium is very difficult.

To that end, we slightly change the definition of Nash equilibrium in this model and adopt the following solution concept:

Definition 2.2

A policy ${\boldsymbol{\pi}}^{(N*)}\in\tilde{{\bf\Pi}}^{(N)}$ is a Nash equilibrium if

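$$J_{i}^{(N)}({\boldsymbol{\pi}}^{(N*)})=\inf_{\pi^{i}\in\tilde{\Pi}_{i}}J_{i}^{(N)}({\boldsymbol{\pi}}^{(N*)}_{-i},\pi^{i})$$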

for each $i=1,\ldots,N$ , and an $\varepsilon$ -Nash equilibrium (for a given $\varepsilon>0$ ) if

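$$J_{i}^{(N)}({\boldsymbol{\pi}}^{(N*)})\leq\inf_{\pi^{i}\in\tilde{\Pi}_{i}}J_{i}^{(N)}({\boldsymbol{\pi}}^{(N*)}_{-i},\pi^{i})+\varepsilon$$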

for each $i=1,\ldots,N$ .
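The $\varepsilon$-Nash condition can be read as a per-agent inequality between the cost at the candidate profile and the best cost achievable by a unilateral deviation. The sketch below checks that inequality for given numbers; it assumes the costs and best-deviation costs have already been computed by some other means, and the function names are hypothetical.

```python
def nash_gap(costs_at_profile, best_deviation_costs):
    """Largest amount any single agent could save by deviating unilaterally."""
    return max(j - d for j, d in zip(costs_at_profile, best_deviation_costs))

def is_epsilon_nash(costs_at_profile, best_deviation_costs, eps):
    """True if J_i(profile) <= (best unilateral deviation cost of i) + eps for all i."""
    return all(j <= d + eps
               for j, d in zip(costs_at_profile, best_deviation_costs))

# Two agents: the second could save 0.05 by deviating.
costs = [1.0, 2.05]
best = [1.0, 2.0]
loose = is_epsilon_nash(costs, best, eps=0.1)
tight = is_epsilon_nash(costs, best, eps=0.01)
```

An exact Nash equilibrium corresponds to a gap of zero; the profile above is an $\varepsilon$-Nash equilibrium for `eps=0.1` but not for `eps=0.01`.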

Note that, according to this definition, the agents can only use their local observations $(y^{N}_{i}(t),\ldots,y^{N}_{i}(0))$ to design their policies. Indeed, in practical applications, agents typically have access only to their local observations. Collecting all the observations in the entire system is intractable, particularly if the number of agents is large. Consequently, establishing that a mean-field policy is an approximate Nash equilibrium for the game under the assumption that the agents have access to full observation variables is not necessary in order to cover practically meaningful scenarios. It is sufficient to establish the existence of an approximate Nash equilibrium for the game with a local information structure. In addition, in the discrete-time mean-field literature, it is common to establish the existence of approximate Nash equilibria with local (decentralized) information structures (see Adlakha et al. [1], Biswas [7]). This is true for the continuous-time partially observed case as well (see Şen and Caines [34]).

In Section 5, we will show that the policy in the infinite-population equilibrium is an $\varepsilon$ -Nash equilibrium (for a given $\varepsilon>0$ ) when the number of agents is sufficiently large.

3 Partially observed mean-field games and mean-field equilibria

In this section we introduce a mean-field game that can be interpreted as the infinite-population limit $N\rightarrow\infty$ of the game introduced in the preceding section. This mean-field game model is specified by

$$\bigl({\mathsf{X}},{\mathsf{A}},{\mathsf{Y}},p,r,c,\mu_{0}\bigr),$$

where, as before, ${\mathsf{X}}$ , ${\mathsf{Y}}$ , and ${\mathsf{A}}$ are the Polish state, observation, and action spaces, respectively. The stochastic kernel $p:{\mathsf{X}}\times{\mathsf{A}}\times{\mathcal{P}}({\mathsf{X}})\to{\mathcal{P}}({\mathsf{X}})$ denotes the transition probability and $r:{\mathsf{X}}\times{\mathcal{P}}({\mathsf{X}})\to{\mathcal{P}}({\mathsf{Y}})$ denotes the observation kernel. The measurable function $c:{\mathsf{X}}\times{\mathsf{A}}\times{\mathcal{P}}({\mathsf{X}})\rightarrow[0,\infty)$ is the one-stage cost function and $\mu_{0}$ is the distribution of the initial state.

Define the history spaces ${\mathsf{G}}_{0}={\mathsf{Y}}$ and ${\mathsf{G}}_{t}=({\mathsf{Y}}\times{\mathsf{A}})^{t}\times{\mathsf{Y}}$ for $t=1,2,\ldots$ , all endowed with product Borel $\sigma$ -algebras. A policy is a sequence $\pi=\{\pi_{t}\}$ of stochastic kernels on ${\mathsf{A}}$ given ${\mathsf{G}}_{t}$ . The set of all policies is denoted by $\Pi$ . Partially observed mean-field games are not games in the classical sense. They are indeed single-agent partially observed stochastic control problems whose state distribution at each time step should satisfy some consistency condition. In other words, we have a single agent with partial observations and model the overall behavior of (a large population of) other agents by an exogenous state-measure flow ${\boldsymbol{\mu}}:=(\mu_{t})_{t\geq 0}\subset{\mathcal{P}}({\mathsf{X}})$ with a given initial condition $\mu_{0}$ . This measure flow ${\boldsymbol{\mu}}$ should also be consistent with the state distributions of this single agent when the agent acts optimally. The precise mathematical description of the problem is given as follows.

We let ${\mathcal{M}}\coloneqq\bigl{\{}{\boldsymbol{\mu}}\in{\mathcal{P}}({\mathsf{X}})^{\infty}:\mu_{0}\text{ is fixed}\bigr{\}}$ be the set of all state-measure flows with a given initial condition $\mu_{0}$ . Given any measure flow ${\boldsymbol{\mu}}\in{\mathcal{M}}$ , the probabilistic evolution of the states, observations, and actions is as follows

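$$\begin{aligned}x(0)&\sim\mu_{0},\\ y(t)&\sim r(\,\cdot\,|x(t),\mu_{t}),&&t=0,1,\ldots,\\ x(t)&\sim p(\,\cdot\,|x(t-1),a(t-1),\mu_{t-1}),&&t=1,2,\ldots,\\ a(t)&\sim\pi_{t}(\,\cdot\,|g(t)),&&t=0,1,\ldots,\end{aligned}$$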

where $g(t)\in{\mathsf{G}}_{t}$ is the observation-action history up to time $t$ . According to the Ionescu Tulcea theorem (see, e.g., Hernández-Lerma and Lasserre [22]), an initial distribution $\mu_{0}$ on ${\mathsf{X}}$ , a policy $\pi$ , and a state-measure flow ${\boldsymbol{\mu}}$ define a unique probability measure $P^{\pi}$ on $({\mathsf{X}}\times{\mathsf{Y}}\times{\mathsf{A}})^{\infty}$ . The expectation with respect to $P^{\pi}$ is denoted by $E^{\pi}$ . A policy $\pi^{*}\in\Pi$ is said to be optimal for ${\boldsymbol{\mu}}$ if

$$J_{{\boldsymbol{\mu}}}(\pi^{*})=\inf_{\pi\in\Pi}J_{{\boldsymbol{\mu}}}(\pi),$$

where the infinite-horizon discounted cost of policy $\pi$ with measure flow ${\boldsymbol{\mu}}$ and the discount factor $\beta\in(0,1)$ is given by

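$$J_{{\boldsymbol{\mu}}}(\pi)=E^{\pi}\biggl[\sum_{t=0}^{\infty}\beta^{t}c\bigl(x(t),a(t),\mu_{t}\bigr)\biggr].$$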

Using these definitions, we first define the set-valued mapping

$$\Psi:{\mathcal{M}}\rightarrow 2^{\Pi}$$

as $\Psi({\boldsymbol{\mu}})=\{\pi\in\Pi:\pi\text{ is optimal for }{\boldsymbol{\mu}}\}$ . Conversely, we define a single-valued mapping

$$\Lambda:\Pi\rightarrow{\mathcal{M}}$$

as follows: given $\pi\in\Pi$ , the state-measure flow ${\boldsymbol{\mu}}:=\Lambda(\pi)$ is constructed recursively as

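$$\mu_{t+1}(\,\cdot\,)=\int_{{\mathsf{X}}\times{\mathsf{A}}}p(\,\cdot\,|x(t),a(t),\mu_{t})\,P^{\pi}(da(t)|x(t))\,\mu_{t}(dx(t)),$$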

where $P^{\pi}(da(t)|x(t))$ denotes the conditional distribution of $a(t)$ given $x(t)$ under $\pi$ and $(\mu_{\tau})_{0\leq\tau\leq t}$ . Using $\Psi$ and $\Lambda$ , we now introduce the notion of an equilibrium for the mean-field game.

Definition 3.1

A pair $(\pi,{\boldsymbol{\mu}})\in\Pi\times{\mathcal{M}}$ is a mean-field equilibrium if $\pi\in\Psi({\boldsymbol{\mu}})$ and ${\boldsymbol{\mu}}=\Lambda(\pi)$ . In other words, the policy $\pi$ is optimal for the state-measure flow ${\boldsymbol{\mu}}$ and ${\boldsymbol{\mu}}$ is consistent with the state distributions of the agent when it acts optimally via $\pi$ .
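The recursion defining $\Lambda$ and the consistency requirement ${\boldsymbol{\mu}}=\Lambda(\pi)$ can be illustrated on finite spaces, where the integral becomes a double sum. In the sketch below, `policy[t][x, a]` is assumed to be given directly and stands in for the conditional law $P^{\pi}(da(t)|x(t))$ (in the paper this law is induced by an observation-based policy); `toy_p` and the function names are hypothetical. Only the consistency half of the equilibrium definition is checked here, not optimality.

```python
import numpy as np

def toy_p(x, a, mu):
    # Hypothetical kernel on X = {0, 1} with actions a in {0, 1}.
    q = 0.2 + 0.5 * a + 0.2 * mu[1]
    return np.array([1.0 - q, q])

def measure_flow(mu0, policy, p, horizon):
    """Return [mu_0, ..., mu_horizon] with
    mu_{t+1}(x') = sum_{x, a} p(x' | x, a, mu_t) * policy[t][x, a] * mu_t(x)."""
    flow = [np.asarray(mu0, dtype=float)]
    for t in range(horizon):
        mu = flow[-1]
        n_states, n_actions = policy[t].shape
        nxt = np.zeros(n_states)
        for x in range(n_states):
            for a in range(n_actions):
                nxt += mu[x] * policy[t][x, a] * p(x, a, mu)
        flow.append(nxt)
    return flow

def is_consistent(candidate_flow, policy, p):
    """Check mu = Lambda(pi) on the finite horizon covered by candidate_flow."""
    horizon = len(candidate_flow) - 1
    induced = measure_flow(candidate_flow[0], policy, p, horizon)
    return all(np.allclose(m, n) for m, n in zip(candidate_flow, induced))

uniform = [np.full((2, 2), 0.5) for _ in range(3)]
flow = measure_flow([1.0, 0.0], uniform, toy_p, horizon=3)
```

A flow produced by `measure_flow` is consistent with its policy by construction, so `is_consistent(flow, uniform, toy_p)` holds, while perturbing any `flow[t]` breaks it.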

In this section, the main goal is to establish the existence of a mean-field equilibrium. To that end, we impose the assumptions below on the components of the mean-field game model.

Assumption 1

(a) The cost function $c$ is bounded and continuous.

(b) The stochastic kernel $p$ is weakly continuous in $(x,a,\mu)$; i.e., for all $x$, $a$ and $\mu$, $p(\,\cdot\,|x_{k},a_{k},\mu_{k})\rightarrow p(\,\cdot\,|x,a,\mu)$ weakly when $(x_{k},a_{k},\mu_{k})\rightarrow(x,a,\mu)$.

(c) The observation kernel $r$ is continuous in $(x,\mu)$ with respect to the total variation norm; i.e., for all $x$ and $\mu$, $r(\,\cdot\,|x_{k},\mu_{k})\rightarrow r(\,\cdot\,|x,\mu)$ in total variation norm when $(x_{k},\mu_{k})\rightarrow(x,\mu)$.

(d) ${\mathsf{A}}$ is compact and ${\mathsf{X}}$ is locally compact.

(e) There exist a constant $\alpha\geq 0$ and a continuous moment function $w:{\mathsf{X}}\rightarrow[0,\infty)$ (see Hernández-Lerma and Lasserre [22, Definition E.7]) such that

$$\sup_{(a,\mu)\in{\mathsf{A}}\times{\mathcal{P}}({\mathsf{X}})}\int_{{\mathsf{X}}}w(y)\,p(dy|x,a,\mu)\leq\alpha w(x).$$

(f) The initial probability measure $\mu_{0}$ satisfies

$$\int_{{\mathsf{X}}}w(x)\,\mu_{0}(dx)=:M<\infty.$$

Remark 3.2

We need Assumption 1-(c) in order to establish the continuity, with respect to the weak topology, of the transition probability of the fully observed reduction of the partially observed mean-field game model (i.e., $\eta^{{\boldsymbol{\nu}}}_{t}(\,\cdot\,|z,a)$, which will be introduced in the next section). If this continuity condition holds under any other assumption on the observation kernel $r$ (for instance, in the fully observed case; that is, $r(\,\cdot\,|x,\mu)=\delta_{x,\mu}(\,\cdot\,)$), then all the results in this paper remain true.

The main result of this section is the existence of a mean-field equilibrium under Assumption 1. Later we will show that this mean-field equilibrium constitutes an approximate Nash equilibrium for games with sufficiently many agents.

Theorem 3.3

Under Assumption 1, the mean-field game $\bigl{(}{\mathsf{X}},{\mathsf{A}},{\mathsf{Y}},p,r,c,\mu_{0}\bigr{)}$ admits a mean-field equilibrium $(\pi,{\boldsymbol{\mu}})$ .

The proof of Theorem 3.3 is given in Section 4. To establish the existence of an equilibrium, we use the fully observed reduction of partially observed optimal control problems and the dynamic programming principle, in addition to the fixed-point approach that is commonly used in classical game problems.

4 Proof of Theorem 3.3

Note that, given any measure flow ${\boldsymbol{\mu}}\in{\mathcal{M}}$ , the optimal control problem for the mean-field game reduces to one of finding an optimal policy for a partially observed Markov decision process (POMDP). Hence, before starting the proof of Theorem 3.3, we first review a few relevant results on POMDPs. To this end, fix any ${\boldsymbol{\mu}}\in{\mathcal{M}}$ and consider the corresponding optimal control problem.

Let ${\mathcal{P}}_{w}({\mathsf{X}})\coloneqq\bigl{\{}\mu\in{\mathcal{P}}({\mathsf{X}}):\mu(w)<\infty\bigr{\}}$ . It is known that any POMDP can be reduced to a (completely observable) MDP (see Yushkevich [43], Rhenius [37]), whose states are the posterior state distributions or beliefs of the observer; that is, the state at time $t$ is

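$$z(t)\coloneqq{\mathsf{Pr}}\{x(t)\in\,\cdot\,|y(0),\ldots,y(t),a(0),\ldots,a(t-1)\}\in{\mathcal{P}}({\mathsf{X}}).$$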

We call this equivalent MDP the belief-state MDP. Note that since ${\cal L}(x(t))\in{\mathcal{P}}_{w}({\mathsf{X}})$ under any policy by Assumption 1-(e),(f), we have ${\mathsf{Pr}}\{x(t)\in\,\cdot\,|y(0),\ldots,y(t),a(0),\ldots,a(t-1)\}\in{\mathcal{P}}_{w}({\mathsf{X}})$ almost everywhere. Therefore, the belief-state MDP has state space ${\mathsf{Z}}={\mathcal{P}}_{w}({\mathsf{X}})$ and action space ${\mathsf{A}}$ , where ${\mathsf{Z}}$ is equipped with the Borel $\sigma$ -algebra generated by the topology of weak convergence. The transition probabilities $\{\eta_{t}\}_{t\geq 0}$ of the belief-state MDP can be constructed as follows (see also Hernández-Lerma [21]). Let $z$ denote the generic state variable for the belief-state MDP. Fix any $t\geq 0$ . First consider the transition probability on ${\mathsf{X}}\times{\mathsf{Y}}$ given ${\mathsf{Z}}\times{\mathsf{A}}$

$$R_{t}(x\in A,y\in B|z,a)\coloneqq\int_{{\mathsf{X}}}\kappa_{t}(A,B|x^{\prime},a)\,z(dx^{\prime}),$$

where $\kappa_{t}(dx,dy|x^{\prime},a)\coloneqq r(dy|x,\mu_{t+1})\otimes p(dx|x^{\prime},a,\mu_{t})$ . Let us disintegrate $R_{t}$ as follows

$$R_{t}(dx,dy|z,a)=H_{t}(dy|z,a)\otimes F_{t}(dx|z,a,y).$$

Then, we define the mapping $F_{t}:{\mathsf{Z}}\times{\mathsf{A}}\times{\mathsf{Y}}\rightarrow{\mathsf{Z}}$ as

$$F_{t}(z,a,y)(\,\cdot\,)=F_{t}(\,\cdot\,|z,a,y).$$

Note that

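$$F_{t}(z,a,y)(\,\cdot\,)={\mathsf{Pr}}\{x(t+1)\in\,\cdot\,|z(t)=z,a(t)=a,y(t+1)=y\},\qquad H_{t}(\,\cdot\,|z,a)={\mathsf{Pr}}\{y(t+1)\in\,\cdot\,|z(t)=z,a(t)=a\}.$$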

Then, $\eta_{t}:{\mathsf{Z}}\times{\mathsf{A}}\rightarrow{\mathcal{P}}({\mathsf{Z}})$ can be written as

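$$\eta_{t}(\,\cdot\,|z(t),a(t))=\int_{{\mathsf{Y}}}\delta_{F_{t}(z(t),a(t),y(t+1))}(\,\cdot\,)\,H_{t}(dy(t+1)|z(t),a(t)).$$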

The initial point for the belief-state MDP is $\mu_{0}$; that is, $z(0)\sim\delta_{\mu_{0}}$. Finally, for each $t\geq 0$, the one-stage cost function $C_{t}$ of the belief-state MDP is given by

$$C_{t}(z,a)\coloneqq\int_{{\mathsf{X}}}c(x,a,\mu_{t})\,z(dx).$$

Hence, the belief-state MDP is a Markov decision process with the components

$$\bigl({\mathsf{Z}},{\mathsf{A}},\{\eta_{t}\}_{t\geq 0},\{C_{t}\}_{t\geq 0},\delta_{\mu_{0}}\bigr).$$
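For finite state and observation spaces, the construction of $R_{t}$, $H_{t}$, $F_{t}$, $\eta_{t}$, and $C_{t}$ reduces to a Bayes-filter computation. The sketch below is an illustration under that finiteness assumption only (the paper works with Polish spaces); the toy kernels `toy_p` and `toy_r`, the cost `toy_c`, and all function names are hypothetical and not from the paper.

```python
import numpy as np

def toy_p(x, a, mu):
    # Hypothetical transition kernel on X = {0, 1}, actions a in {0, 1}.
    q = 0.2 + 0.5 * a + 0.2 * mu[1]
    return np.array([1.0 - q, q])

def toy_r(x, mu):
    # Hypothetical observation kernel on Y = {0, 1} (ignores mu).
    probs = np.full(2, 0.2)
    probs[x] = 0.8
    return probs

def toy_c(x, a, mu):
    # Hypothetical bounded one-stage cost.
    return float(x + a)

def joint_law(z, a, p, r, mu_t, mu_next):
    """R[x, y] = sum_{x'} r(y | x, mu_{t+1}) * p(x | x', a, mu_t) * z(x')."""
    n = len(z)
    pred = sum(z[xp] * p(xp, a, mu_t) for xp in range(n))  # law of x(t+1)
    obs = np.array([r(x, mu_next) for x in range(n)])      # obs[x, y]
    return pred[:, None] * obs

def belief_transition(z, a, p, r, mu_t, mu_next):
    """Support of eta_t(. | z, a): pairs (H_t(y | z, a), F_t(z, a, y))."""
    R = joint_law(z, a, p, r, mu_t, mu_next)
    H = R.sum(axis=0)  # marginal law of the next observation
    return [(H[y], R[:, y] / H[y]) for y in range(len(H)) if H[y] > 0]

def belief_cost(z, a, c, mu_t):
    """C_t(z, a) = sum_x c(x, a, mu_t) * z(x)."""
    return sum(z[x] * c(x, a, mu_t) for x in range(len(z)))

z = np.array([0.5, 0.5])
mu = np.array([0.5, 0.5])
support = belief_transition(z, 0, toy_p, toy_r, mu, mu)
```

Each pair in `support` is one possible next belief together with the probability of the observation that produces it, so `eta_t` is a finitely supported measure on beliefs in this setting. Averaging the next beliefs with those weights recovers the predicted law of $x(t+1)$, which is a useful sanity check on the disintegration.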

For the belief-state MDP define the history spaces ${\mathsf{K}}_{0}={\mathsf{Z}}$ and ${\mathsf{K}}_{t}=({\mathsf{Z}}\times{\mathsf{A}})^{t}\times{\mathsf{Z}}$ , $t=1,2,\ldots$ . A policy is a sequence $\varphi=\{\varphi_{t}\}$ of stochastic kernels on ${\mathsf{A}}$ given ${\mathsf{K}}_{t}$ . The set of all policies is denoted by $\Phi$ . A Markov policy is a sequence $\varphi=\{\varphi_{t}\}$ of stochastic kernels on ${\mathsf{A}}$ given ${\mathsf{Z}}$ . The set of Markov policies is denoted by ${\mathsf{M}}$ . Let ${\tilde{J}}(\varphi,\mu_{0})$ denote the discounted cost function of policy $\varphi\in\Phi$ for initial point $\mu_{0}$ of the belief-state MDP. Notice that any history vector $s(t)=(z(0),\ldots,z(t),a(0),\ldots,a(t-1))$ of the belief-state MDP is a function of the history vector $g(t)=(y(0),\ldots,y(t),a(0),\ldots,a(t-1))$ of the POMDP. Let us write this relation as $i(g(t))=s(t)$ . Hence, for a policy $\varphi=\{\varphi_{t}\}\in\Phi$ , we can define a policy $\pi^{\varphi}=\{\pi_{t}^{\varphi}\}\in\Pi$ as

[TABLE]

Let us write this as a mapping from $\Phi$ to $\Pi$ : $\Phi\ni\varphi\mapsto i(\varphi)=\pi^{\varphi}\in\Pi$ . It is straightforward to show that the cost functions ${\tilde{J}}(\varphi,\mu_{0})$ and $J(\pi^{\varphi},\mu_{0})$ are the same. One can also prove that (see Yushkevich [43], Rhenius [37])

[TABLE]

and furthermore, that if $\varphi$ is an optimal policy for the belief-state MDP, then $\pi^{\varphi}$ is optimal for the POMDP as well. Therefore, the optimal control problem for the mean-field game is equivalent to the optimal control of the belief-state MDP.

Now, we derive the conditions satisfied by the components of the belief-state MDP under Assumption 1. Note first that ${\mathsf{Z}}=\bigcup_{n\geq 1}K_{n}$ where $K_{n}\coloneqq\{\mu\in{\mathcal{P}}_{w}({\mathsf{X}}):\mu(w)\leq n\}$ . Since $w$ is a moment function, each $K_{n}$ is tight (Hernández-Lerma and Lasserre [22, Proposition E.8]). Moreover, each $K_{n}$ is also closed since $w$ is continuous. Therefore, each $K_{n}$ is compact with respect to the weak topology. This implies that ${\mathsf{Z}}$ is a $\sigma$ -compact Polish space. Define $W:{\mathsf{Z}}\rightarrow\mathbb{R}$ as

[TABLE]

Note that $W$ is a moment function on ${\mathsf{Z}}$ . Indeed, we have

[TABLE]

We also have

[TABLE]

Moreover, by Feinberg et al. [18, Theorem 3.6], $\eta_{t}$ is weakly continuous in $(z,a)$ for all $t\geq 0$ . Therefore, the belief-state MDP satisfies the following conditions under Assumption 1.

(i)

The cost functions $\{C_{t}\}$ are bounded and continuous.

(ii)

The stochastic kernels $\{\eta_{t}\}$ are weakly continuous.

(iii)

${\mathsf{A}}$ is compact and ${\mathsf{Z}}$ is $\sigma$ -compact.

(iv)

There exist a constant $\alpha\geq 0$ and a lower semi-continuous moment function $W:{\mathsf{Z}}\rightarrow[0,\infty)$ such that

[TABLE]

(v)

The initial probability measure $\delta_{\mu_{0}}$ satisfies

[TABLE]

Our approach to proving Theorem 3.3 can be summarized as follows: (i) first, we lift the partially observed stochastic control problem faced by a generic agent to a fully observed stochastic control problem (i.e., the belief-state MDP) using the above-mentioned results; (ii) next, we prove that the state transition probabilities of the belief-state MDP are continuous with respect to the state-measure flow of the original partially observed problem; (iii) finally, we use the technique in our paper Saldi et al. [38], which was developed to show the existence of mean-field equilibria for fully observed mean-field games, to finish the proof. The key step in our approach is (ii), in which we mimic the elegant proof technique established by Feinberg et al. [18] to prove the weak continuity of the transition probabilities of the fully observed reduction of partially observed stochastic control problems.

We are now ready to prove Theorem 3.3. Define the mapping ${\mathsf{B}}:{\mathcal{P}}({\mathsf{Z}})\rightarrow{\mathcal{P}}({\mathsf{X}})$ as follows:

$${\mathsf{B}}(\nu)(\,\cdot\,)\coloneqq\int_{{\mathsf{Z}}}z(\,\cdot\,)\,\nu(dz).$$

In other words, ${\mathsf{B}}(\nu)$ is the so-called ‘barycenter’ of $\nu$ , see, e.g., Phelps [36]. Using this definition, for any ${\boldsymbol{\nu}}\in{\mathcal{P}}({\mathsf{Z}}\times{\mathsf{A}})^{\infty}$ , we define the measure flow ${\boldsymbol{\mu}}^{{\boldsymbol{\nu}}}\in{\mathcal{P}}({\mathsf{X}})^{\infty}$ as follows:

[TABLE]

where for any $\nu\in{\mathcal{P}}({\mathsf{Z}}\times{\mathsf{A}})$ , we let $\nu_{1}$ denote the marginal of $\nu$ on ${\mathsf{Z}}$ ; that is, $\nu_{1}(\,\cdot\,)\coloneqq\nu(\,\cdot\,\times{\mathsf{A}})$ . Let $\{\eta_{t}^{{\boldsymbol{\nu}}}\}_{t\geq 0}$ and $\{C_{t}^{{\boldsymbol{\nu}}}\}_{t\geq 0}$ be, respectively, the transition probabilities and one-stage cost functions of belief-state MDP induced by the measure flow ${\boldsymbol{\mu}}^{{\boldsymbol{\nu}}}$ . We let $J_{*,t}^{{\boldsymbol{\nu}}}:{\mathsf{Z}}\rightarrow[0,\infty)$ denote the discounted cost value function at time $t$ of this belief-state MDP; that is,

[TABLE]
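To make the barycenter map ${\mathsf{B}}$ introduced above concrete, here is a small sketch for a finitely supported $\nu$ on beliefs over a finite state space; in that case the barycenter is just the weighted average of the support points. The helper name `barycenter` and the list-of-pairs representation are assumptions made for illustration.

```python
def barycenter(nu):
    """Mean belief of a finitely supported measure on beliefs.

    nu: list of (weight, belief) pairs, weights nonnegative and summing to 1,
        each belief a probability vector over the same finite state space.
    """
    n_x = len(nu[0][1])
    return [sum(w * z[x] for w, z in nu) for x in range(n_x)]


nu = [(0.25, [1.0, 0.0]), (0.75, [0.2, 0.8])]
mu = barycenter(nu)
```

Here `mu` is again a probability vector on the state space, which is the finite counterpart of ${\mathsf{B}}(\nu)\in{\mathcal{P}}({\mathsf{X}})$ .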

Let $J_{*}^{{\boldsymbol{\nu}}}\coloneqq\bigl{(}J^{{\boldsymbol{\nu}}}_{*,t}\bigr{)}_{t\geq 0}$ . To prove the existence of a mean-field equilibrium, we use the technique in our previous paper Saldi et al. [38], adapted from Jovanovic and Rosenthal [29], which enables us to transform the fixed point equation $\pi\in\Psi(\Lambda(\pi))$ characterizing the mean-field equilibrium into a fixed point equation of a set-valued mapping from the set of state-action measure flows ${\mathcal{P}}({\mathsf{Z}}\times{\mathsf{A}})^{\infty}$ into itself. Then, using Kakutani’s fixed point theorem (Aliprantis and Border [2, Corollary 17.55]), we deduce the existence of a mean-field equilibrium.

Remark 4.1

We note that the technique used here to prove the existence of a mean-field equilibrium is very similar to the one in our previous paper Saldi et al. [38], in which we studied the fully-observed version of the same problem. However, there is a crucial difference in the details between this problem and the fully-observed one. In the fully-observed case, the transition probability is given by $p(\,\cdot\,|x,a,\mu_{t})$ from which we can immediately deduce the continuity of the transition probability with respect to the state-measure flow. In the partially-observed case, however, we do not have such an explicit analytical expression describing the relation between $\eta_{t}^{{\boldsymbol{\nu}}}$ and ${\boldsymbol{\mu}}^{{\boldsymbol{\nu}}}$ from which we can deduce the same continuity result. Hence, we need to prove this highly non-trivial statement (unlike the fully-observed case) in order to use the technique in Saldi et al. [38]. Indeed, this is the key step here (step (ii) above) to prove the existence of a mean-field equilibrium. After we prove this result, the rest of the proof follows the same steps as in Saldi et al. [38]. However, for the sake of completeness, we give the full details of the proof, since we are using quite different notation as a result of the fully-observed reduction of the original problem and the dependence of the transition probability of the belief-state MDP on the state-measure flow is not explicit as in the fully-observed case. $\square$

As we mentioned above, we first transform the fixed point equation $\pi\in\Psi(\Lambda(\pi))$ into a fixed point equation of a set-valued mapping from ${\mathcal{P}}({\mathsf{Z}}\times{\mathsf{A}})^{\infty}$ into itself. To that end, we define the product space ${\mathcal{C}}\coloneqq C_{\beta}({\mathsf{Z}})^{\infty}$ in which $J_{*}^{{\boldsymbol{\nu}}}$ is an element. Here, $C_{\beta}({\mathsf{Z}})\coloneqq\{u\in C_{b}({\mathsf{Z}}):\|u\|\leq\frac{\|c\|}{1-\beta}\}$ . Moreover, we equip ${\mathcal{C}}$ with the following metric:

[TABLE]

where $\sigma>0$ is chosen so that $\sigma\beta<1$ . For any $t\geq 0$ , we define the *Bellman optimality operator* $T_{t}^{{\boldsymbol{\nu}}}:C_{\beta}({\mathsf{Z}})\rightarrow C_{\beta}({\mathsf{Z}})$ as

[TABLE]

In the MDP theory, $T_{t}^{{\boldsymbol{\nu}}}$ gives the relation between value functions $J^{{\boldsymbol{\nu}}}_{*,t}$ and $J^{{\boldsymbol{\nu}}}_{*,t+1}$ . Moreover, given $J^{{\boldsymbol{\nu}}}_{*,t+1}$ , it characterizes the optimal policy at time $t$ . It is standard to prove that $T_{t}^{{\boldsymbol{\nu}}}$ is a contraction on $C_{\beta}({\mathsf{Z}})$ with modulus $\beta$ for all $t\geq 0$ , i.e., $\|T_{t}^{{\boldsymbol{\nu}}}(u)-T_{t}^{{\boldsymbol{\nu}}}(v)\|\leq\beta\|u-v\|$ for all $u,v\in C_{\beta}({\mathsf{Z}})$ . Let us define the operator $T^{{\boldsymbol{\nu}}}:{\mathcal{C}}\rightarrow{\mathcal{C}}$ as

[TABLE]

It can be shown that $T^{{\boldsymbol{\nu}}}$ is a contraction on ${\mathcal{C}}$ with modulus $\sigma\beta$ :

[TABLE]

Since $({\mathcal{C}},\rho)$ is a complete metric space, $T^{{\boldsymbol{\nu}}}$ has a unique fixed point by the Banach fixed point theorem.

The following theorem is a known result in the theory of nonhomogeneous Markov decision processes (see Hinderer [23, Theorems 14.4 and 17.1]). For any given ${\boldsymbol{\nu}}$ , it characterizes $J^{{\boldsymbol{\nu}}}_{*}$ and the optimal policy of the belief-state MDP.

Theorem 4.2

For any ${\boldsymbol{\nu}}$ , the collection of value functions $J^{{\boldsymbol{\nu}}}_{*}$ is the unique fixed point of the operator $T^{{\boldsymbol{\nu}}}$ . Furthermore, $\varphi\in{\mathsf{M}}$ is optimal if and only if

[TABLE]

where $\nu_{t}^{\varphi}={\cal L}\bigl{(}z(t),a(t)\bigr{)}$ under $\varphi$ and ${\boldsymbol{\nu}}$ .

We are now ready to define the above-mentioned set-valued map from ${\mathcal{P}}({\mathsf{Z}}\times{\mathsf{A}})^{\infty}$ into itself. To that end, for any ${\boldsymbol{\nu}}\in{\mathcal{P}}({\mathsf{Z}}\times{\mathsf{A}})^{\infty}$ , let us define the following sets:

[TABLE]

Note that the set $C({\boldsymbol{\nu}})$ characterizes the consistency of the mean-field term with the distribution of a generic agent, and the set $B({\boldsymbol{\nu}})$ characterizes optimality of the policy that is obtained by disintegrating the state-action measure-flow, for the mean-field term. The set-valued mapping $\Gamma:{\mathcal{P}}({\mathsf{Z}}\times{\mathsf{A}})^{\infty}\rightarrow 2^{{\mathcal{P}}({\mathsf{Z}}\times{\mathsf{A}})^{\infty}}$ is given as follows:

$$\Gamma({\boldsymbol{\nu}})\coloneqq C({\boldsymbol{\nu}})\cap B({\boldsymbol{\nu}}).$$

We say that ${\boldsymbol{\nu}}$ is a fixed point of $\Gamma$ if ${\boldsymbol{\nu}}\in\Gamma({\boldsymbol{\nu}})$ . The following proposition makes the connection between mean-field equilibria and the fixed points of $\Gamma$ , and so, transforms the fixed point equation $\pi\in\Psi(\Lambda(\pi))$ into the fixed point equation ${\boldsymbol{\nu}}\in\Gamma({\boldsymbol{\nu}})$ .

Proposition 4.3

Suppose that $\Gamma$ has a fixed point ${\boldsymbol{\nu}}=(\nu_{t})_{t\geq 0}$ . Construct a Markov policy $\varphi=(\varphi_{t})_{t\geq 0}$ for belief-state MDP by disintegrating each $\nu_{t}$ as $\nu_{t}(dx,da)=\nu_{t,1}(dx)\varphi_{t}(da|x)$ , and let ${\boldsymbol{\mu}}=({\mathsf{B}}(\nu_{t,1}))_{t\geq 0}$ . Then the pair $(\pi^{\varphi},{\boldsymbol{\mu}})$ is a mean-field equilibrium.

Proof 4.4

Proof. Note that, since ${\boldsymbol{\nu}}\in C({\boldsymbol{\nu}})$ , we have $\nu_{t}={\cal L}\bigl{(}z(t),a(t)\bigr{)}$ ( $t\geq 0$ ) for belief-state MDP under the policy $\varphi$ and the measure flow ${\boldsymbol{\mu}}$ . Then, for any $f\in C_{b}({\mathsf{X}})$ , we have

[TABLE]

Since (8) is true for all $f\in C_{b}({\mathsf{X}})$ , we have

[TABLE]

where $P^{\pi^{\varphi}}(da(t)|x(t))$ denotes the conditional distribution of $a(t)$ given $x(t)$ under $\pi^{\varphi}$ and $(\mu_{\tau})_{0\leq\tau\leq t}$ . Hence, $\Lambda(\pi^{\varphi})={\boldsymbol{\mu}}$ .

Since ${\boldsymbol{\nu}}\in B({\boldsymbol{\nu}})$ , the corresponding Markov policy $\varphi$ satisfies (7) for ${\boldsymbol{\nu}}$ . Therefore, by Theorem 4.2 and the fact that $\nu_{t}={\cal L}\bigl{(}z(t),a(t)\bigr{)}$ ( $t\geq 0$ ) for the belief-state MDP under the policy $\varphi$ and the measure flow ${\boldsymbol{\mu}}$ , $\varphi$ is optimal for the belief-state MDP induced by the measure flow ${\boldsymbol{\mu}}$ (or, equivalently, ${\boldsymbol{\nu}}$ ). Therefore, $\pi^{\varphi}\in\Psi({\boldsymbol{\mu}})$ . $\square$
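The disintegration used in Proposition 4.3 has a simple finite analogue: a joint probability table over (belief point, action) splits into a marginal over belief points and a row-wise conditional, which plays the role of the Markov policy $\varphi_{t}$ . The following sketch assumes finitely many belief points and actions; `disintegrate` is a hypothetical helper name, and the uniform conditional on zero-mass rows is an arbitrary choice.

```python
def disintegrate(nu):
    """Split a joint table nu[z][a] into (marginal over z, policy[z][a])."""
    marginal = [sum(row) for row in nu]
    policy = []
    for row, m in zip(nu, marginal):
        if m > 0:
            # Conditional law of the action given the belief point.
            policy.append([v / m for v in row])
        else:
            # The conditional is arbitrary on a null set; use uniform.
            policy.append([1.0 / len(row)] * len(row))
    return marginal, policy


nu = [[0.1, 0.3], [0.6, 0.0]]
marginal, policy = disintegrate(nu)
# Recombine to confirm that marginal times policy gives back the joint table.
rebuilt = [[m * q for q in row] for m, row in zip(marginal, policy)]
```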

By Proposition 4.3, it suffices to prove that $\Gamma$ has a fixed point in order to establish the existence of a mean-field equilibrium. The right tool for this is Kakutani’s fixed point theorem (see, e.g., Aliprantis and Border [2, Corollary 17.55]), which is a standard result for proving the existence of Nash equilibria in classical game problems. To use Kakutani’s fixed point theorem, the space on which the set-valued mapping is defined should be convex and compact. The set ${\mathcal{P}}({\mathsf{Z}}\times{\mathsf{A}})^{\infty}$ in the definition of $\Gamma$ is not compact. However, we will prove that the image of ${\mathcal{P}}({\mathsf{Z}}\times{\mathsf{A}})^{\infty}$ under $\Gamma$ is contained in a convex and compact subset of ${\mathcal{P}}({\mathsf{Z}}\times{\mathsf{A}})^{\infty}$ . Hence, it is sufficient to consider this convex and compact set in the definition of $\Gamma$ . To that end, for each $t\geq 0$ , define the set

[TABLE]

Since $W$ is a lower semi-continuous moment function, the set ${\mathcal{P}}^{t}({\mathsf{Z}})$ is compact with respect to the weak topology, see Hernández-Lerma and Lasserre [22, Proposition E.8, p. 187]. Let us define

[TABLE]

Since ${\mathsf{A}}$ is compact, ${\mathcal{P}}^{t}({\mathsf{Z}}\times{\mathsf{A}})$ is tight. Furthermore, ${\mathcal{P}}^{t}({\mathsf{Z}}\times{\mathsf{A}})$ is closed with respect to the weak topology. Hence, ${\mathcal{P}}^{t}({\mathsf{Z}}\times{\mathsf{A}})$ is compact. Let $\Xi\coloneqq\prod_{t=0}^{\infty}{\mathcal{P}}^{t}({\mathsf{Z}}\times{\mathsf{A}})$ , which is convex and compact with respect to the product topology.

Proposition 4.5

We have $\Gamma\bigl{(}{\mathcal{P}}({\mathsf{Z}}\times{\mathsf{A}})^{\infty}\bigr{)}\coloneqq\bigl{\{}{\boldsymbol{\nu}}^{\prime}:{\boldsymbol{\nu}}^{\prime}\in\Gamma({\boldsymbol{\nu}}),\text{ }{\boldsymbol{\nu}}\in{\mathcal{P}}({\mathsf{Z}}\times{\mathsf{A}})^{\infty}\bigr{\}}\subset\Xi$ .

Proof 4.6

Proof. Fix any ${\boldsymbol{\nu}}\in{\mathcal{P}}({\mathsf{Z}}\times{\mathsf{A}})^{\infty}$ . It is sufficient to prove that $C({\boldsymbol{\nu}})\subset\Xi$ . Let ${\boldsymbol{\nu}}^{\prime}\in C({\boldsymbol{\nu}})$ . We prove by induction that $\nu^{\prime}_{t,1}\in{\mathcal{P}}^{t}({\mathsf{Z}})$ for all $t\geq 0$ . The claim trivially holds for $t=0$ as $\nu^{\prime}_{0,1}=\delta_{\mu_{0}}$ . Assume the claim holds for $t$ and consider $t+1$ . We have

[TABLE]

Hence, $\nu^{\prime}_{t+1,1}\in{\mathcal{P}}^{t+1}({\mathsf{Z}})$ . $\square$

By Proposition 4.5, it is sufficient to prove that $\Gamma$ has a fixed point ${\boldsymbol{\nu}}\in\Xi$ . As in Jovanovic and Rosenthal [29, Theorem 1], one can prove that $C({\boldsymbol{\nu}})\cap B({\boldsymbol{\nu}})\neq\emptyset$ for any ${\boldsymbol{\nu}}\in\Xi$ . Moreover, note that both $C({\boldsymbol{\nu}})$ and $B({\boldsymbol{\nu}})$ are convex, and thus their intersection is also convex. $\Xi$ is a convex compact subset of a locally convex topological space ${\mathcal{M}}({\mathsf{Z}}\times{\mathsf{A}})^{\infty}$ , where ${\mathcal{M}}({\mathsf{Z}}\times{\mathsf{A}})$ denotes the set of all finite signed measures on ${\mathsf{Z}}\times{\mathsf{A}}$ . Hence, in order to deduce the existence of a fixed point of $\Gamma$ , we need to prove that it has a closed graph. Before stating this result, we prove the following proposition which is a key element of the proof as mentioned earlier (i.e., step (ii)). Its proof is given in Appendix A.

Proposition 4.7

Let ${\boldsymbol{\nu}}^{(n)}\rightarrow{\boldsymbol{\nu}}$ in product topology. Then, for all $t\geq 0$ , $\eta_{t}^{{\boldsymbol{\nu}}^{(n)}}(\,\cdot\,|z,a)$ weakly converges to $\eta_{t}^{{\boldsymbol{\nu}}}(\,\cdot\,|z,a)$ for all $(z,a)\in{\mathsf{Z}}\times{\mathsf{A}}$ .

Using Proposition 4.7, we can now prove the following proposition.

Proposition 4.8

The graph of $\Gamma$ , i.e., the set

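$$\mathop{\rm Gr}(\Gamma)\coloneqq\bigl\{({\boldsymbol{\nu}},{\boldsymbol{\xi}})\in\Xi\times\Xi:{\boldsymbol{\xi}}\in\Gamma({\boldsymbol{\nu}})\bigr\}$$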

is closed.

Proof 4.9

Proof. We note that modulo Proposition 4.7, the proof of the proposition is almost the same as the proof in Saldi et al. [38, Proposition 3.9] for the full state measurement case. However, for the sake of completeness, we give the proof here.

Let $\bigl{\{}({\boldsymbol{\nu}}^{(n)},{\boldsymbol{\xi}}^{(n)})\bigr{\}}_{n\geq 1}\subset\Xi\times\Xi$ be such that ${\boldsymbol{\xi}}^{(n)}\in\Gamma({\boldsymbol{\nu}}^{(n)})$ for all $n$ and $({\boldsymbol{\nu}}^{(n)},{\boldsymbol{\xi}}^{(n)})\rightarrow({\boldsymbol{\nu}},{\boldsymbol{\xi}})$ as $n\rightarrow\infty$ for some $({\boldsymbol{\nu}},{\boldsymbol{\xi}})\in\Xi\times\Xi$ . To prove $\mathop{\rm Gr}(\Gamma)$ is closed, it is sufficient to prove that ${\boldsymbol{\xi}}\in\Gamma({\boldsymbol{\nu}})$ .

Using Proposition 4.7, we first prove that ${\boldsymbol{\xi}}\in C({\boldsymbol{\nu}})$ ; that is, for all $t\geq 0$ , we have

[TABLE]

For all $n$ and $t$ , we have

[TABLE]

Since ${\boldsymbol{\xi}}^{(n)}\rightarrow{\boldsymbol{\xi}}$ in $\Xi$ , $\xi^{(n)}_{t+1}\rightarrow\xi_{t+1}$ weakly. Let $g\in C_{b}({\mathsf{Z}})$ . Then, by Langen [31, Theorem 3.5], we have

[TABLE]

since ${\boldsymbol{\nu}}^{(n)}_{t}\rightarrow{\boldsymbol{\nu}}_{t}$ weakly and $\int_{{\mathsf{Z}}}g(y)\eta_{t}^{{\boldsymbol{\nu}}^{(n)}}(dy|z,a)$ converges to $\int_{{\mathsf{Z}}}g(y)\eta_{t}^{{\boldsymbol{\nu}}}(dy|z,a)$ continuously (see Langen [31, Theorem 3.5]). Here, for measurable functions $g$ , $g_{n}$ ( $n\geq 1$ ) on a metric space ${\mathsf{E}}$ , the sequence $g_{n}$ is said to converge to $g$ continuously if $\lim_{n\rightarrow\infty}g_{n}(e_{n})=g(e)$ for any $e_{n}\rightarrow e$ where $e\in{\mathsf{E}}$ . Consequently, the measure on the right-hand side of (9) converges weakly to $\int_{{\mathsf{Z}}\times{\mathsf{A}}}\eta_{t}^{{\boldsymbol{\nu}}}(\,\cdot\,|z,a)\nu_{t}(dz,da)$ . Therefore, we have

[TABLE]

from which we conclude that ${\boldsymbol{\xi}}\in C({\boldsymbol{\nu}})$ .

To complete the proof, it suffices to prove that ${\boldsymbol{\xi}}\in B({\boldsymbol{\nu}})$ . To that end, for each $n$ and $t$ , let us define the following functions

[TABLE]

By definition,

[TABLE]

Define also the following sets

[TABLE]

Since ${\boldsymbol{\xi}}^{(n)}\in B({\boldsymbol{\nu}}^{(n)})$ , we have

[TABLE]

To prove that ${\boldsymbol{\xi}}\in B({\boldsymbol{\nu}})$ , we need to show that

[TABLE]

First note that since both $F^{(n)}_{t}$ and $J^{{\boldsymbol{\nu}}^{(n)}}_{*,t}$ are continuous, $A_{t}^{(n)}$ is closed. Moreover, $A_{t}$ is also closed as both $F_{t}$ and $J^{{\boldsymbol{\nu}}}_{*,t}$ are continuous. One can also prove as in Saldi et al. [38, Proposition 3.10] that $F_{t}^{(n)}$ converges to $F_{t}$ continuously and $J^{{\boldsymbol{\nu}}^{(n)}}_{*,t}$ converges to $J^{{\boldsymbol{\nu}}}_{*,t}$ continuously, as $n\rightarrow\infty$ .

For each $M\geq 1$ , define the closed set $B_{t}^{M}\coloneqq\bigl{\{}(z,a):F_{t}(z,a)\geq J^{{\boldsymbol{\nu}}}_{*,t}(z)+\epsilon(M)\bigr{\}}$ , where $\epsilon(M)\rightarrow 0$ as $M\rightarrow\infty$ . Since both $F_{t}$ and $J^{{\boldsymbol{\nu}}}_{*,t}$ are continuous, we can choose $\{\epsilon(M)\}_{M\geq 1}$ so that $\xi_{t}(\partial B_{t}^{M})=0$ for each $M$ . Note that by the monotone convergence theorem, we have

[TABLE]

This implies that

[TABLE]

For any fixed $M$ , we prove that the second term in the last expression converges to zero as $n\rightarrow\infty$ . To that end, we first note that $\xi^{(n)}_{t}$ converges weakly to $\xi_{t}$ as $n\rightarrow\infty$ when both measures are restricted to $B_{t}^{M}$ , as $B_{t}^{M}$ is closed and $\xi_{t}(\partial B_{t}^{M})=0$ , see, e.g., Bogachev [8, Theorem 8.2.3]. Furthermore, since $F_{t}^{(n)}$ converges to $F_{t}$ continuously and $J^{{\boldsymbol{\nu}}^{(n)}}_{*,t}$ converges to $J^{{\boldsymbol{\nu}}}_{*,t}$ continuously, $1_{A^{(n)}_{t}\cap B^{M}_{t}}$ converges continuously to $0$ , which implies by Langen [31, Theorem 3.5] that

[TABLE]

Therefore, we obtain

[TABLE]

where the last inequality follows from the Portmanteau theorem (see, e.g., Billingsley [6, Theorem 2.1]) and the fact that $A_{t}$ is closed. Hence, $\xi_{t}(A_{t})=1$ . Since $t$ is arbitrary, this is true for all $t$ . This means that ${\boldsymbol{\xi}}\in B({\boldsymbol{\nu}})$ . Therefore, ${\boldsymbol{\xi}}\in\Gamma({\boldsymbol{\nu}})$ . $\square$

Recall that $\Xi$ is a compact convex subset of the locally convex topological space ${\mathcal{M}}({\mathsf{Z}}\times{\mathsf{A}})^{\infty}$ . Furthermore, the graph of $\Gamma$ is closed by Proposition 4.8, and it takes nonempty convex values. Therefore, by Kakutani’s fixed point theorem (Aliprantis and Border [2, Corollary 17.55]), $\Gamma$ has a fixed point. Therefore, the pair $(\pi^{\varphi},{\boldsymbol{\mu}})$ is a mean field equilibrium, where $\pi^{\varphi}$ and ${\boldsymbol{\mu}}$ are constructed as in the statement of Proposition 4.3. This completes the proof of Theorem 3.3.

5 Approximation of Nash Equilibria

In this section, our aim is to show that the policy generated by the mean-field equilibrium, when adopted by each agent, is nearly a Nash equilibrium for games with a sufficiently large number of agents. Let $(\pi^{\prime},{\boldsymbol{\mu}})$ denote the pair in the mean-field equilibrium. Beyond Assumption 3, we impose an additional assumption, which is stated below. To this end, let $d_{BL}$ denote the bounded Lipschitz metric (see, e.g., Dudley [17, Proposition 11.3.2]) on ${\mathcal{P}}({\mathsf{X}})$ that metrizes the weak topology, and define the following moduli of continuity:

[TABLE]

Assumption

(a)

$\omega_{p}(r)\rightarrow 0$ and $\omega_{c}(r)\rightarrow 0$ as $r\rightarrow 0$ .

(b)

For each $t\geq 0$ , $\pi_{t}^{\prime}:{\mathsf{G}}_{t}\rightarrow{\mathcal{P}}({\mathsf{A}})$ is deterministic; that is, $\pi_{t}^{\prime}(\,\cdot\,|g(t))=\delta_{f_{t}(g(t))}(\,\cdot\,)$ for some measurable function $f_{t}:{\mathsf{G}}_{t}\rightarrow{\mathsf{A}}$ , and weakly continuous.

(c)

The observation kernel $r(\,\cdot\,|x)$ does not depend on the mean-field term.

Remark 5.1

Note that, if the state transition probability $p$ is independent of the mean-field term, then Assumption 5-(a) for $\omega_{p}$ is always true. Indeed, in that case, we have $\omega_{p}(r)=0$ for all $r$ . $\square$

Remark 5.2

One way to establish Assumption 5-(b) is as follows. Suppose that, for ${\boldsymbol{\mu}}$ , there exists a unique minimizer $a_{z}\in{\mathsf{A}}$ of

[TABLE]

for each $z\in{\mathsf{Z}}$ and for all $t\geq 0$ . In addition, suppose that $F_{t}:{\mathsf{Z}}\times{\mathsf{A}}\times{\mathsf{Y}}\rightarrow{\mathsf{Z}}$ ( $t\geq 0$ ) in (3) is continuous. Note that uniqueness conditions analogous to (10) are quite common in the mean-field literature (see, e.g., Gomes et al. [19, Assumption 4], Şen and Caines [16, Assumption A5], Huang et al. [28, Assumption H5], Şen and Caines [34, Assumption A9]).

Under the condition of unique minimizer to (10), one can prove that the policy $\varphi$ in Proposition 4.3 is deterministic and weakly continuous (see Saldi et al. [38, Section 5]). Indeed, fix any $t\geq 0$ and consider the policy $\varphi_{t}$ at time $t$ in $\varphi$ . By (10), we must have $\varphi_{t}(\,\cdot\,|z)=\delta_{f_{t}(z)}(\,\cdot\,)$ for some $f_{t}:{\mathsf{Z}}\rightarrow{\mathsf{A}}$ which minimizes $R_{t}(z,\,\cdot\,)$ of the above form; that is, $\min_{a\in{\mathsf{A}}}R_{t}(z,a)=R_{t}(z,f_{t}(z))$ for all $z\in{\mathsf{Z}}$ . If $f_{t}$ is continuous, then $\varphi_{t}$ is also continuous. Hence, in order to prove the assertion, it is sufficient to prove that $f_{t}$ is continuous. Suppose $z_{n}\rightarrow z$ in ${\mathsf{Z}}$ . Note that $l_{t}(\,\cdot\,)=\min_{a\in{\mathsf{A}}}R_{t}(\,\cdot\,,a)$ is continuous. Therefore, every accumulation point of the sequence $\{f_{t}(z_{n})\}_{n\geq 1}$ must be a minimizer for $R_{t}(z,\,\cdot\,)$ . Since there exists a unique minimizer $f_{t}(z)$ of $R_{t}(z,\,\cdot\,)$ , the set of all accumulation points of $\{f_{t}(z_{n})\}_{n\geq 1}$ must be $\{f_{t}(z)\}$ . This implies that $f_{t}(z_{n})$ converges to $f_{t}(z)$ since ${\mathsf{A}}$ is compact. Hence, $f_{t}$ is continuous.

Recall that the mean-field equilibrium policy is given by

[TABLE]

Hence, $\pi$ is also a deterministic policy as $i$ is a deterministic function. The function $i$ can be given recursively by $F_{t}:{\mathsf{Z}}\times{\mathsf{A}}\times{\mathsf{Y}}\rightarrow{\mathsf{Z}}$ ( $t\geq 0$ ) in (3) and the policy $\varphi$ . Since $F_{t}$ is continuous for all $t$ and $\varphi$ is also weakly continuous, we can conclude that the mean-field policy $\pi$ is deterministic and weakly continuous. Hence, Assumption 5-(b) holds.

For instance, we can prove existence of a unique minimizer to (10) and the continuity of $F_{t}$ for all $t\geq 0$ under the following conditions on the system components. Suppose that ${\mathsf{X}}=\mathbb{R}^{d}$ , ${\mathsf{Y}}=\mathbb{R}^{p}$ , and ${\mathsf{A}}\subset\mathbb{R}^{m}$ is convex. In addition, suppose that $p(dx^{\prime}|x,a,\mu)=\varrho(x^{\prime}|x,a,\mu)m(dx^{\prime})$ and $r(dy|x)=\zeta(y|x)m(dy)$ , where $m$ denotes the Lebesgue measure. Assume that both $\varrho$ and $\zeta$ are continuous and bounded, and $\varrho$ and $c$ are strictly convex in $a$ . Then we have

[TABLE]

where $h_{t}(y|z,a)$ is given by

[TABLE]

Similarly, we have

[TABLE]

where $f_{t}(x|z,a,y)$ is given by

[TABLE]

One can prove that $f_{t}$ is continuous. Hence, $F_{t}$ is also continuous by [5, Theorem 16.2]. To show uniqueness of a minimizer to (10), note that

[TABLE]

where

[TABLE]

Hence, for any $a\in{\mathsf{A}}$ , (10) can be written as

[TABLE]

Since $c$ and $\varrho$ are strictly convex in $a$ , the last expression is also strictly convex in $a$ . Hence, there exists a unique minimizer $a_{z}\in{\mathsf{A}}$ for (10). $\square$

For $t\geq 0$ , let ${\mathsf{Y}}^{t+1}\coloneqq\prod_{k=0}^{t}{\mathsf{Y}}$ . Then, for each $t\geq 1$ , define $\tilde{f}_{t}:{\mathsf{Y}}^{t+1}\rightarrow{\mathsf{A}}$ as

[TABLE]

where $\tilde{f}_{0}=f_{0}$ . Let $\pi_{t}(\,\cdot\,|y(t),\ldots,y(0))=\delta_{\tilde{f}_{t}(y(t),\ldots,y(0))}(\,\cdot\,)$ . Note that $\pi_{t}$ is a weakly continuous stochastic kernel on ${\mathsf{A}}$ given ${\mathsf{Y}}^{t+1}$ . Indeed, $\pi$ and $\pi^{\prime}$ are equivalent because, for all $t\geq 0$ , we have

[TABLE]

Hence, $(\pi,{\boldsymbol{\mu}})$ is also a mean-field equilibrium. In the sequel, we use $(\pi,{\boldsymbol{\mu}})$ to prove the approximation result. The reason for passing from $f_{t}$ to $\tilde{f}_{t}$ is that the latter policy becomes Markov in the equivalent game model that will be introduced in the proof of Theorem 5.5. Thus, we can use the proof technique in our previous paper (Saldi et al. [38]) to show the existence of an approximate Nash equilibrium.
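The passage from $f_{t}$ to $\tilde{f}_{t}$ amounts to substituting the past actions, which are themselves functions of past observations, back into the history-dependent policy. The following is a minimal sketch of this recursion, under the assumption that each $f_{t}$ takes the observation history and the action history as two tuples. The callable signature `f(t, ys, acts)` and the toy rule are hypothetical, and observations are listed in chronological order here rather than in the reverse order used in the text.

```python
def make_observation_policy(f):
    """Turn a history-dependent deterministic policy into an observation-only one.

    f(t, ys, acts) returns the action at time t given observations
    ys = (y(0), ..., y(t)) and past actions acts = (a(0), ..., a(t-1)).
    The returned function maps (y(0), ..., y(t)) directly to the action at time t.
    """
    def f_tilde(ys):
        acts = []
        for k in range(len(ys)):
            # Past actions are recomputed from past observations only.
            acts.append(f(k, tuple(ys[:k + 1]), tuple(acts)))
        return acts[-1]
    return f_tilde


# Toy rule: parity of the latest observation plus the sum of past actions.
toy_f = lambda t, ys, acts: (ys[-1] + sum(acts)) % 2
f_tilde = make_observation_policy(toy_f)
```

Because every past action is recomputed from the observations alone, `f_tilde` never needs the action history as an input, which is what allows the policy to be treated as a function of the observation record only.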

Before stating and proving the main result of this section on the approximate Nash equilibrium property of the mean-field equilibrium for the finite population case (Theorem 5.5), we discuss (and establish under some conditions) the uniqueness of the mean-field equilibrium. This entails a monotonicity condition similar to the one introduced by Lasry and Lions [32].

Assumption

(U1)

Uniqueness condition in (10) holds for any mean-field equilibrium.

(U2)

The state transition probability $p$ and the observation kernel $r$ do not depend on the mean-field term.

(U3)

For any ${\boldsymbol{\nu}}$ and ${\boldsymbol{\mu}}$ in $\Xi$ , we have the following monotonicity condition:

[TABLE]

We note that Assumption 5 is exactly the discrete-time counterpart of Assumption (U) introduced in Lacker [30] and Carmona and Lacker [12]. Recall that Assumption 5-(U1) is true under the strict convexity assumptions introduced in Remark 5.2.

Theorem 5.3

Under Assumption 5, there exists at most one mean-field equilibrium. Furthermore, if Assumption 3 holds, then there exists a unique mean-field equilibrium.

Proof 5.4

Proof. Suppose that $(\pi^{{\boldsymbol{\mu}}},{\boldsymbol{\mu}})$ and $(\pi^{{\boldsymbol{\nu}}},{\boldsymbol{\nu}})$ are two distinct mean-field equilibria. Note that, under Assumption 5-(U1), the policies in mean-field equilibria are deterministic; that is, $\pi^{{\boldsymbol{\mu}}}=\{f_{t}^{{\boldsymbol{\mu}}}\}$ and $\pi^{{\boldsymbol{\nu}}}=\{f_{t}^{{\boldsymbol{\nu}}}\}$ , and they are unique optimal deterministic control policies given the measure flows ${\boldsymbol{\mu}}$ and ${\boldsymbol{\nu}}$ . In addition, by Assumption 5-(U2), the transition probability $\eta$ in fully-observed reduction of the POMDP in mean-field equilibrium does not depend on the mean-field term. Hence, by Assumption 5-(U3), we have the following inequality

[TABLE]

Since $J_{{\boldsymbol{\mu}}}(\pi^{{\boldsymbol{\mu}}})=\inf_{\pi}J_{{\boldsymbol{\mu}}}(\pi)$ and $J_{{\boldsymbol{\nu}}}(\pi^{{\boldsymbol{\nu}}})=\inf_{\pi}J_{{\boldsymbol{\nu}}}(\pi)$ , we have

[TABLE]

Recall that $\pi^{{\boldsymbol{\mu}}}$ and $\pi^{{\boldsymbol{\nu}}}$ are unique optimal deterministic control policies given the measure flows ${\boldsymbol{\mu}}$ and ${\boldsymbol{\nu}}$ . Therefore, $\pi^{{\boldsymbol{\mu}}}=\pi^{{\boldsymbol{\nu}}}$ , and so, ${\boldsymbol{\nu}}={\boldsymbol{\mu}}$ since the transition probability $\eta$ in the fully-observed problem does not depend on the mean-field term. This completes the proof. $\square$

The following theorem is the main result of this section, which states that the policy ${\boldsymbol{\pi}}^{(N)}=(\pi,\ldots,\pi)$ , where $\pi$ is repeated $N$ times, is an $\varepsilon$ -Nash equilibrium for sufficiently large $N$ .

Theorem 5.5

For any $\varepsilon>0$ , there exists $N(\varepsilon)$ such that for $N\geq N(\varepsilon)$ , the policy ${\boldsymbol{\pi}}^{(N)}$ is an $\varepsilon$ -Nash equilibrium for the game with $N$ agents.
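For intuition about the $\varepsilon$ -Nash property, the following sketch checks it by brute force in a one-shot, finite-action game, where a profile is $\varepsilon$ -Nash if no single agent can lower its own cost by more than $\varepsilon$ through a unilateral deviation. This is a deliberately simplified stand-in for the dynamic, policy-valued setting of the theorem; the cost function and all names are hypothetical.

```python
def is_epsilon_nash(cost, profile, action_sets, eps):
    """Return True if no agent gains more than eps by deviating unilaterally.

    cost(i, profile) is agent i's cost under the joint action tuple `profile`.
    """
    for i, actions in enumerate(action_sets):
        base = cost(i, profile)
        for a in actions:
            deviated = profile[:i] + (a,) + profile[i + 1:]
            if cost(i, deviated) < base - eps:
                return False
    return True


def congestion_cost(i, profile):
    """Toy mean-field-type cost: own effort plus the population average."""
    return 0.05 * profile[i] + sum(profile) / len(profile)


action_sets = [(0, 1)] * 3
```

In this toy game the all-zero profile is an exact Nash equilibrium, while the all-one profile is only $\varepsilon$ -Nash once $\varepsilon$ exceeds the gain from a single deviation.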

Proof of Theorem 5.5

Note that the policy $\pi$ in the mean-field equilibrium is not necessarily Markovian, which makes the joint process of the state, observation, and mean-field term non-Markov. To prove Theorem 5.5, we will first construct an equivalent game model whose states are the state of the original model plus the current and past observations. In this new model, the mean-field equilibrium policy automatically becomes Markov. Then, we will use the proof technique in our previous paper Saldi et al. [38] to show the existence of an approximate Nash equilibrium.

This new game model is specified by

[TABLE]

where, for each $t\geq 0$ ,

[TABLE]

and ${\mathsf{A}}$ are the Polish state and action spaces at time $t$ , respectively. The stochastic kernel $P_{t}:{\mathsf{S}}_{t}\times{\mathsf{A}}\times{\mathcal{P}}({\mathsf{S}}_{t})\to{\mathcal{P}}({\mathsf{S}}_{t+1})$ is defined as:

[TABLE]

where $B_{t+1}\in{\mathcal{B}}({\mathsf{X}})$ , $D_{k}\in{\mathcal{B}}({\mathsf{Y}})$ ( $k=0,\ldots,t+1$ ), $s(t)=(x(t),y(t),y(t-1),\ldots,y(0))$ , and $\Delta_{t,1}$ is the marginal of $\Delta_{t}$ on ${\mathsf{X}}$ . Indeed, $P_{t}$ is the controlled transition probability of next state-observation pair, current observation, and past observations $\bigl{(}x(t+1),y(t+1),y(t),\ldots,y(0)\bigr{)}$ given the current state-observation pair and past observations $\bigl{(}x(t),y(t),y(t-1),\ldots,y(0)\bigr{)}$ in the original mean-field game. For each $t\geq 0$ , the one-stage cost function $C_{t}:{\mathsf{S}}_{t}\times{\mathsf{A}}\times{\mathcal{P}}({\mathsf{S}}_{t})\rightarrow[0,\infty)$ is defined as:

[TABLE]

Finally, the initial measure $\lambda_{0}$ is given by $\lambda_{0}(ds(0))\coloneqq r(dy|x)\mu_{0}(dx)$ , where $s(0)=(x,y)$ .
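The state augmentation can be pictured as follows: the new state carries the hidden state together with the full observation record, so a policy that reads only the observation part is a function of the current augmented state. The sketch below assumes user-supplied samplers for $p$ (with the mean-field term already fixed) and for $r$ ; `step_augmented` and the toy samplers are hypothetical.

```python
import random


def step_augmented(s, a, sample_next_state, sample_observation, rng):
    """Advance the augmented state s(t) = (x(t), y(t), ..., y(0)) by one step."""
    x, *ys = s
    x_next = sample_next_state(x, a, rng)      # draw x(t+1) given x(t) and a
    y_next = sample_observation(x_next, rng)   # draw y(t+1) given x(t+1)
    # The old observations are carried along unchanged; only x is replaced.
    return (x_next, y_next, *ys)


# Toy samplers on {0, 1}: the state flips with probability 0.1 (action ignored),
# and the observation equals the state with probability 0.8.
flip = lambda x, a, rng: x if rng.random() < 0.9 else 1 - x
observe = lambda x, rng: x if rng.random() < 0.8 else 1 - x

rng = random.Random(0)
s0 = (0, 1)                      # (x(0), y(0))
s1 = step_augmented(s0, 0, flip, observe, rng)
s2 = step_augmented(s1, 0, flip, observe, rng)
```

The augmented state grows by one coordinate per step, which is why the state space in the new model depends on $t$ .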

Suppose that Assumptions 3 and 5 hold. For each $t\geq 0$ , let $d_{t}$ denote the bounded Lipschitz metric on ${\mathcal{P}}({\mathsf{S}}_{t})$ (where the product metric on ${\mathsf{S}}_{t}$ is assumed to be the sum of the metrics of the components in the product space), and define the following moduli of continuity:

[TABLE]

Then, for each $t\geq 0$ , the following are satisfied:

(I)

The one-stage cost function $C_{t}$ is bounded and continuous.

(II)

The stochastic kernel $P_{t}$ is weakly continuous.

(III)

$\omega_{P_{t}}(r)\rightarrow 0$ and $\omega_{C_{t}}(r)\rightarrow 0$ as $r\rightarrow 0$ .

It is straightforward to prove that (I) and (II) hold since $c$ is continuous, $p$ is weakly continuous, and $r$ is continuous in the total variation norm. For (III), for each $t\geq 0$ , fix any $(s,a)\in{\mathsf{S}}_{t}\times{\mathsf{A}}$ and $(\Delta,\Delta^{\prime})\in{\mathcal{P}}({\mathsf{S}}_{t})\times{\mathcal{P}}({\mathsf{S}}_{t})$ , and for any $f:{\mathsf{S}}_{t+1}\rightarrow\mathbb{R}$ , define $\tilde{f}(\,\cdot\,,\,\cdot\,)=f(\,\cdot\,,\,\cdot\,,y(t),\ldots,y(0))$ where $s=(x(t),y(t),\ldots,y(0))$ . Then, we have

[TABLE]

Since $(s,a)\in{\mathsf{S}}_{t}\times{\mathsf{A}}$ is arbitrary and $d_{BL}(\Delta_{1},\Delta^{\prime}_{1})\leq d_{t}(\Delta,\Delta^{\prime})$ , we have $\omega_{P_{t}}(r)\rightarrow 0$ as $r\rightarrow 0$ by Assumption 5-(a). Similarly, we can also prove that $\omega_{C_{t}}(r)\rightarrow 0$ as $r\rightarrow 0$ .

Recall the set of policies $\Pi$ in the original mean-field game. Let $\tilde{\Pi}$ be the set of policies in $\Pi$ which only use the observations; that is, $\pi\in\tilde{\Pi}$ if $\pi_{t}:{\mathsf{Y}}^{t+1}\rightarrow{\mathcal{P}}({\mathsf{A}})$ for each $t\geq 0$ . Note that $\tilde{\Pi}$ is a subset of the set of Markov policies in the new model. For any measure flow ${\boldsymbol{\Delta}}=(\Delta_{t})_{t\geq 0}$ , where $\Delta_{t}\in{\mathcal{P}}({\mathsf{S}}_{t})$ , we denote by $\hat{J}_{{\boldsymbol{\Delta}}}(\pi)$ the infinite-horizon discounted cost of the policy $\pi\in\tilde{\Pi}$ in this new model.

Similar to Section 2, we also define the corresponding $N$ -agent game as follows. We have the Polish state spaces $\{{\mathsf{S}}_{t}\}_{t\geq 0}$ and action space ${\mathsf{A}}$ . For every $t\in\{0,1,2,\ldots\}$ and every $i\in\{1,2,\ldots,N\}$ , let $s^{N}_{i}(t)\in{\mathsf{S}}_{t}$ and $a^{N}_{i}(t)\in{\mathsf{A}}$ denote the state and the action of Agent $i$ at time $t$ , and let

[TABLE]

denote the empirical distribution of the state configuration at time $t$ . The initial states $s^{N}_{i}(0)$ are independent and identically distributed according to $\lambda_{0}$ , and, for each $t\geq 0$ , the next-state configuration $(s^{N}_{1}(t+1),\ldots,s^{N}_{N}(t+1))$ is generated at random according to the probability laws

[TABLE]

Recall that $\tilde{\Pi}_{i}$ denotes the set of policies that only use local observations for Agent $i$ in the original game. Note that $\tilde{\Pi}_{i}$ is an admissible class of policies for the new model. Indeed, policies in $\tilde{\Pi}_{i}$ are Markov for this new model since they partly use the state information. We let $\tilde{\Pi}_{i}^{c}$ denote the set of all policies in $\tilde{\Pi}_{i}$ for Agent $i$ that are weakly continuous; that is, $\pi=\{\pi_{t}\}\in\tilde{\Pi}_{i}^{c}$ if for all $t\geq 0$ , $\pi_{t}:{\mathsf{Y}}^{t+1}\rightarrow{\mathcal{P}}({\mathsf{A}})$ is continuous when ${\mathcal{P}}({\mathsf{A}})$ is endowed with the weak topology.

For Agent $i$ , the infinite-horizon discounted cost under the initial distribution $\lambda_{0}$ and $N$ -tuple of policies ${\boldsymbol{\pi}}^{(N)}\in\tilde{{\bf\Pi}}^{(N)}$ is denoted as $\hat{J}_{i}^{(N)}({\boldsymbol{\pi}}^{(N)})$ .

The following proposition makes the connection between this new model and the original model.

Proposition 5.6

For any $N\geq 1$ , ${\boldsymbol{\pi}}^{(N)}\in\tilde{{\bf\Pi}}^{(N)}$ , and $i=1,\ldots,N$ , we have $\hat{J}_{i}^{(N)}({\boldsymbol{\pi}}^{(N)})=J_{i}^{(N)}({\boldsymbol{\pi}}^{(N)})$ . Similarly, for any $\pi\in\tilde{\Pi}$ and measure flow ${\boldsymbol{\Delta}}$ , we have $\hat{J}_{\boldsymbol{\Delta}}(\pi)=J_{{\boldsymbol{\mu}}}(\pi)$ where ${\boldsymbol{\mu}}=(\Delta_{t,1})_{t\geq 0}$ .

Proof 5.7

Proof. The proof of the proposition is given in Appendix B. $\square$

By Proposition 5.6, in the remainder of this section we consider the new game model in place of the original one. Then, we use the same technique as in our previous paper Saldi et al. [38] to prove the approximation result since the policy in the mean-field equilibrium is Markov for this new model. However, as the state space in this new model is expanding at each time step, there will be some differences between the current proof and the proof in Saldi et al. [38, Section 4]. Therefore, for the sake of completeness, we give the full details of the proof.

Define the measure flow ${\boldsymbol{\Delta}}=(\Delta_{t})_{t\geq 0}$ as follows: $\Delta_{t}={\cal L}(x(t),y(t),\ldots,y(0))$ , where ${\cal L}(x(t),y(t),\ldots,y(0))$ denotes the probability law of $(x(t),y(t),\ldots,y(0))$ in the original mean-field game under the policy $\pi$ in the mean-field equilibrium. For each $t\geq 0$ , define the stochastic kernel $P_{t}^{\pi}(\,\cdot\,|s,\Delta)$ on ${\mathsf{S}}_{t+1}$ given ${\mathsf{S}}_{t}\times{\mathcal{P}}({\mathsf{S}}_{t})$ as

[TABLE]

Since $\pi_{t}$ is assumed to be weakly continuous, $P_{t}^{\pi}(\,\cdot\,|s,\Delta)$ is also weakly continuous in $(s,\Delta)$ . In the sequel, to ease the notation, we will also write $P_{t}^{\pi}(\,\cdot\,|s,\Delta)$ as $P_{t,\Delta}^{\pi}(\,\cdot\,|s)$ .

Lemma 5.8

Measure flow ${\boldsymbol{\Delta}}$ satisfies

[TABLE]

Proof 5.9

Proof. The proof of the lemma is given in Appendix C. $\square$

For each $N\geq 1$ , let $\bigl{\{}s_{i}^{N}(t)\bigr{\}}_{1\leq i\leq N}$ denote the state configuration at time $t$ in the $N$ -person game under the policy ${\boldsymbol{\pi}}^{(N)}=\{\pi,\pi,\ldots,\pi\}$ . Define the empirical distribution

[TABLE]

Proposition 5.10

For all $t\geq 0$ , we have

[TABLE]

weakly in ${\mathcal{P}}({\mathcal{P}}({\mathsf{S}}_{t}))$ , as $N\rightarrow\infty$ .

Proof 5.11

Proof. It is known that the weak topology on ${\mathcal{P}}({\mathsf{S}}_{t})$ can be metrized using the following metric:

[TABLE]

where $\{f_{m}\}_{m\geq 1}$ is an appropriate sequence of real continuous and bounded functions on ${\mathsf{S}}_{t}$ such that $\|f_{m}\|\leq 1$ for all $m\geq 1$ (see Parthasarathy [35, Theorem 6.6, p. 47]). Define the Wasserstein distance of order 1 on the set of probability measures ${\mathcal{P}}({\mathcal{P}}({\mathsf{S}}_{t}))$ as follows (see Villani [42, Definition 6.1]):

[TABLE]

Note that since $\delta_{\Delta_{t}}$ is a Dirac measure, we have

[TABLE]

Since convergence in $W_{1}$ distance implies weak convergence (see Villani [42, Theorem 6.9]), it suffices to prove that

[TABLE]

for any $f\in C_{b}({\mathsf{S}}_{t})$ and for all $t$ . We prove this by induction on $t$ .

As $\{s_{i}^{N}(0)\}_{1\leq i\leq N}\sim\lambda_{0}^{\otimes N}$ , the claim is true for $t=0$ . We suppose that the claim holds for $t$ and consider $t+1$ . Fix any $g\in C_{b}({\mathsf{S}}_{t+1})$ . Then, we have

[TABLE]

We first prove that the expectation of the second term on the right-hand side (RHS) of (12) converges to zero as $N\rightarrow\infty$ . To that end, define $F:{\mathcal{P}}({\mathsf{S}}_{t})\rightarrow\mathbb{R}$ as

[TABLE]

One can prove that $F\in C_{b}({\mathcal{P}}({\mathsf{S}}_{t}))$ . Indeed, suppose that $\Delta_{n}$ converges to $\Delta$ . Let us define

[TABLE]

Since $P^{\pi}_{t}$ is weakly continuous, one can prove that $l_{n}$ converges to $l$ continuously; that is, if $s_{n}$ converges to $s$ , then $l_{n}(s_{n})\rightarrow l(s)$ . By Langen [31, Theorem 3.5], we have $F(\Delta_{n})\rightarrow F(\Delta)$ , and so, $F\in C_{b}({\mathcal{P}}({\mathsf{S}}_{t}))$ . This implies that the expectation of the second term on the RHS of (12) converges to zero, since ${\cal L}(\Delta_{t}^{(N)})\rightarrow\delta_{\Delta_{t}}$ weakly by the induction hypothesis.

Now, let us write the expectation of the first term on the RHS of (12) as

[TABLE]

Then, by Budhiraja and Majumder [9, Lemma A.2], we have

[TABLE]

Therefore, the expectation of the first term on the RHS of (12) also converges to zero as $N\rightarrow\infty$ . Since $g$ is arbitrary, this completes the proof. $\square$
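The convergence asserted in Proposition 5.10 can be observed numerically on a toy model. The sketch below is not the model of this paper: it is a fully observed two-state chain, and the transition rule `0.1 + 0.3 * s + 0.4 * frac`, the initial probability, and all function names are hypothetical choices made only for illustration. It compares the empirical fraction of agents in state 1 with the deterministic limit flow $m_{t+1}=0.1+0.7\,m_{t}$ that this rule induces.

```python
import random

def simulate_fraction(n_agents, horizon, rng, init_prob=0.9):
    """Empirical fraction of agents in state 1 after `horizon` steps."""
    states = [1 if rng.random() < init_prob else 0 for _ in range(n_agents)]
    for _ in range(horizon):
        frac = sum(states) / n_agents  # empirical distribution (mass on state 1)
        # Hypothetical mean-field transition: the probability of moving to
        # state 1 depends on the agent's own state and on the empirical fraction.
        states = [1 if rng.random() < 0.1 + 0.3 * s + 0.4 * frac else 0
                  for s in states]
    return sum(states) / n_agents

def limit_fraction(horizon, init_prob=0.9):
    """Deterministic flow of the infinite-population limit of the toy chain."""
    m = init_prob
    for _ in range(horizon):
        m = 0.1 + 0.7 * m
    return m

def mean_abs_error(n_agents, horizon=5, runs=20, seed=0):
    """Average distance between the empirical fraction and the limit flow."""
    rng = random.Random(seed)
    target = limit_fraction(horizon)
    errors = [abs(simulate_fraction(n_agents, horizon, rng) - target)
              for _ in range(runs)]
    return sum(errors) / runs

if __name__ == "__main__":
    for n in (50, 500, 5000):
        print(n, mean_abs_error(n))
```

In such a run the average error shrinks as the number of agents grows, which is the qualitative behavior the proposition describes for the law of the empirical distribution.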

Proposition 5.10 essentially says that in the infinite-population limit, the empirical distribution of the states under the mean-field policy converges to the deterministic measure flow ${\boldsymbol{\Delta}}$ . Since the transition probabilities $P_{t}(\,\cdot\,|s,a,\Delta)$ are continuous in $\Delta$ , the evolution of the state of a generic agent in the finite-agent game with sufficiently many agents and the evolution of the state in the mean-field game under policies ${\boldsymbol{\pi}}^{(N)}=(\pi,\ldots,\pi)$ and $\pi$ , respectively, should therefore be close. Hence, the distributions of the states in each problem should also be close, from which we obtain the following result.

Proposition 5.12

We have

[TABLE]

Proof 5.13

Proof. For each $t\geq 0$ , let us define

[TABLE]

Since for any permutation $\sigma$ of $\{1,\ldots,N\}$ we have

[TABLE]

the cost function at time $t$ can be written as

[TABLE]

Let $F:{\mathcal{P}}({\mathsf{S}}_{t})\rightarrow\mathbb{R}$ be defined as

[TABLE]

One can show that $F\in C_{b}({\mathcal{P}}({\mathsf{S}}_{t}))$ as $\pi_{t}$ is weakly continuous. Hence, by Proposition 5.10 we obtain

[TABLE]

Note that by Lemma 5.8, the discounted cost in the mean-field game can be written as

[TABLE]

Therefore, by (13) and the dominated convergence theorem, we obtain

[TABLE]

which completes the proof. $\square$

In order to prove the approximation result, we have to prove that if the policy of some agent deviates from the mean-field equilibrium policy, then the corresponding cost of this agent should be close to the cost in the mean-field limit as in Proposition 5.12, for $N$ sufficiently large. However, note that this agent can choose different policies for each $N$ in place of the mean-field equilibrium policy. Since the transition probabilities and the one-stage cost functions are the same for all agents in the game model, it is sufficient to change the policy of Agent $1$ for each $N$ . To that end, let $\{{\tilde{\pi}}^{(N)}\}_{N\geq 1}\subset\tilde{\Pi}_{1}^{c}$ be an arbitrary sequence of policies for Agent $1$ ; that is, for each $N\geq 1$ and $t\geq 0$ , ${\tilde{\pi}}_{t}^{(N)}:{\mathsf{Y}}^{t+1}\rightarrow{\mathcal{P}}({\mathsf{A}})$ is weakly continuous. For each $N\geq 1$ , let $\bigl{\{}{\tilde{s}}_{i}^{N}(t)\bigr{\}}_{1\leq i\leq N}$ be the collection of states in the $N$ -person game under the policy $\tilde{{\boldsymbol{\pi}}}^{(N)}\coloneqq\{{\tilde{\pi}}^{(N)},\pi,\ldots,\pi\}$ . Define

[TABLE]

The following result states that, in the infinite-population limit, the law of the empirical distribution of the states at each time $t$ is insensitive to local deviations from the mean-field equilibrium policy.

Proposition 5.14

For all $t\geq 0$ , we have

[TABLE]

weakly in ${\mathcal{P}}({\mathcal{P}}({\mathsf{S}}_{t}))$ , as $N\rightarrow\infty$ .

Proof 5.15

Proof. The proof can be done by slightly modifying the proof of Proposition 5.10, and therefore will not be included here. See the proof of Saldi et al. [38, Proposition 4.6]. $\square$

For each $N\geq 1$ , let $\{{\hat{s}}^{N}(t)\}_{t\geq 0}$ denote the state trajectory of the mean-field game under policy ${\tilde{\pi}}^{(N)}$ ; that is, ${\hat{s}}^{N}(t)$ evolves as follows:

[TABLE]

Recall that the cost function of this mean-field game is given by

[TABLE]

where the action configuration at each time $t\geq 0$ is generated according to the probability law

[TABLE]

Proposition 5.16

For any $t\geq 0$ , we have

[TABLE]

for any sequence $\{T_{N}\}\subset C_{b}({\mathsf{S}}_{t}\times{\mathcal{P}}({\mathsf{S}}_{t}))$ such that the family $\bigl{\{}T_{N}(s,\,\cdot\,):s\in{\mathsf{S}}_{t},N\geq 1\bigr{\}}$ is equicontinuous and $\sup_{N\geq 1}\|T_{N}\|<\infty$ .

Proof 5.17

Proof. The proof of Proposition 5.16 is given in Appendix D. $\square$

Using Proposition 5.16, we now prove the following theorem which is a key element in the proof of Theorem 5.5.

Theorem 5.18

Let $\{{\tilde{\pi}}^{(N)}\}_{N\geq 1}\subset\tilde{\Pi}_{1}^{c}$ be an arbitrary sequence of policies for Agent $1$ . Then, we have

[TABLE]

where $\hat{J}_{{\boldsymbol{\Delta}}}({\tilde{\pi}}^{(N)})$ is given in (14).

Proof 5.19

Proof. Fix any $t\geq 0$ and define

[TABLE]

We first prove that $\bigl{\{}T_{N,t}(s,\,\cdot\,):s\in{\mathsf{S}}_{t},N\geq 1\bigr{\}}$ satisfies the hypothesis in Proposition 5.16. For equicontinuity, note that for any $s\in{\mathsf{S}}_{t}$ and for any $(\Delta,\Delta^{\prime})\in{\mathcal{P}}({\mathsf{S}}_{t})^{2}$ , we have

[TABLE]

Since $\omega_{C_{t}}(r)\rightarrow 0$ as $r\rightarrow 0$ , the family $\bigl{\{}T_{N,t}(s,\,\cdot\,):s\in{\mathsf{S}}_{t},N\geq 1\bigr{\}}$ is equicontinuous. Moreover, $\sup_{N\geq 1}\|T_{N,t}\|<\infty$ .

Therefore, by Proposition 5.16, we have

[TABLE]

Since $t$ is arbitrary, the above result holds for all $t\geq 0$ . Then the theorem follows from the dominated convergence theorem. $\square$
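The final appeal to dominated convergence can also be phrased as an elementary truncation estimate. The following is a sketch under two assumptions introduced here: $M$ denotes a common bound on the one-stage costs $C_{t}$ , and $a_{t}^{N}$ denotes the difference between the expected time- $t$ costs in the two models, so that $|a_{t}^{N}|\leq 2M$ and $a_{t}^{N}\rightarrow 0$ as $N\rightarrow\infty$ for each fixed $t$ .

```latex
% Truncation of the discounted sum at a horizon T (T is arbitrary).
\begin{align*}
\Bigl| \sum_{t=0}^{\infty} \beta^{t} a_t^{N} \Bigr|
  \le \sum_{t=0}^{T-1} \beta^{t} \, |a_t^{N}|
    + \sum_{t=T}^{\infty} \beta^{t} \, 2M
  = \sum_{t=0}^{T-1} \beta^{t} \, |a_t^{N}| + \frac{2M\beta^{T}}{1-\beta}.
\end{align*}
% For fixed T the finite sum vanishes as N \to \infty; the remainder
% vanishes as T \to \infty because 0 < \beta < 1.
```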

As a corollary of Propositions 5.6 and 5.12, and Theorem 5.18, we obtain the following result.

Corollary 5.20

We have

[TABLE]

Now, we are ready to prove the main result of this section.

Proof 5.21

Proof of Theorem 5.5. One can prove that for any policy ${\boldsymbol{\pi}}^{(N)}\in\tilde{{\bf\Pi}}^{(N)}$ , we have

[TABLE]

for each $i=1,\ldots,N$ (see the proof of Saldi et al. [38, Theorem 2.3]). Hence, it is sufficient to consider weakly continuous policies in ${\bf\Pi}^{(N)}$ to establish the existence of $\varepsilon$ -Nash equilibrium in the new model.

We first prove that for sufficiently large $N$ , we have

[TABLE]

for each $i=1,\ldots,N$ . As indicated earlier, since the transition probabilities and the one-stage cost functions are the same for all agents in the new game, it is sufficient to prove (15) for Agent $1$ only. Given $\varepsilon>0$ , for each $N\geq 1$ , let ${\tilde{\pi}}^{(N)}\in\tilde{\Pi}_{1}^{c}$ be such that

[TABLE]

Then, by Corollary 5.20, we have

[TABLE]

Therefore, there exists $N(\varepsilon)$ such that for $N\geq N(\varepsilon)$ , we have

[TABLE]

The result then follows from Proposition 5.6. $\square$

Remark 5.22

We note that, using similar ideas, the finite-horizon cost criterion

[TABLE]

can be handled with the same quantitative results. The only part that requires a verification different from the infinite-horizon case is the following result: $F_{t}^{(n)}$ and $J_{*,t}^{{\boldsymbol{\nu}}^{(n)}}$ converge continuously to $F_{t}$ and $J^{{\boldsymbol{\nu}}}_{*,t}$ , respectively. Note that, in the finite-horizon case, for each $n$ and $t<T$ , these functions are given by

[TABLE]

and

[TABLE]

Note that the discount factor $\beta$ is missing in the above equations. For $t=T$ , we have $F_{T}^{(n)}(z,a)=C_{T}^{{\boldsymbol{\nu}}^{(n)}}(z,a)$ and $F_{T}(z,a)=C_{T}^{{\boldsymbol{\nu}}}(z,a)$ . Since $c$ is continuous and $\nu_{T,1}^{(n)}$ weakly converges to $\nu_{T,1}$ , we have that $J^{{\boldsymbol{\nu}}^{(n)}}_{*,T}$ continuously converges to $J^{{\boldsymbol{\nu}}}_{*,T}$ by Bertsekas and Shreve [4, Proposition 7.32]. But this implies that $F_{T-1}^{(n)}$ continuously converges to $F_{T-1}$ , and so, $J^{{\boldsymbol{\nu}}^{(n)}}_{*,T-1}$ continuously converges to $J^{{\boldsymbol{\nu}}}_{*,T-1}$ again by Bertsekas and Shreve [4, Proposition 7.32]. Then, by the induction hypothesis, we can conclude that $F_{t}^{(n)}$ and $J_{*,t}^{{\boldsymbol{\nu}}^{(n)}}$ continuously converge to $F_{t}$ and $J^{{\boldsymbol{\nu}}}_{*,t}$ , respectively, for each $t\leq T$ . Therefore, Theorems 3.3 and 5.5 hold for the finite-horizon cost criterion under the same assumptions. Furthermore, if we start the mean-field game at time $\tau>0$ with initial measure $\mu_{\tau}$ , then the pair $\bigl{(}\{\pi_{t}\}_{\tau\leq t\leq T},\{\mu_{t}\}_{\tau\leq t\leq T}\bigr{)}$ in Theorem 3.3 is still a mean-field equilibrium for the sub-game. $\square$
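The backward recursion described in this remark can be illustrated on a finite toy problem. The sketch below assumes a finite set of (belief) states and actions and a fixed measure flow, so that the one-stage costs and transition kernels are plain arrays; the names `costs`, `kernels`, and `backward_induction` are hypothetical. It is an undiscounted finite-horizon dynamic program, not the general Polish-space construction of the paper.

```python
def backward_induction(costs, kernels):
    """
    Undiscounted finite-horizon dynamic programming on a finite toy model.

    costs[t][z][a]       : one-stage cost at time t, for t = 0, ..., T
    kernels[t][z][a][z2] : transition probability at time t, for t = 0, ..., T-1
    Returns (values, policy): values[t][z] is the optimal cost-to-go from
    state z at time t, and policy[t][z] is a minimizing action.
    """
    horizon = len(costs) - 1
    n_states = len(costs[0])
    values = [None] * (horizon + 1)
    policy = [None] * (horizon + 1)
    next_value = [0.0] * n_states  # no cost is incurred after time T
    for t in range(horizon, -1, -1):
        values[t], policy[t] = [], []
        for z in range(n_states):
            q_values = []
            for a in range(len(costs[t][z])):
                continuation = 0.0
                if t < horizon:
                    continuation = sum(kernels[t][z][a][z2] * next_value[z2]
                                       for z2 in range(n_states))
                # Analogue of F_t(z, a): note that no discount factor appears.
                q_values.append(costs[t][z][a] + continuation)
            best = min(range(len(q_values)), key=q_values.__getitem__)
            values[t].append(q_values[best])  # analogue of J_{*,t}(z)
            policy[t].append(best)
        next_value = values[t]
    return values, policy

if __name__ == "__main__":
    toy_costs = [[[1, 2], [0, 3]], [[5, 1], [2, 2]]]
    toy_kernels = [[[[1.0, 0.0], [0.0, 1.0]], [[0.5, 0.5], [0.5, 0.5]]]]
    print(backward_induction(toy_costs, toy_kernels))
```

At the terminal time the recursion reduces to minimizing the terminal cost, which mirrors the base case $F_{T}(z,a)=C_{T}^{{\boldsymbol{\nu}}}(z,a)$ of the induction.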

6 An Example

In this section, we consider a specific additive noise model to illustrate our results. In this model, the state and observation dynamics of a generic agent for the mean-field game are given respectively by

[TABLE]

where $x(t)\in{\mathsf{X}}$ , $y(t)\in{\mathsf{Y}}$ , $a(t)\in{\mathsf{A}}$ , $w(t)\in{\mathsf{W}}$ , and $v(t)\in{\mathsf{V}}$ . Here, we assume that ${\mathsf{X}}={\mathsf{Y}}={\mathsf{W}}={\mathsf{V}}=\mathbb{R}$ , ${\mathsf{A}}\subset\mathbb{R}$ , and $\{w(t)\}$ and $\{v(t)\}$ are sequences of i.i.d. standard normal random variables independent of each other. The one-stage cost function of a generic agent is given by

[TABLE]

for some measurable function $d:{\mathsf{X}}\times{\mathsf{A}}\times{\mathsf{X}}\rightarrow[0,\infty)$ .

This model is the infinite-population limit of the $N$ -agent game model with state and observation dynamics

[TABLE]

and the one-stage cost function

[TABLE]
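To make the $N$ -agent model concrete, the following sketch simulates one time step. It assumes that the dynamics have the averaged additive-noise form $x_{i}(t+1)=\frac{1}{N}\sum_{j}f(x_{i}(t),a_{i}(t),x_{j}(t))+g(x_{i}(t),a_{i}(t))\,w_{i}(t)$ and $y_{i}(t)=\frac{1}{N}\sum_{j}h(x_{i}(t),x_{j}(t))+v_{i}(t)$ ; this form is an assumption inferred from the functions $F$ , $g$ , and $H$ that appear in the verification of the assumptions. The particular choices of `f`, `g`, `h`, and the observation-based `policy` are hypothetical and serve only as an illustration.

```python
import math
import random

def f(x, a, y):
    """Hypothetical bounded, continuous interaction term in the state dynamics."""
    return math.tanh(x + a) + 0.5 * math.tanh(y)

def g(x, a):
    """Hypothetical noise gain, bounded and bounded away from zero."""
    return 1.0 + 0.5 / (1.0 + x * x + a * a)

def h(x, y):
    """Hypothetical bounded, continuous interaction term in the observation."""
    return math.tanh(x) + 0.1 * math.tanh(y)

def policy(y):
    """Hypothetical policy using only the current observation; A = [-1, 1]."""
    return max(-1.0, min(1.0, -0.5 * y))

def step(xs, rng):
    """One step of the assumed N-agent dynamics with standard normal noises."""
    n = len(xs)
    # Each observation averages h over the current state configuration.
    ys = [sum(h(x, xj) for xj in xs) / n + rng.gauss(0.0, 1.0) for x in xs]
    acts = [policy(y) for y in ys]
    # Each next state averages f over the configuration and adds scaled noise.
    new_xs = [sum(f(x, a, xj) for xj in xs) / n + g(x, a) * rng.gauss(0.0, 1.0)
              for x, a in zip(xs, acts)]
    return new_xs, ys, acts

if __name__ == "__main__":
    rng = random.Random(1)
    xs = [rng.gauss(0.0, 1.0) for _ in range(30)]
    xs, ys, acts = step(xs, rng)
    print(xs[:3], ys[:3], acts[:3])
```

Each agent interacts with the others only through averages over the state configuration, which is what makes the empirical distribution the relevant coupling term.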

For this model, Assumption 3 holds with $w(x)=1+x^{2}$ and $\alpha=\max\{1+\|f\|^{2},L\}$ under the following conditions: (i) ${\mathsf{A}}$ is compact, (ii) $d$ is continuous and bounded, (iii) $g$ is continuous, and $f$ is bounded and continuous, (iv) $\sup_{a\in{\mathsf{A}}}g^{2}(x,a)\leq Lx^{2}$ for some $L>0$ , (v) $h$ is continuous and bounded. Note that $\|f\|$ is defined as

[TABLE]

Indeed, we have

[TABLE]

where $q$ is the standard normal density and $m$ is the Lebesgue measure. Hence, Assumption 3-(e) holds. In order to verify Assumption 3-(b), suppose $(x_{n},a_{n},\mu_{n})\rightarrow(x,a,\mu)$ and let $g\in C_{b}({\mathsf{X}})$ . Then, we have

[TABLE]

since $g$ and $F$ are continuous, where the continuity of $F$ follows from Langen [31, Theorem 3.5] and the fact that $f$ is bounded and continuous. Therefore, the transition probability $p(\,\cdot\,|x,a,\mu)$ is weakly continuous. Thus, Assumption 3-(b) holds. Note that Assumption 3-(f) holds if the initial distribution $\mu_{0}$ has a finite second moment. Assumption 3-(a) trivially holds since $d$ is bounded and continuous. Finally, we will verify Assumption 3-(c). Suppose $(x_{n},\mu_{n})\rightarrow(x,\mu)$ . Then, we have $r(dy|x_{n},\mu_{n})=q\bigl{(}y-H(x_{n},\mu_{n})\bigr{)}m(dy)$ and $r(dy|x,\mu)=q\bigl{(}y-H(x,\mu)\bigr{)}m(dy)$ . Since $q\bigl{(}y-H(x_{n},\mu_{n})\bigr{)}\rightarrow q\bigl{(}y-H(x,\mu)\bigr{)}$ as $n\rightarrow\infty$ for all $y\in{\mathsf{Y}}$ , by Scheffé’s theorem (see, e.g., Billingsley [5, Theorem 16.12]) we have $r(\,\cdot\,|x_{n},\mu_{n})\rightarrow r(\,\cdot\,|x,\mu)$ as $n\rightarrow\infty$ in total variation norm. Thus, Assumption 3-(c) holds. Therefore, under (i)-(v), there exists a mean-field equilibrium for the mean-field game of this example.
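The total variation step can be checked numerically for the Gaussian observation channel. The sketch below assumes that the total variation norm of the difference of two measures with densities is the $L^{1}$ distance between the densities; under that convention, two unit-variance normal laws with means $m_{1}$ and $m_{2}$ are at distance $2\bigl(2\Phi(|m_{1}-m_{2}|/2)-1\bigr)$ , where $\Phi$ is the standard normal distribution function. The function names and the integration grid are hypothetical.

```python
import math

def normal_pdf(y, mean):
    """Density q(y - mean) of a unit-variance normal distribution."""
    return math.exp(-0.5 * (y - mean) ** 2) / math.sqrt(2.0 * math.pi)

def tv_norm_numeric(m1, m2, lo=-12.0, hi=12.0, n=20000):
    """Trapezoidal approximation of the integral of |q(y - m1) - q(y - m2)|."""
    width = (hi - lo) / n
    total = 0.0
    for k in range(n + 1):
        y = lo + k * width
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * abs(normal_pdf(y, m1) - normal_pdf(y, m2))
    return total * width

def tv_norm_exact(m1, m2):
    """Closed form 2 * (2 * Phi(|m1 - m2| / 2) - 1), written with erf."""
    return 2.0 * math.erf(abs(m1 - m2) / (2.0 * math.sqrt(2.0)))

if __name__ == "__main__":
    # As the means approach each other, the distance shrinks to zero.
    for gap in (1.0, 0.1, 0.01):
        print(gap, tv_norm_numeric(0.0, gap), tv_norm_exact(0.0, gap))
```

The distance depends only on the gap between the means, so continuity of $H$ in $(x,\mu)$ translates directly into continuity of $r$ in total variation for this channel.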

For the same model, Assumption 5-(a),(c) holds under the following conditions: (vi) $d(x,a,y)$ is (uniformly) Lipschitz in $y$ with Lipschitz constant $K_{d}$ , (vii) $f(x,a,y)$ is (uniformly) Lipschitz in $y$ with Lipschitz constant $K_{f}$ , (viii) $g$ is bounded and $\inf_{(x,a)\in{\mathsf{X}}\times{\mathsf{A}}}|g(x,a)|\eqqcolon\theta>0$ , and (ix) $H$ is only a function of $x$ .

Indeed, we have

[TABLE]

where $L_{d}\coloneqq\max\{\|d\|,K_{d}\}$ . Hence, $\omega_{c}(r)\rightarrow 0$ as $r\rightarrow 0$ . For $\omega_{p}$ , we have

[TABLE]

For any compact interval $K=[-k,k]\subset{\mathsf{X}}$ , we can upper bound (17) as follows:

[TABLE]

The last two integrals in the last expression go to zero (uniformly in $(x,a,\mu,\nu)$ ) as $k\rightarrow\infty$ , since $F$ and $g$ are bounded, and $\inf_{(x,a)\in{\mathsf{X}}\times{\mathsf{A}}}|g(x,a)|>0$ . For any $\varepsilon>0$ , let $K_{\varepsilon}=[-k_{\varepsilon},k_{\varepsilon}]\subset{\mathsf{X}}$ so that the sum of these integrals is less than $\varepsilon$ for all $(x,a,\mu,\nu)$ . Let $T_{\varepsilon}$ denote the Lipschitz seminorm of $q$ on $K_{\varepsilon}$ . Then, we have

[TABLE]

where $L_{f}\coloneqq\max\{\|f\|,K_{f}\}$ . Since $\varepsilon$ is arbitrary, we have $\omega_{p}(r)\rightarrow 0$ as $r\rightarrow 0$ . Thus, Assumption 5-(a) holds. Note that Assumption 5-(c) automatically holds as $H$ is only a function of $x$ .

To establish Assumption 5-(b), we can impose the following additional assumption, as we did in Remark 5.2. Suppose ${\boldsymbol{\mu}}$ is the measure-flow in mean-field equilibrium.

(b’)

For ${\boldsymbol{\mu}}\in\Xi$ , there exists a unique minimizer $a_{z}\in{\mathsf{A}}$ of

[TABLE]

for each $z\in{\mathsf{Z}}$ and for all $t\geq 0$ .

Under assumption (b’), one can prove that Assumption 5-(b) holds. Note that, by Remark 5.2, assumption (b’) is true if, for instance, $d(x,a,y)$ and $\varrho(x^{\prime}|x,a,\mu)$ are strictly convex in $a$ , where $\varrho$ is given by

[TABLE]

7 Conclusion

This paper has considered discrete-time partially observed mean-field games subject to infinite-horizon discounted cost, for Polish state, observation, and action spaces. Under mild conditions, the existence of a Nash equilibrium has been established for this game model using the conversion of partially observed Markov decision processes to fully observed Markov decision processes in the belief space and then using the dynamic programming principle. We have also established that the mean-field equilibrium policy, when used by each agent, constitutes a nearly Nash equilibrium for games with sufficiently many agents.

One interesting future direction of research to pursue is to study partially observed team problems of the mean-field type. In this case, one possible approach is to establish the global optimality of person-by-person optimal policies under some convexity assumptions and then use the results developed in this paper. Finally, partially observed mean-field games with average-cost and risk-sensitive optimality criteria are also worth studying. In particular, using the vanishing discount factor approach in MDP theory (i.e., with discount factor $\beta\rightarrow 1$ ), it might be possible to establish similar results for the average cost case.

Appendix

A Proof of Proposition 4.7

Fix any $(z,a)\in{\mathsf{Z}}\times{\mathsf{A}}$ and $t\geq 0$ . To ease the notation, let us write ${\boldsymbol{\mu}}^{(n)}\coloneqq{\boldsymbol{\mu}}^{{\boldsymbol{\nu}}^{(n)}}$ and ${\boldsymbol{\mu}}\coloneqq{\boldsymbol{\mu}}^{{\boldsymbol{\nu}}}$ . First note that, for all $t\geq 0$ , $\mu_{t}^{(n)}$ weakly converges to $\mu_{t}$ as $n\rightarrow\infty$ .

We will mimic the proof technique used in Feinberg et al. [18, Section 5] to prove the result. To this end, we first prove the following lemma.

Lemma 7.1

Fix any $(z,a)\in{\mathsf{Z}}\times{\mathsf{A}}$ . Then, for any $f\in C_{b}({\mathsf{X}})$ , we have

[TABLE]

In particular, if $f\equiv 1$ , then the above result implies that $H_{t}^{(n)}(\,\cdot\,|z,a)$ converges to $H_{t}(\,\cdot\,|z,a)$ in total variation norm.

Proof 7.2

Proof. We have

[TABLE]

The first term in the last expression converges to zero as $n\rightarrow\infty$ by Langen [31, Theorem 3.5] since $p(\,\cdot\,|x^{\prime},a,\mu_{t}^{(n)})$ weakly converges to $p(\,\cdot\,|x^{\prime},a,\mu_{t})$ and $\|r(\,\cdot\,|x,\mu_{t+1}^{(n)})-r(\,\cdot\,|x,\mu_{t+1})\|$ converges continuously to zero. For the second term, define ${\cal F}\coloneqq\{f(\,\cdot\,)r(C|\,\cdot\,,\mu_{t+1}):C\in{\mathcal{B}}({\mathsf{Y}})\}$ . Observe that ${\cal F}$ is an equicontinuous family of functions. Indeed, let $x_{n}\rightarrow x$ in ${\mathsf{X}}$ . Then, we have

[TABLE]

Since ${\cal F}$ is also uniformly bounded, the second term in the last expression also goes to zero as $n\rightarrow\infty$ since $p(\,\cdot\,|x^{\prime},a,\mu_{t}^{(n)})$ weakly converges to $p(\,\cdot\,|x^{\prime},a,\mu_{t})$ . $\square$

Let $\{f_{k}\}\subset C_{b}({\mathsf{X}})$ be the weak convergence determining class of functions in $C_{b}({\mathsf{X}})$ ; that is, $\mu_{n}$ weakly converges to $\mu$ in ${\mathcal{P}}({\mathsf{X}})$ if and only if $\lim_{n\rightarrow\infty}\mu_{n}(f_{k})=\mu(f_{k})$ for all $k$ .

Now, we prove that for any subsequence $\{F_{t}^{(n_{l})}(z,a,y)\}_{l\geq 1}$ of $\{F_{t}^{(n)}(z,a,y)\}_{n\geq 1}$ , there exists a further subsequence $\{F_{t}^{(n_{l_{m}})}(z,a,y)\}_{m\geq 1}$ such that $F_{t}^{(n_{l_{m}})}(z,a,y)$ weakly converges to $F_{t}(z,a,y)$ $H_{t}(\,\cdot\,|z,a)$ -almost everywhere. Let us write the subsequence $\{F_{t}^{(n_{l})}(z,a,y)\}_{l\geq 1}$ as $\{F_{t}^{(l,0)}(z,a,y)\}_{l\geq 1}$ . Since, by Lemma 7.1

[TABLE]

$F_{t}^{(l,0)}(z,a,y)(f_{1})$ converges in $H_{t}(\,\cdot\,|z,a)$ -probability to $F_{t}(z,a,y)(f_{1})$ by Feinberg et al. [18, Theorem 5.2], and so, there is a subsequence $\{F_{t}^{(l,1)}(z,a,y)(f_{1})\}$ of $\{F_{t}^{(l,0)}(z,a,y)(f_{1})\}$ such that $F_{t}^{(l,1)}(z,a,y)(f_{1})$ converges to $F_{t}(z,a,y)(f_{1})$ $H_{t}(\,\cdot\,|z,a)$ -almost everywhere. Similarly, by Lemma 7.1

[TABLE]

and so, $F_{t}^{(l,1)}(z,a,y)(f_{2})$ converges in $H_{t}(\,\cdot\,|z,a)$ -probability to $F_{t}(z,a,y)(f_{2})$ by Feinberg et al. [18, Theorem 5.2]. Therefore, there is a subsequence $\{F_{t}^{(l,2)}(z,a,y)(f_{2})\}$ of $\{F_{t}^{(l,1)}(z,a,y)(f_{2})\}$ such that $F_{t}^{(l,2)}(z,a,y)(f_{2})$ converges to $F_{t}(z,a,y)(f_{2})$ $H_{t}(\,\cdot\,|z,a)$ -almost everywhere. Continuing in this manner, we obtain an array of sequences. Then, by Cantor’s diagonal argument, for all $k\geq 1$ , $F_{t}^{(m,m)}(z,a,y)(f_{k})$ converges to $F_{t}(z,a,y)(f_{k})$ $H_{t}(\,\cdot\,|z,a)$ -almost everywhere as $m\rightarrow\infty$ . This implies that $F_{t}^{(m,m)}(z,a,y)$ weakly converges to $F_{t}(z,a,y)$ $H_{t}(\,\cdot\,|z,a)$ -almost everywhere.

Now, we combine this result and the convergence of $H_{t}^{(n)}(\,\cdot\,|z,a)$ to $H_{t}(\,\cdot\,|z,a)$ in total variation norm to complete the proof. By the portmanteau theorem, it is sufficient to prove that $\liminf_{n\rightarrow\infty}\eta_{t}^{(n)}(D|z,a)\geq\eta_{t}(D|z,a)$ for all $D$ open in ${\mathsf{Z}}$ . Suppose to the contrary that there exists an open set $D\subset{\mathsf{Z}}$ such that $\liminf_{n\rightarrow\infty}\eta_{t}^{(n)}(D|z,a)<\eta_{t}(D|z,a)$ . Then, there exist $\varepsilon>0$ and a subsequence $\{\eta_{t}^{(n_{k})}(D|z,a)\}$ of $\{\eta_{t}^{(n)}(D|z,a)\}$ such that $\eta_{t}^{(n_{k})}(D|z,a)\leq\eta_{t}(D|z,a)-\varepsilon$ for all $k$ . By the above, there exists a subsequence $\{F_{t}^{(n_{k_{l}})}(z,a,y)\}$ of $\{F_{t}^{(n_{k})}(z,a,y)\}$ such that $F_{t}^{(n_{k_{l}})}(z,a,y)$ weakly converges to $F_{t}(z,a,y)$ $H_{t}(\,\cdot\,|z,a)$ -almost everywhere. Since $D$ is open, we have

[TABLE]

Then by Feinberg et al. [18, Lemma 5.1(i)] and the fact that $H_{t}^{(n_{k_{l}})}(\,\cdot\,|z,a)$ converges to $H_{t}(\,\cdot\,|z,a)$ in total variation norm, we have

[TABLE]

which is a contradiction. Hence, we must have $\liminf_{n\rightarrow\infty}\eta_{t}^{(n)}(D|z,a)\geq\eta_{t}(D|z,a)$ for all $D$ open in ${\mathsf{Z}}$ .
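For intuition about the objects $F_{t}$ , $H_{t}$ , and $\eta_{t}$ appearing in this appendix, the following sketch computes their finite-space analogues. It assumes finite state and observation sets and kernels `p` and `r` that do not depend on the measure flow; these simplifications and all names are hypothetical, and the code is only an illustration of the belief-space conversion, not the construction used in the proof.

```python
def belief_update(z, a, y, p, r):
    """
    Bayes update of a belief on a finite state space.

    z           : prior belief, a list of probabilities over states
    p[a][x][x2] : probability of moving from state x to x2 under action a
    r[x2][y]    : probability of observing y in state x2
    Returns (posterior, prob_y): the analogue of F(z, a, y) and the
    probability of y, which plays the role of H(y | z, a).
    """
    n = len(z)
    predicted = [sum(z[x] * p[a][x][x2] for x in range(n)) for x2 in range(n)]
    joint = [predicted[x2] * r[x2][y] for x2 in range(n)]
    prob_y = sum(joint)
    if prob_y == 0.0:
        # y has zero probability under (z, a); any posterior may be chosen.
        return predicted, 0.0
    return [value / prob_y for value in joint], prob_y

def belief_transition(z, a, p, r):
    """Finite analogue of eta(. | z, a): pairs (next belief, probability)."""
    outcomes = []
    for y in range(len(r[0])):
        posterior, prob_y = belief_update(z, a, y, p, r)
        if prob_y > 0.0:
            outcomes.append((posterior, prob_y))
    return outcomes

if __name__ == "__main__":
    p = [[[0.9, 0.1], [0.2, 0.8]]]   # a single action, two states
    r = [[0.8, 0.2], [0.3, 0.7]]     # two observations
    print(belief_transition([0.5, 0.5], 0, p, r))
```

The next belief is a deterministic function of the observation, so the belief kernel is the image of the observation law under the update map; this is why convergence of $H_{t}^{(n)}$ and of $F_{t}^{(n)}$ together control the convergence of $\eta_{t}^{(n)}$ .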

B Proof of Proposition 5.6

Fix any $N\geq 1$ and ${\boldsymbol{\pi}}^{(N)}\in{\bf\Pi}^{(N)}$ . For each $t\geq 0$ , let $\hat{\mathbb{P}}_{t}$ denote the probability law of the states ${\bf s}^{N}(t)=(s^{N}_{1}(t),\ldots,s^{N}_{N}(t))$ and the actions ${\bf a}^{N}(t)=(a^{N}_{1}(t),\ldots,a^{N}_{N}(t))$ under ${\boldsymbol{\pi}}^{(N)}$ in the new game model. Similarly, let $\mathbb{P}_{t}$ denote the probability law of the states ${\bf x}^{N}(t)=(x^{N}_{1}(t),\ldots,x^{N}_{N}(t))$ , the observations ${\bf y}^{N}(k)=(y^{N}_{1}(k),\ldots,y^{N}_{N}(k))$ for $k=0,\ldots,t$ , and the actions ${\bf a}^{N}(t)=(a^{N}_{1}(t),\ldots,a^{N}_{N}(t))$ under ${\boldsymbol{\pi}}^{(N)}$ in the original finite agent game model. We prove that, for each $t\geq 0$ ,

[TABLE]

which implies that $\hat{J}_{i}^{N}({\boldsymbol{\pi}}^{(N)})=J_{i}^{N}({\boldsymbol{\pi}}^{(N)})$ for all $i=1,\ldots,N$ .

The claim trivially holds for $t=0$ . Suppose that the claim holds for $t$ and consider $t+1$ . For $t+1$ , let

[TABLE]

Define the following transition probability $Q_{t}$ on $\prod_{i=1}^{N}{\mathsf{S}}_{t+1}$ given $\prod_{i=1}^{N}{\mathsf{S}}_{t}\times\prod_{i=1}^{N}{\mathsf{A}}$ as:

[TABLE]

where $F_{N}({\bf s})\coloneqq\frac{1}{N}\sum_{i=1}^{N}\delta_{s_{i}}$ . Define also $G_{t+1}\coloneqq{\boldsymbol{A}}_{t+1}\times{\boldsymbol{B}}_{t+1}\times\ldots\times{\boldsymbol{B}}_{0}\times{\boldsymbol{D}}_{t+1}$ , $L_{t}\coloneqq{\boldsymbol{B}}_{t}\times\ldots\times{\boldsymbol{B}}_{0}$ , and $U_{t+1}\coloneqq{\boldsymbol{A}}_{t+1}\times{\boldsymbol{B}}_{t+1}\times{\boldsymbol{D}}_{t+1}$ . Then, we have

[TABLE]

where $d{\bf y}^{N}(k:0)\coloneqq(d{\bf y}^{N}(k),\ldots,d{\bf y}^{N}(0))$ . Hence, $\hat{\mathbb{P}}_{t+1}=\mathbb{P}_{t+1}$ . This implies that $\hat{J}_{i}^{N}({\boldsymbol{\pi}}^{(N)})=J_{i}^{N}({\boldsymbol{\pi}}^{(N)})$ for all $i=1,\ldots,N$ .

The second part of the proposition can be proved similarly, so we omit the details.

C Proof of Lemma 5.8

The claim trivially holds for $t=0$ . Suppose that the claim holds for $t$ and consider $t+1$ . For $t+1$ , let $G_{t+1}=A_{t+1}\times B_{t+1}\times\ldots\times B_{0}$ , where $A_{t+1}\in{\mathcal{B}}({\mathsf{X}})$ and $B_{k}\in{\mathcal{B}}({\mathsf{Y}})$ for $k=0,\ldots,t+1$ . Define also $L_{t}\coloneqq B_{t}\times\ldots\times B_{0}$ , and $U_{t+1}\coloneqq A_{t+1}\times B_{t+1}$ . Then, we have

[TABLE]

Since $G_{t+1}$ is arbitrary, this completes the proof.

D Proof of Proposition 5.16

Fix any sequence $\{T_{N}\}_{N\geq 1}$ satisfying the hypothesis of the proposition. Fix any $t\geq 0$ and suppose that

[TABLE]

for any bounded sequence $\{g_{N}\}_{N\geq 1}\subset C_{b}({\mathsf{S}}_{t})$ . Given (19), we prove that

[TABLE]

Indeed, we have

[TABLE]

First, note that since $\{T_{N}(\,\cdot\,,\Delta_{t})\}_{N\geq 1}\subset C_{b}({\mathsf{S}}_{t})$ , we have

[TABLE]

by (19).

Now, let us consider the first term in (21). To that end, define ${\cal F}\coloneqq\bigl{\{}T_{N}(s,\,\cdot\,):s\in{\mathsf{S}}_{t},N\geq 1\bigr{\}}$ . Note that ${\cal F}$ is a uniformly bounded and equicontinuous family of functions on ${\mathcal{P}}({\mathsf{S}}_{t})$ , and therefore

[TABLE]

as ${\cal L}(\tilde{\Delta}_{t}^{(N)})\rightarrow{\cal L}(\Delta_{t})$ weakly. Then, we have

[TABLE]

Hence, (19) implies (20) for any $t$ .

Now, we prove that (19) is true for all $t$ , which will complete the proof as (19) implies (20). Set $\sup_{N\geq 1}\|g_{N}\|\eqqcolon L<\infty$ and define

[TABLE]

For any $s\in{\mathsf{S}}_{t}$ and $(\Delta,\Delta^{\prime})\in{\mathcal{P}}({\mathsf{S}}_{t})^{2}$ , we have

[TABLE]

Since $\omega_{P_{t}}(r)\rightarrow 0$ as $r\rightarrow 0$ by (III), the family $\{l_{N,t}(s,\,\cdot\,):s\in{\mathsf{S}}_{t},N\geq 1\}$ is equicontinuous.

We prove (19) by induction on $t$ . The claim trivially holds for $t=0$ as ${\cal L}({\tilde{s}}_{1}^{N}(0))={\cal L}({\hat{s}}^{N}(0))=\lambda_{0}$ for all $N\geq 1$ . Suppose the claim holds for $t$ and consider $t+1$ . We can write

[TABLE]

Since the family $\{l_{N,t}\}_{N\geq 1}$ is equicontinuous and bounded, and (19) implies (20) at time $t$ , the last term converges to zero as $N\rightarrow\infty$ . This completes the proof.

Acknowledgments.

This research was supported in part by the U.S. Air Force Office of Scientific Research (AFOSR) under MURI grant FA9550-10-1-0573, and in part by the Office of Naval Research (ONR) under MURI grant N00014-16-1-2710 and grant N00014-12-1-0998.

Bibliography

[1] Adlakha, S., R. Johari, G.Y. Weintraub. 2015. Equilibria of dynamic games with many players: Existence, approximation, and market structure. Journal of Economic Theory 156 269–316.
[2] Aliprantis, C.D., K.C. Border. 2006. Infinite Dimensional Analysis. 3rd ed. Springer, Berlin.
[3] Bensoussan, A., J. Frehse, P. Yam. 2013. Mean Field Games and Mean Field Type Control Theory. Springer, New York.
[4] Bertsekas, D.P., S.E. Shreve. 1978. Stochastic Optimal Control: The Discrete Time Case. Academic Press, New York.
[5] Billingsley, P. 1995. Probability and Measure. 3rd ed. Wiley.
[6] Billingsley, P. 1999. Convergence of Probability Measures. 2nd ed. Wiley, New York.
[7] Biswas, A. 2015. Mean field games with ergodic cost for discrete time Markov processes. arXiv:1510.08968.
[8] Bogachev, V.I. 2007. Measure Theory: Volume II. Springer.