Approximate Nash Equilibria in Partially Observed Stochastic Games with Mean-Field Interactions
Naci Saldi, Tamer Basar, and Maxim Raginsky

TL;DR
This paper proves the existence of Nash equilibria in partially observed mean-field games and demonstrates that these equilibria serve as good approximations in large-agent settings.
Contribution
It introduces a method to establish Nash equilibria in partially observed mean-field games using belief space transformation and dynamic programming.
Findings
Existence of Nash equilibria under mild conditions.
Mean-field equilibrium policies are approximate Nash equilibria in large-agent games.
Applicable to infinite-horizon discounted cost scenarios.
\NatBibNumeric\TheoremsNumberedBySection\EquationsNumberedBySection
\RUNAUTHOR
Saldi, Başar, and Raginsky \RUNTITLEPartially Observed Stochastic Games with Mean-Field Interactions
\TITLE
Approximate Nash Equilibria in Partially Observed Stochastic Games with Mean-Field Interactions
\ARTICLEAUTHORS\AUTHOR
Naci Saldi, Tamer Başar, and Maxim Raginsky \AFFCoordinated Science Laboratory, University of Illinois,Urbana, IL 61801-2307, USA.
{nsaldi,basar1,[email protected]}
\ARTICLEAUTHORS\AUTHOR
Naci Saldi \AFFDepartment of Natural and Mathematical Sciences, Ozyegin University, Cekmekoy, Istanbul, Turkey.
{[email protected]} \AUTHORTamer Başar, and Maxim Raginsky \AFFCoordinated Science Laboratory, University of Illinois,Urbana, IL 61801-2307, USA.
\ABSTRACT
Establishing the existence of Nash equilibria for partially observed stochastic dynamic games is known to be quite challenging, with the difficulties stemming from the noisy nature of the measurements available to individual players (agents) and the decentralized nature of this information. When the number of players is sufficiently large and the interactions among agents is of the mean-field type, one way to overcome this challenge is to investigate the infinite-population limit of the problem, which leads to a mean-field game. In this paper, we consider discrete-time partially observed mean-field games with infinite-horizon discounted cost criteria. Using the technique of converting the original partially observed stochastic control problem to a fully observed one on the belief space and the dynamic programming principle, we establish the existence of Nash equilibria for these game models under very mild technical conditions. Then, we show that the mean-field equilibrium policy, when adopted by each agent, forms an approximate Nash equilibrium for games with sufficiently many agents.
\KEYWORDS
Mean-field games, approximate Nash equilibrium, partially observed stochastic control. \MSCCLASS 91A15, 91A13, 90C40, 90C39, 60J05 \ORMSCLASS Primary: Games/group decisions, dynamic programming/optimal control, probability; secondary: Stochastic, Markov, Markov processes
1 Introduction
In this paper, we consider discrete-time mean-field games with decentralized partial observation under infinite-horizon discounted-cost optimality criteria. Game models of this type arise as the infinite-population limit of finite-agent dynamic games of the mean-field type; that is, the interactions among agents are modeled by the mean-field term (i.e., the empirical distribution of their states), which affects both the agents’ individual costs and their state and observation transition probabilities. As the number of agents goes to infinity, the mean-field term converges to the distribution of a single generic agent. Hence, in the limiting case, a generic agent is faced with a single-agent stochastic control problem with a constraint on the distribution of the state at each time (i.e., a mean-field game problem). The main goal in this class of problems is to establish the existence of a policy and a state distribution flow such that, when the generic agent applies this policy, the resulting distribution of the agent’s state is the same as the state distribution flow. This last property is called the Nash certainty equivalence (NCE) principle (Huang et al. [28]). The purpose of this paper is to study the existence of such an equilibrium for a general class of mean-field game models with discounted-cost criteria under decentralized partial observation, and to establish that the policy in the mean-field equilibrium constitutes an approximate Nash equilibrium for finite-agent games with sufficiently many agents.
Mean-field games have been introduced by Huang et al. [28] and Lasry and Lions [32] around the same time to establish the existence of approximate Nash equilibria for fully-observed non-cooperative differential games with a large number of identical agents. The main feature of this approach is to reduce the decentralized game problem to a centralized stochastic decision problem using the NCE principle. The equilibrium solution of this decision problem provides an almost Nash equilibrium when the number of agents is sufficiently large. Characterization of the solution entails a Fokker-Planck equation evolving forward in time and a Hamilton-Jacobi-Bellman equation evolving backward in time. We refer the reader to Huang et al. [27], Tembine et al. [41], Huang [25], Bensoussan et al. [3], Cardaliaguet [10], Carmona and Delarue [11], Gomes and Saúde [20], Moon and Başar [33] for studies of fully-observed continuous-time mean-field games with different models and cost functions, such as games with major-minor players, risk-sensitive games, games with Markov jump parameters, and LQG games.
In the literature relatively few results are available on partially-observed mean-field games. Indeed, our work appears to be the first one that studies discrete-time mean-field games under partial observations. Existing works have mostly studied the continuous-time setup, and analyses of continuous-time and discrete-time setups are quite different. Huang et al. [26] study a partially-observed continuous-time mean-field game with linear individual dynamics. Şen and Caines [13, 14] consider a continuous-time mean-field game with major-minor agents and nonlinear dynamics where the minor agents can partially observe the state of the major agent. Şen and Caines [16, 15] also develop a nonlinear filtering theory for McKean-Vlasov type stochastic differential equations that arise as the infinite population limit of the partially-observed differential game of the mean-field type. The nonlinear filtering equation is derived using Itô’s lemma for Banach space valued stochastic processes. Tang and Meng [40] study a continuous-time partially observed stochastic control problem of the mean-field type and establish a maximum principle to characterize the optimal control. Huang and Wang [24] consider a continuous-time mean-field game with linear individual dynamics where two types of partial information structure are considered: (i) agents cannot observe the white-noise which is common to all agents, (ii) agents can access the additive white-noise version of their own states.
The class of discrete-time mean-field games we consider in this paper is defined on a Polish state space, with infinite-horizon discounted-cost optimality criteria for the players, who have access to decentralized partial observations of their individual states. In such games, a generic agent is faced with a partially observed stochastic control problem under the NCE principle, which, as indicated earlier, states that the state distribution flow under an optimal decision rule should be the same as the mean-field term that appears in the state and observation dynamics as well as in the individual cost functions. Consequently, the classical techniques used to study partially observed optimal control problems are not sufficient to analyze mean-field games. To establish the existence of an equilibrium solution, we have to bring in the fixed-point approach that is used to obtain equilibria in classical game problems, along with the technique of converting partially observed optimal control problems to fully observed ones on the belief space and then employing the dynamic programming principle. The precise descriptions of the mean-field game and the finite-agent game problems are given in Sections 3 and 2, respectively. In Section 4 we prove the existence of a mean-field equilibrium. In Section 5 we establish that the mean-field equilibrium policy is approximately Nash for finite-agent games with sufficiently many agents. In Section 6 we illustrate our results by considering an example.
In Saldi et al. [38] (see also the abridged conference version [39]) we solved the fully-observed version of this problem under a similar set of assumptions on the system components. The techniques used in this paper to establish the existence of a mean-field equilibrium and an approximate Nash equilibrium are almost the same as in Saldi et al. [38], modulo some transformations of the original problems into equivalent ones for which we can still use the techniques in Saldi et al. [38]. However, as a result of these transformations, there are highly non-trivial differences between the proofs in this paper and in Saldi et al. [38]. For instance, as a result of the fully observed reduction of the partially observed optimal control problem in the mean-field game, the dependence of the state transition probability of the fully observed model on the mean-field term is not explicit, in contrast to Saldi et al. [38]. Therefore, it is quite challenging to prove the weak continuity of this transition probability with respect to the mean-field term, which is a crucial result for establishing the existence of a mean-field equilibrium.
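Notation. For a metric space ${\mathsf{E}}$, we let $C_b({\mathsf{E}})$ denote the set of all bounded continuous real functions on ${\mathsf{E}}$, and ${\mathcal{P}}({\mathsf{E}})$ denote the set of all Borel probability measures on ${\mathsf{E}}$. For any ${\mathsf{E}}$-valued random element $x$, ${\cal L}(x)(\,\cdot\,)\in{\mathcal{P}}({\mathsf{E}})$ denotes the distribution of $x$. A sequence $\{\mu_n\}$ of measures on ${\mathsf{E}}$ is said to converge weakly to a measure $\mu$ if $\int_{{\mathsf{E}}} g(e)\,\mu_n(de)\rightarrow\int_{{\mathsf{E}}} g(e)\,\mu(de)$ for all $g\in C_b({\mathsf{E}})$. For any $\nu\in{\mathcal{P}}({\mathsf{E}})$ and measurable real function $g$ on ${\mathsf{E}}$, we define $\nu(g)\coloneqq\int g\,d\nu$. For any subset $B$ of ${\mathsf{E}}$, we let $\partial B$ and $B^{c}$ denote the boundary and complement of $B$, respectively. The notation $v\sim\nu$ means that the random element $v$ has distribution $\nu$. Unless otherwise specified, the term “measurable” will refer to Borel measurability.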
2 Finite Player Game
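We consider a discrete-time partially observed $N$-agent stochastic game of mean-field type with a Polish state space ${\mathsf{X}}$, a Polish action space ${\mathsf{A}}$, and a Polish observation space ${\mathsf{Y}}$. For every $t\in\{0,1,2,\ldots\}$ and every $i\in\{1,2,\ldots,N\}$, let $x^{N}_{i}(t)\in{\mathsf{X}}$, $a^{N}_{i}(t)\in{\mathsf{A}}$, and $y^{N}_{i}(t)\in{\mathsf{Y}}$ denote the state, the action, and the observation of Agent $i$ at time $t$, and let
\[
e_{t}^{(N)}(\,\cdot\,)\coloneqq\frac{1}{N}\sum_{i=1}^{N}\delta_{x_{i}^{N}(t)}(\,\cdot\,)\in{\mathcal{P}}({\mathsf{X}})
\]
denote the empirical distribution of the states of agents at time $t$, where $\delta_{x}\in{\mathcal{P}}({\mathsf{X}})$ is the Dirac measure at $x$. The initial states $x^{N}_{i}(0)$ are independent and identically distributed according to $\mu_{0}$. For each $t\geq 0$, the current observations $\bigl(y^{N}_{1}(t),\ldots,y^{N}_{N}(t)\bigr)$ and the next states $\bigl(x^{N}_{1}(t+1),\ldots,x^{N}_{N}(t+1)\bigr)$ of agents are obtained randomly according to the conditional probability laws
\[
\prod_{i=1}^{N}r\bigl(dy^{N}_{i}(t)\,\big|\,x^{N}_{i}(t),e_{t}^{(N)}\bigr)\quad\text{and}\quad\prod_{i=1}^{N}p\bigl(dx^{N}_{i}(t+1)\,\big|\,x^{N}_{i}(t),a^{N}_{i}(t),e_{t}^{(N)}\bigr),
\]
where the stochastic kernel $p:{\mathsf{X}}\times{\mathsf{A}}\times{\mathcal{P}}({\mathsf{X}})\rightarrow{\mathcal{P}}({\mathsf{X}})$ denotes the transition probability law of the next state given the previous state-action pair and the empirical distribution of states, and the stochastic kernel $r:{\mathsf{X}}\times{\mathcal{P}}({\mathsf{X}})\rightarrow{\mathcal{P}}({\mathsf{Y}})$ denotes the transition probability law of the current observation given the current state and the empirical distribution of states. The measurable function $c:{\mathsf{X}}\times{\mathsf{A}}\times{\mathcal{P}}({\mathsf{X}})\rightarrow[0,\infty)$ is the one-stage cost function.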
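As a small illustration of the empirical distribution (a sketch with hypothetical numbers, not taken from the paper), write $e_{t}^{(N)}$ for the empirical distribution of the agents' states at time $t$, and suppose that ${\mathsf{X}}=\{0,1\}$, $N=4$, and the four agents' states at time $t$ are $0,1,1,0$. Then
\[
e_{t}^{(N)}=\tfrac{1}{4}\bigl(\delta_{0}+\delta_{1}+\delta_{1}+\delta_{0}\bigr)=\tfrac{1}{2}\,\delta_{0}+\tfrac{1}{2}\,\delta_{1}.
\]
Every agent's transition and observation kernels at time $t$ are evaluated at this same measure, and relabeling the agents leaves it unchanged. This is the sense in which the interaction is of mean-field type: agents affect each other only through the distribution of states, not through individual identities.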
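Define the history spaces ${\mathsf{H}}_{0}={\mathsf{Y}}$ and ${\mathsf{H}}_{t}=({\mathsf{Y}}\times{\mathsf{A}})^{t}\times{\mathsf{Y}}$ for $t=1,2,\ldots$, all endowed with product Borel $\sigma$-algebras. A policy for a generic agent is a sequence $\pi=\{\pi_{t}\}$ of stochastic kernels on ${\mathsf{A}}$ given ${\mathsf{H}}_{t}$. The set of all policies for Agent $i$ is denoted by $\Pi_{i}$. Let ${\mathsf{M}}_{i}$ be the set of policies in $\Pi_{i}$ which only use the observations; that is, $\pi\in{\mathsf{M}}_{i}$ if $\pi_{t}:\prod_{k=0}^{t}{\mathsf{Y}}\rightarrow{\mathcal{P}}({\mathsf{A}})$ for each $t\geq 0$. Let $\Pi^{(N)}=\prod_{i=1}^{N}\Pi_{i}$ and ${\mathsf{M}}^{(N)}=\prod_{i=1}^{N}{\mathsf{M}}_{i}$. We let ${\boldsymbol{\pi}}^{(N)}\coloneqq(\pi^{1},\ldots,\pi^{N})$, $\pi^{i}\in\Pi_{i}$, denote the $N$-tuple of policies for all the agents in the game. Under such an $N$-tuple of policies, the actions of agents at each time $t\geq 0$ are obtained randomly according to the conditional probability law
\[
\prod_{i=1}^{N}\pi_{t}^{i}\bigl(da^{N}_{i}(t)\,\big|\,h^{N}_{i}(t)\bigr),
\]
where $h^{N}_{i}(t)=\bigl(y^{N}_{i}(0),\ldots,y^{N}_{i}(t),a^{N}_{i}(0),\ldots,a^{N}_{i}(t-1)\bigr)$ is the observation-action history observed by Agent $i$ up to time $t$. Note that agents can only use their local observations. Hence, it is a partially observed game model.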
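For Agent $i$, the infinite-horizon discounted cost under the initial distribution $\mu_{0}$ and the $N$-tuple of policies ${\boldsymbol{\pi}}^{(N)}\in\Pi^{(N)}$ is given by
\[
J_{i}^{(N)}({\boldsymbol{\pi}}^{(N)})=E^{{\boldsymbol{\pi}}^{(N)}}\biggl[\sum_{t=0}^{\infty}\beta^{t}c\bigl(x_{i}^{N}(t),a_{i}^{N}(t),e_{t}^{(N)}\bigr)\biggr],
\]
where $\beta\in(0,1)$ is the discount factor and $E^{{\boldsymbol{\pi}}^{(N)}}\big[\cdot\big]$ denotes expectation with respect to the unique probability law induced by the above stochastic update rules and the initial state distribution on the infinite product of state, observation, and action spaces of all agents.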
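The following elementary bound is a sketch under two assumptions that are made explicit here: the one-stage cost $c$ is bounded (as is imposed later on the mean-field model) and the discount factor, written $\beta$, lies in $(0,1)$. Writing $\|c\|\coloneqq\sup|c|$, the discounted cost of any agent under any policy tuple is then finite, and truncating the horizon at a time $T$ changes it by at most a geometric tail:
\[
\biggl|\,E\biggl[\sum_{t=0}^{\infty}\beta^{t}c_{t}\biggr]\biggr|\leq\sum_{t=0}^{\infty}\beta^{t}\|c\|=\frac{\|c\|}{1-\beta},\qquad\biggl|\,E\biggl[\sum_{t=T}^{\infty}\beta^{t}c_{t}\biggr]\biggr|\leq\frac{\beta^{T}\|c\|}{1-\beta},
\]
where $c_{t}$ is shorthand for the cost incurred at time $t$. Bounds of this kind are what make finite-horizon approximations of the infinite-horizon criterion harmless.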
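Definition 2.1
A policy ${\boldsymbol{\pi}}^{(N*)}=(\pi^{1*},\ldots,\pi^{N*})$ constitutes a Nash equilibrium if
\[
J_{i}^{(N)}({\boldsymbol{\pi}}^{(N*)})=\inf_{\pi^{i}\in\Pi_{i}}J_{i}^{(N)}({\boldsymbol{\pi}}^{(N*)}_{-i},\pi^{i})
\]
for each $i=1,\ldots,N$, where ${\boldsymbol{\pi}}^{(N*)}_{-i}\coloneqq(\pi^{j*})_{j\neq i}$.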
Establishing the existence of Nash equilibria for the class of games formulated above is known to be difficult due to the (almost) decentralized and noisy nature of the information structure of the problem. Indeed, even if the number of players is small, it is all but impossible to show even the existence of approximate Nash equilibria for these types of games. However, when the number of players is sufficiently large, one way to overcome this challenge is to introduce the infinite-population limit of the game described here. In this limiting case, we can model the empirical distributions of the state configurations as an exogenous state-measure flow, which, by the law of large numbers, should be consistent with the distribution of a generic agent (i.e., the NCE principle). Hence, in the limiting case, a generic agent is faced with a mean-field game that will be introduced in the next section. One would then expect that if each agent in the finite-agent game problem adopts the equilibrium policy in the infinite-population limit, the resulting policy will be an approximate Nash equilibrium for all sufficiently large $N$. Therefore, by studying the infinite-population limit, which is easier to handle, one can obtain an approximate Nash equilibrium for the original finite-agent game problem, for which establishing the existence of a true Nash equilibrium is very difficult.
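To that end, we slightly change the definition of Nash equilibrium in this model and adopt the following solution concept:
Definition 2.2
A policy ${\boldsymbol{\pi}}^{(N*)}\in{\mathsf{M}}^{(N)}$ is a Nash equilibrium if
\[
J_{i}^{(N)}({\boldsymbol{\pi}}^{(N*)})=\inf_{\pi^{i}\in{\mathsf{M}}_{i}}J_{i}^{(N)}({\boldsymbol{\pi}}^{(N*)}_{-i},\pi^{i})
\]
for each $i=1,\ldots,N$, and an $\varepsilon$-Nash equilibrium (for a given $\varepsilon>0$) if
\[
J_{i}^{(N)}({\boldsymbol{\pi}}^{(N*)})\leq\inf_{\pi^{i}\in{\mathsf{M}}_{i}}J_{i}^{(N)}({\boldsymbol{\pi}}^{(N*)}_{-i},\pi^{i})+\varepsilon
\]
for each $i=1,\ldots,N$.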
Note that, according to this definition, the agents can only use their local observations to design their policies. Indeed, in practical applications, agents typically have access only to their local observations. Collecting all the observations in the entire system is intractable, particularly if the number of agents is large. Consequently, establishing that a mean-field policy is an approximate Nash equilibrium for the game under the assumption that the agents have access to full observation variables is not necessary in order to cover practically meaningful scenarios. It is sufficient to establish the existence of an approximate Nash equilibrium for the game with a local information structure. In addition, in the discrete-time mean-field literature, it is common to establish the existence of approximate Nash equilibria with local (decentralized) information structures (see Adlakha et al. [1], Biswas [7]). This is true for the continuous-time partially observed case as well (see Şen and Caines [34]).
In Section 5, we will show that the policy in the infinite-population equilibrium is an $\varepsilon$-Nash equilibrium (for a given $\varepsilon>0$) when the number of agents is sufficiently large.
3 Partially observed mean-field games and mean-field equilibria
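In this section we introduce a mean-field game that can be interpreted as the infinite-population limit of the game introduced in the preceding section. This mean-field game model is specified by
\[
\bigl({\mathsf{X}},{\mathsf{A}},{\mathsf{Y}},p,r,c,\mu_{0}\bigr),
\]
where, as before, ${\mathsf{X}}$, ${\mathsf{Y}}$, and ${\mathsf{A}}$ are the Polish state, observation, and action spaces, respectively. The stochastic kernel $p:{\mathsf{X}}\times{\mathsf{A}}\times{\mathcal{P}}({\mathsf{X}})\rightarrow{\mathcal{P}}({\mathsf{X}})$ denotes the transition probability and $r:{\mathsf{X}}\times{\mathcal{P}}({\mathsf{X}})\rightarrow{\mathcal{P}}({\mathsf{Y}})$ denotes the observation kernel. The measurable function $c:{\mathsf{X}}\times{\mathsf{A}}\times{\mathcal{P}}({\mathsf{X}})\rightarrow[0,\infty)$ is the one-stage cost function and $\mu_{0}$ is the distribution of the initial state.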
Define the history spaces ${\mathsf{G}}_{0}={\mathsf{Y}}$ and ${\mathsf{G}}_{t}=({\mathsf{Y}}\times{\mathsf{A}})^{t}\times{\mathsf{Y}}$ for $t=1,2,\ldots$, all endowed with product Borel $\sigma$-algebras. A policy is a sequence $\pi=\{\pi_{t}\}$ of stochastic kernels on ${\mathsf{A}}$ given ${\mathsf{G}}_{t}$. The set of all policies is denoted by $\Pi$. Partially observed mean-field games are not games in the classical sense. Rather, they are single-agent partially observed stochastic control problems whose state distribution at each time step should satisfy a consistency condition. In other words, we have a single agent with partial observations and model the overall behavior of (a large population of) other agents by an exogenous state-measure flow ${\boldsymbol{\mu}}\coloneqq(\mu_{t})_{t\geq 0}\subset{\mathcal{P}}({\mathsf{X}})$ with a given initial condition $\mu_{0}$. This measure flow should also be consistent with the state distributions of this single agent when the agent acts optimally. The precise mathematical description of the problem is given as follows.
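We let ${\mathcal{M}}\coloneqq\bigl\{{\boldsymbol{\mu}}\in{\mathcal{P}}({\mathsf{X}})^{\infty}:\mu_{0}\text{ is fixed}\bigr\}$ be the set of all state-measure flows with a given initial condition $\mu_{0}$. Given any measure flow ${\boldsymbol{\mu}}\in{\mathcal{M}}$, the probabilistic evolution of the states, observations, and actions is as follows:
\[
x(0)\sim\mu_{0},\quad x(t)\sim p\bigl(\,\cdot\,\big|\,x(t-1),a(t-1),\mu_{t-1}\bigr),\ t=1,2,\ldots,\quad y(t)\sim r\bigl(\,\cdot\,\big|\,x(t),\mu_{t}\bigr),\quad a(t)\sim\pi_{t}\bigl(\,\cdot\,\big|\,g(t)\bigr),\ t=0,1,\ldots,
\]
where $g(t)\in{\mathsf{G}}_{t}$ is the observation-action history up to time $t$. According to the Ionescu Tulcea theorem (see, e.g., Hernández-Lerma and Lasserre [22]), an initial distribution $\mu_{0}$ on ${\mathsf{X}}$, a policy $\pi$, and a state-measure flow ${\boldsymbol{\mu}}$ define a unique probability measure $P^{\pi}$ on $({\mathsf{X}}\times{\mathsf{Y}}\times{\mathsf{A}})^{\infty}$. The expectation with respect to $P^{\pi}$ is denoted by $E^{\pi}$. A policy $\pi^{*}\in\Pi$ is said to be optimal for ${\boldsymbol{\mu}}$ if
\[
J_{{\boldsymbol{\mu}}}(\pi^{*})=\inf_{\pi\in\Pi}J_{{\boldsymbol{\mu}}}(\pi),
\]
where the infinite-horizon discounted cost of policy $\pi$ with measure flow ${\boldsymbol{\mu}}$ and the discount factor $\beta\in(0,1)$ is given by
\[
J_{{\boldsymbol{\mu}}}(\pi)\coloneqq E^{\pi}\biggl[\sum_{t=0}^{\infty}\beta^{t}c\bigl(x(t),a(t),\mu_{t}\bigr)\biggr].
\]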
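Using these definitions, we first define the set-valued mapping
\[
\Psi:{\mathcal{M}}\rightarrow 2^{\Pi}
\]
as $\Psi({\boldsymbol{\mu}})=\{\pi\in\Pi:\pi\text{ is optimal for }{\boldsymbol{\mu}}\}$. Conversely, we define a single-valued mapping
\[
\Lambda:\Pi\rightarrow{\mathcal{M}}
\]
as follows: given $\pi\in\Pi$, the state-measure flow ${\boldsymbol{\mu}}\coloneqq\Lambda(\pi)$ is constructed recursively as
\[
\mu_{t+1}(\,\cdot\,)=\int_{{\mathsf{X}}\times{\mathsf{A}}}p\bigl(\,\cdot\,\big|\,x(t),a(t),\mu_{t}\bigr)\,P^{\pi}\bigl(da(t)\,\big|\,x(t)\bigr)\,\mu_{t}(dx(t)),
\]
where $P^{\pi}\bigl(da(t)\,\big|\,x(t)\bigr)$ denotes the conditional distribution of $a(t)$ given $x(t)$ under $\pi$ and $(\mu_{\tau})_{0\leq\tau\leq t}$. Using $\Psi$ and $\Lambda$, we now introduce the notion of an equilibrium for the mean-field game.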
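Definition 3.1
A pair $(\pi^{*},{\boldsymbol{\mu}}^{*})\in\Pi\times{\mathcal{M}}$ is a mean-field equilibrium if $\pi^{*}\in\Psi({\boldsymbol{\mu}}^{*})$ and ${\boldsymbol{\mu}}^{*}=\Lambda(\pi^{*})$. In other words, the policy $\pi^{*}$ is optimal for the state-measure flow ${\boldsymbol{\mu}}^{*}$, and ${\boldsymbol{\mu}}^{*}$ is consistent with the state distributions of the agent when it acts optimally via $\pi^{*}$.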
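The two conditions in the definition can be folded into a single fixed-point statement, which is the form exploited in fixed-point arguments. As a sketch, write $\Psi$ for the set-valued optimality map (measure flow to optimal policies) and $\Lambda$ for the single-valued consistency map (policy to induced measure flow); these letters are used here only as labels for the two mappings defined above. Then
\[
(\pi^{*},{\boldsymbol{\mu}}^{*})\text{ is a mean-field equilibrium}\quad\Longleftrightarrow\quad\pi^{*}\in\Psi\bigl(\Lambda(\pi^{*})\bigr)\ \text{ and }\ {\boldsymbol{\mu}}^{*}=\Lambda(\pi^{*}).
\]
Indeed, if ${\boldsymbol{\mu}}^{*}=\Lambda(\pi^{*})$, then $\pi^{*}\in\Psi({\boldsymbol{\mu}}^{*})$ is the same statement as $\pi^{*}\in\Psi\bigl(\Lambda(\pi^{*})\bigr)$. The equilibrium is therefore a fixed point of the set-valued composition $\Psi\circ\Lambda$ on the policy space, which is why set-valued fixed-point theorems are the natural tool.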
In this section, the main goal is to establish the existence of a mean-field equilibrium. To that end, we impose the assumptions below on the components of the mean-field game model.
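Assumption 1
- (a) The cost function $c$ is bounded and continuous.
- (b) The stochastic kernel $p(\,\cdot\,|x,a,\mu)$ is weakly continuous in $(x,a,\mu)$; i.e., for all $x$, $a$, and $\mu$, $p(\,\cdot\,|x_{k},a_{k},\mu_{k})\rightarrow p(\,\cdot\,|x,a,\mu)$ weakly when $(x_{k},a_{k},\mu_{k})\rightarrow(x,a,\mu)$.
- (c) The observation kernel $r(\,\cdot\,|x,\mu)$ is continuous in $(x,\mu)$ with respect to total variation norm; i.e., for all $x$ and $\mu$, $r(\,\cdot\,|x_{k},\mu_{k})\rightarrow r(\,\cdot\,|x,\mu)$ in total variation norm when $(x_{k},\mu_{k})\rightarrow(x,\mu)$.
- (d) ${\mathsf{A}}$ is compact and ${\mathsf{X}}$ is locally compact.
- (e) There exist a constant $\alpha\geq 0$ and a continuous moment function $w:{\mathsf{X}}\rightarrow[0,\infty)$ (see Hernández-Lerma and Lasserre [22, Definition E.7]) such that
\[
\sup_{(a,\mu)\in{\mathsf{A}}\times{\mathcal{P}}({\mathsf{X}})}\int_{{\mathsf{X}}}w(y)\,p(dy|x,a,\mu)\leq\alpha\,w(x).
\]
- (f) The initial probability measure $\mu_{0}$ satisfies
\[
\int_{{\mathsf{X}}}w(x)\,\mu_{0}(dx)\eqqcolon M<\infty.
\]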
Remark 3.2
We need Assumption 1-(c) in order to establish the continuity, with respect to the weak topology, of the transition probability of the fully-observed reduction of the partially observed mean-field game model (this transition probability will be introduced in the next section). If this continuity condition holds under any other assumption on the observation kernel (for instance, in the fully observed case; that is, $r(\,\cdot\,|x,\mu)=\delta_{x}(\,\cdot\,)$), then all the results in this paper remain true. \Halmos
The main result of this section is the existence of a mean-field equilibrium under Assumption 1. Later we will show that this mean-field equilibrium constitutes an approximate Nash equilibrium for games with sufficiently many agents.
Theorem 3.3
Under Assumption 1, the mean-field game $\bigl({\mathsf{X}},{\mathsf{A}},{\mathsf{Y}},p,r,c,\mu_{0}\bigr)$ admits a mean-field equilibrium $(\pi^{*},{\boldsymbol{\mu}}^{*})$.
The proof of Theorem 3.3 is given in Section 4. To establish the existence of an equilibrium, we use the fully observed reduction of partially observed optimal control problems and the dynamic programming principle, in addition to the fixed-point approach that is commonly used in classical game problems.
4 Proof of Theorem 3.3
Note that, given any measure flow ${\boldsymbol{\mu}}\in{\mathcal{M}}$, the optimal control problem for the mean-field game reduces to one of finding an optimal policy for a partially observed Markov decision process (POMDP). Hence, before starting the proof of Theorem 3.3, we first review a few relevant results on POMDPs. To this end, fix any ${\boldsymbol{\mu}}\in{\mathcal{M}}$ and consider the corresponding optimal control problem.
Let ${\mathcal{P}}_{w}({\mathsf{X}})\coloneqq\bigl\{\mu\in{\mathcal{P}}({\mathsf{X}}):\mu(w)<\infty\bigr\}$. It is known that any POMDP can be reduced to a (completely observable) MDP (see Yushkevich [43], Rhenius [37]), whose states are the posterior state distributions or beliefs of the observer; that is, the state at time $t$ is
\[
z(t)(\,\cdot\,)\coloneqq{\mathrm{Pr}}\bigl\{x(t)\in\,\cdot\,\big|\,y(0),\ldots,y(t),a(0),\ldots,a(t-1)\bigr\}\in{\mathcal{P}}({\mathsf{X}}).
\]
We call this equivalent MDP the belief-state MDP. Note that since under any policy by Assumption 1-(e),(f), we have almost everywhere. Therefore, the belief-state MDP has state space and action space , where is equipped with the Borel -algebra generated by the topology of weak convergence. The transition probabilities of the belief-state MDP can be constructed as follows (see also Hernández-Lerma [21]). Let denote the generic state variable for the belief-state MDP. Fix any . First consider the transition probability on given
[TABLE]
where . Let us disintegrate as follows
[TABLE]
Then, we define the mapping as
[TABLE]
Note that
[TABLE]
Then, can be written as
[TABLE]
The initial point for the belief-state MDP is ; that is, . Finally, for each , the one-stage cost function of the belief-state MDP is given by
[TABLE]
Hence, the belief-state MDP is a Markov decision process with the components
[TABLE]
For the belief-state MDP define the history spaces and , . A policy is a sequence of stochastic kernels on given . The set of all policies is denoted by . A Markov policy is a sequence of stochastic kernels on given . The set of Markov policies is denoted by . Let denote the discounted cost function of policy for initial point of the belief-state MDP. Notice that any history vector of the belief-state MDP is a function of the history vector of the POMDP. Let us write this relation as . Hence, for a policy , we can define a policy as
[TABLE]
Let us write this as a mapping from to : . It is straightforward to show that the cost functions and are the same. One can also prove that (see Yushkevich [43], Rhenius [37])
[TABLE]
and furthermore, that if is an optimal policy for belief-state MDP, then is optimal for the POMDP as well. Therefore, the optimal control problem for the mean-field game is equivalent to the optimal control of belief-state MDP.
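For intuition, the belief update underlying this reduction can be written out explicitly when ${\mathsf{X}}$ and ${\mathsf{Y}}$ are finite. The following is a sketch under that finiteness assumption, with $p$ and $r$ read as probability mass functions; the letters $F$ and $H$ are hypothetical labels for the update map and the observation likelihood and are not fixed by the text above. Given the current belief $z$, the action $a$, the measure flow ${\boldsymbol{\mu}}$, and the next observation $y$,
\[
H(y|z,a)=\sum_{x'}r(y|x',\mu_{t+1})\sum_{x}p(x'|x,a,\mu_{t})\,z(x),\qquad F(z,a,y)(x')=\frac{r(y|x',\mu_{t+1})\sum_{x}p(x'|x,a,\mu_{t})\,z(x)}{H(y|z,a)},
\]
whenever $H(y|z,a)>0$. The first expression is the probability of observing $y$ at time $t+1$ from belief $z$ and action $a$; the second is Bayes' rule for the posterior of $x(t+1)$. The belief-state transition then places mass $H(y|z,a)$ on the next belief $F(z,a,y)$. Note that the mean-field term enters through both $\mu_{t}$ (in $p$) and $\mu_{t+1}$ (in $r$), and enters $F$ through a ratio, which is why the dependence of the belief-state dynamics on the measure flow is less transparent than in the fully observed case.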
Now, we derive the conditions satisfied by the components of the belief-state MDP under Assumption 1. Note first that where . Since is a moment function, each is tight (Hernández-Lerma and Lasserre [22, Proposition E.8]). Moreover, each is also closed since is continuous. Therefore, each is compact with respect to the weak topology. This implies that is a -compact Polish space. Define as
[TABLE]
Note that is a moment function on . Indeed, we have
[TABLE]
We also have
[TABLE]
Moreover, by Feinberg et al. [18, Theorem 3.6], is weakly continuous in for all . Therefore, the belief-state MDP satisfies the following conditions under Assumption 1.
- (i)
The cost functions are bounded and continuous.
- (ii)
The stochastic kernels are weakly continuous.
- (iii)
is compact and is -compact.
- (iv)
There exist a constant and a lower semi-continuous moment function such that
[TABLE]
- (v)
The initial probability measure satisfies
[TABLE]
Our approach to proving Theorem 3.3 can be summarized as follows: (i) first, we lift the partially observed stochastic control problem faced by a generic agent to a fully observed stochastic control problem (i.e., the belief-state MDP) using the above-mentioned results; (ii) we then prove that the state transition probabilities of the belief-state MDP are continuous with respect to the state-measure flow of the original partially observed problem; (iii) finally, we use the technique in our paper Saldi et al. [38], which was developed to show the existence of mean-field equilibria for fully observed mean-field games, to finish the proof. The key step in our approach is (ii), in which we mimic the elegant proof technique established by Feinberg et al. [18] to prove the weak continuity of the transition probabilities of the fully observed reduction of partially observed stochastic control problems.
We are now ready to prove Theorem 3.3. Define the mapping as follows:
[TABLE]
In other words, is the so-called ‘barycenter’ of , see, e.g., Phelps [36]. Using this definition, for any , we define the measure flow as follows:
[TABLE]
where for any , we let denote the marginal of on ; that is, . Let and be, respectively, the transition probabilities and one-stage cost functions of belief-state MDP induced by the measure flow . We let denote the discounted cost value function at time of this belief-state MDP; that is,
[TABLE]
Let $J_{*}^{{\boldsymbol{\nu}}}\coloneqq\bigl(J^{{\boldsymbol{\nu}}}_{*,t}\bigr)_{t\geq 0}$. To prove the existence of a mean-field equilibrium, we use the technique in our previous paper Saldi et al. [38], adapted from Jovanovic and Rosenthal [29], which enables us to transform the fixed point equation characterizing the mean-field equilibrium into a fixed point equation of a set-valued mapping from the set of state-action measure flows into itself. Then, using Kakutani’s fixed point theorem (Aliprantis and Border [2, Corollary 17.55]), we deduce the existence of a mean-field equilibrium.
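Returning to the barycenter map introduced above, a finite example may help. It is a sketch that assumes the usual definition, in which the barycenter of a probability measure $\nu$ on ${\mathcal{P}}({\mathsf{X}})$ is the mixture $\int z(\,\cdot\,)\,\nu(dz)$; the symbol $B$ and the numbers below are illustrative only. If $\nu=\sum_{k=1}^{m}\lambda_{k}\delta_{z_{k}}$ with $z_{k}\in{\mathcal{P}}({\mathsf{X}})$, $\lambda_{k}\geq 0$, and $\sum_{k}\lambda_{k}=1$, then
\[
B(\nu)(\,\cdot\,)=\int z(\,\cdot\,)\,\nu(dz)=\sum_{k=1}^{m}\lambda_{k}\,z_{k}(\,\cdot\,).
\]
For instance, with ${\mathsf{X}}=\{0,1\}$, $z_{1}=\delta_{0}$, $z_{2}=\tfrac{1}{2}\delta_{0}+\tfrac{1}{2}\delta_{1}$, and $\lambda_{1}=\lambda_{2}=\tfrac{1}{2}$, one gets $B(\nu)=\tfrac{3}{4}\delta_{0}+\tfrac{1}{4}\delta_{1}$. In the present setting this is how a distribution over beliefs is mapped back to a distribution over states: averaging the posterior over all observation histories recovers the unconditional law of the state.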
Remark 4.1
We note that the technique used here to prove the existence of a mean-field equilibrium is very similar to the one in our previous paper Saldi et al. [38], in which we have studied fully-observed version of the same problem. However, there is a crucial difference in the details between this problem and the fully-observed one. In the fully-observed case, the transition probability is given by from which we can immediately deduce the continuity of the transition probability with respect to state-measure flow. However, in the partially-observed case, we do not have such an explicit analytical expression describing the relation between and from which we can deduce the same continuity result. Hence, we need to prove this highly non-trivial statement (unlike the fully-observed case) in order to use the technique in Saldi et al. [38]. Indeed, this is the key step here (step (ii) above) to prove the existence of a mean-field equilibrium. After we prove this result, the rest of the proof follows the same steps as in Saldi et al. [38]. However, for the sake of completeness, we give the full details of the proof, since we are using quite different notation as a result of fully-observed reduction of the original problem and the dependence of the transition probability of the belief-state MDP on the state-measure flow is not explicit as in the fully-observed case.\Halmos
As we mentioned above, we first transform the fixed point equation into a fixed point equation of a set-valued mapping from into itself. To that end, we define the product space in which is an element. Here, . Moreover, we equip with the following metric:
[TABLE]
where is chosen so that . For any , we define the \emph{Bellman optimality operator} as
[TABLE]
In the MDP theory, gives the relation between value functions and . Moreover, given , it characterizes the optimal policy at time . It is standard to prove that is a contraction on with modulus for all , i.e., for all . Let us define the operator as
[TABLE]
It can be shown that is a contraction on with modulus :
[TABLE]
Since is a complete metric space, has a unique fixed point by the Banach fixed point theorem.
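The one-step contraction property invoked above follows from a standard estimate, sketched here under assumed notation that is not fixed by the surrounding text: write a generic time-$t$ Bellman operator as $Tu(z)=\min_{a\in{\mathsf{A}}}\bigl[C(z,a)+\beta\int u(z')\,\eta(dz'|z,a)\bigr]$, where $C$ is a bounded one-stage cost, $\eta$ is a stochastic kernel, $\beta\in(0,1)$ is the discount factor, and the minimum is assumed to be attained. Using $\bigl|\min_{a}f(a)-\min_{a}g(a)\bigr|\leq\sup_{a}|f(a)-g(a)|$, for any bounded $u$ and $v$,
\[
\bigl|Tu(z)-Tv(z)\bigr|\leq\beta\sup_{a\in{\mathsf{A}}}\biggl|\int\bigl(u(z')-v(z')\bigr)\,\eta(dz'|z,a)\biggr|\leq\beta\,\|u-v\|,
\]
since $\eta(\,\cdot\,|z,a)$ is a probability measure. Taking the supremum over $z$ gives $\|Tu-Tv\|\leq\beta\|u-v\|$. The cost term cancels in the difference, so the estimate does not depend on the mean-field term; only the discount factor matters.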
The following theorem is a known result in the theory of nonhomogeneous Markov decision processes (see Hinderer [23, Theorems 14.4 and 17.1]). For any given , it characterizes and the optimal policy of the belief-state MDP.
Theorem 4.2
For any , the collection of value functions is the unique fixed point of the operator . Furthermore, is optimal if and only if
[TABLE]
where $\nu_{t}^{\varphi}={\cal L}\bigl(z(t),a(t)\bigr)$ under and .
We are now ready to define the above-mentioned set-valued map from into itself. To that end, for any , let us define the following sets:
[TABLE]
Note that the set characterizes the consistency of the mean-field term with the distribution of a generic agent, and the set characterizes optimality of the policy that is obtained by disintegrating the state-action measure-flow, for the mean-field term. The set-valued mapping is given as follows:
[TABLE]
We say that is a fixed point of if . The following proposition makes the connection between mean-field equilibria and the fixed points of , and so, transforms the fixed point equation into the fixed point equation .
Proposition 4.3
Suppose that has a fixed point . Construct a Markov policy for belief-state MDP by disintegrating each as , and let . Then the pair is a mean-field equilibrium.
Proof 4.4
Proof. Note that, since , we have $\nu_{t}={\cal L}\bigl(z(t),a(t)\bigr)$ () for the belief-state MDP under the policy and the measure flow . Then, for any , we have
[TABLE]
Since (8) is true for all , we have
[TABLE]
where denotes the conditional distribution of given under and . Hence, .
Since , the corresponding Markov policy satisfies (7) for . Therefore, by Theorem 4.2 and the fact that $\nu_{t}={\cal L}\bigl(z(t),a(t)\bigr)$ () for the belief-state MDP under the policy and the measure flow , is optimal for the belief-state MDP induced by the measure flow (or, equivalently, ). Therefore, . \Halmos
By Proposition 4.3, it suffices to prove that $\Gamma$ has a fixed point in order to establish the existence of a mean-field equilibrium. The natural tool for this is Kakutani’s fixed point theorem (see, e.g., Aliprantis and Border [2, Corollary 17.55]), a standard result for proving the existence of Nash equilibria in classical game problems. To apply Kakutani’s fixed point theorem, the space on which the set-valued mapping is defined must be convex and compact. However, the set ${\mathcal{P}}({\mathsf{Z}}\times{\mathsf{A}})^{\infty}$ in the definition of $\Gamma$ is not compact. Nevertheless, we will prove that the image of ${\mathcal{P}}({\mathsf{Z}}\times{\mathsf{A}})^{\infty}$ under $\Gamma$ is contained in a convex and compact subset of ${\mathcal{P}}({\mathsf{Z}}\times{\mathsf{A}})^{\infty}$. Hence, it is sufficient to consider this convex and compact set in the definition of $\Gamma$. To that end, for each $t\geq 0$, define the set
[TABLE]
Since is a lower semi-continuous moment function, the set is compact with respect to the weak topology, see Hernández-Lerma and Lasserre [22, Proposition E.8, p. 187]. Let us define
[TABLE]
Since is compact, is tight. Furthermore, is closed with respect to the weak topology. Hence, is compact. Let , which is convex and compact with respect to the product topology.
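The step from a moment bound to tightness rests on a one-line Markov-inequality argument, sketched here on a generic metric space ${\mathsf{E}}$ with generic symbols ($g$, $K_{n}$, $M$) chosen for illustration. Let $g\geq 0$ be a moment function, so that there are compact sets $K_{n}\uparrow{\mathsf{E}}$ with $\inf_{e\notin K_{n}}g(e)\rightarrow\infty$. Then, for every probability measure $\mu$ with $\int g\,d\mu\leq M$,
\[
\mu\bigl({\mathsf{E}}\setminus K_{n}\bigr)\leq\frac{1}{\inf_{e\notin K_{n}}g(e)}\int_{{\mathsf{E}}\setminus K_{n}}g\,d\mu\leq\frac{M}{\inf_{e\notin K_{n}}g(e)}\xrightarrow[n\to\infty]{}0,
\]
and the bound is uniform over all such $\mu$. Hence the family $\bigl\{\mu:\int g\,d\mu\leq M\bigr\}$ is tight and, by Prokhorov's theorem, relatively compact in the weak topology. Lower semi-continuity of $g$ is what additionally makes the family closed, which upgrades relative compactness to compactness.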
Proposition 4.5
We have $\Gamma\bigl({\mathcal{P}}({\mathsf{Z}}\times{\mathsf{A}})^{\infty}\bigr)\coloneqq\bigl\{{\boldsymbol{\nu}}^{\prime}:{\boldsymbol{\nu}}^{\prime}\in\Gamma({\boldsymbol{\nu}}),\ {\boldsymbol{\nu}}\in{\mathcal{P}}({\mathsf{Z}}\times{\mathsf{A}})^{\infty}\bigr\}\subset\Xi$.
Proof 4.6
Proof. Fix any . It is sufficient to prove that . Let . We prove by induction that for all . The claim trivially holds for as . Assume the claim holds for and consider . We have
[TABLE]
Hence, . \Halmos
By Proposition 4.5, it is sufficient to prove that has a fixed point . As in Jovanovic and Rosenthal [29, Theorem 1], one can prove that for any . Moreover, note that both and are convex, and thus their intersection is also convex. is a convex compact subset of a locally convex topological space , where denotes the set of all finite signed measures on . Hence, in order to deduce the existence of a fixed point of , we need to prove that it has a closed graph. Before stating this result, we prove the following proposition which is a key element of the proof as mentioned earlier (i.e., step (ii)). Its proof is given in Appendix A.
Proposition 4.7
Let in product topology. Then, for all , weakly converges to for all .
Using Proposition 4.7, we can now prove the following proposition.
Proposition 4.8
The graph of , i.e., the set
[TABLE]
is closed.
Proof 4.9
Proof. We note that modulo Proposition 4.7, the proof of the proposition is almost the same as the proof in Saldi et al. [38, Proposition 3.9] for the full state measurement case. However, for the sake of completeness, we give the proof here.
Let $\bigl\{(\boldsymbol{\nu}^{(n)},\boldsymbol{\xi}^{(n)})\bigr\}_{n\geq 1}\subset\Xi\times\Xi$ be such that for all and as for some . To prove is closed, it is sufficient to prove that .
Using Proposition 4.7, we first prove that ; that is, for all , we have
[TABLE]
For all and , we have
[TABLE]
Since in , weakly. Let . Then, by Langen [31, Theorem 3.5], we have
[TABLE]
since weakly and converges to continuously\footnote{Suppose $g$, $g_{n}$ ($n\geq 1$) are measurable functions on a metric space $\mathsf{E}$. The sequence $g_{n}$ is said to converge to $g$ continuously if $\lim_{n\rightarrow\infty}g_{n}(e_{n})=g(e)$ for any $e_{n}\rightarrow e$, where $e\in\mathsf{E}$.} (see Langen [31, Theorem 3.5]). This implies that the measure on the right-hand side of (9) converges weakly to . Therefore, we have
[TABLE]
from which we conclude that .
To complete the proof, it suffices to prove that . To that end, for each and , let us define the following functions
[TABLE]
By definition,
[TABLE]
Define also the following sets
[TABLE]
Since , we have
[TABLE]
To prove to , we need to show that
[TABLE]
First note that since both and are continuous, is closed. Moreover, is also closed as both and are continuous. One can also prove as in Saldi et al. [38, Proposition 3.10] that converges to continuously and converges to continuously, as .
For each , define the closed set $B_{t}^{M}\coloneqq\bigl\{(z,a):F_{t}(z,a)\geq J^{\boldsymbol{\nu}}_{*,t}(z)+\epsilon(M)\bigr\}$, where as . Since both and are continuous, we can choose so that for each . Note that by the monotone convergence theorem, we have
[TABLE]
This implies that
[TABLE]
For any fixed , we prove that the second term in the last expression converges to zero. To that end, we first note that converges weakly to as when both measures are restricted to , as is closed and , see, e.g., Bogachev [8, Theorem 8.2.3]. Furthermore, since converges to continuously and converges to continuously, converges continuously to $0$, which implies by Langen [31, Theorem 3.5] that
[TABLE]
Therefore, we obtain
[TABLE]
where the last inequality follows from the Portmanteau theorem (see, e.g., Billingsley [6, Theorem 2.1]) and the fact that is closed. Hence, . Since is arbitrary, this is true for all . This means that . Therefore, . \Halmos
Recall that is a compact convex subset of the locally convex topological space . Furthermore, the graph of is closed by Proposition 4.8, and it takes nonempty convex values. Therefore, by Kakutani’s fixed point theorem (Aliprantis and Border [2, Corollary 17.55]), has a fixed point. Therefore, the pair is a mean field equilibrium, where and are constructed as in the statement of Proposition 4.3. This completes the proof of Theorem 3.3.
5 Approximation of Nash Equilibria
In this section, we show that the policy generated by the mean-field equilibrium, when adopted by each agent, is an approximate Nash equilibrium for games with a sufficiently large number of agents. Let denote the pair in the mean-field equilibrium. In addition to Assumption 3, we impose a further assumption, stated below. To this end, let denote the bounded Lipschitz metric (see, e.g., Dudley [17, Proposition 11.3.2]) on that metrizes the weak topology, and define the following moduli of continuity:
[TABLE]
Assumption 5
- (a) and as .
- (b) For each , is deterministic; that is, for some measurable function , and weakly continuous.
- (c) The observation kernel does not depend on the mean-field term.
Remark 5.1
Note that, if the state transition probability is independent of the mean-field term, then Assumption 5-(a) for is always true. Indeed, in that case, we have for all .\Halmos
Remark 5.2
One way to establish Assumption 5-(b) is as follows. Suppose that, for , there exists a unique minimizer of
[TABLE]
for each and for all . In addition, suppose that () in (3) is continuous. Note that uniqueness conditions analogous to (10) are quite common in the mean-field literature (see, e.g., Gomes et al. [19, Assumption 4], Şen and Caines [16, Assumption A5], Huang et al. [28, Assumption H5], Şen and Caines [34, Assumption A9]).
Under the condition of unique minimizer to (10), one can prove that the policy in Proposition 4.3 is deterministic and weakly continuous (see Saldi et al. [38, Section 5]). Indeed, fix any and consider the policy at time in . By (10), we must have for some which minimizes of the above form; that is, for all . If is continuous, then is also continuous. Hence, in order to prove the assertion, it is sufficient to prove that is continuous. Suppose in . Note that is continuous. Therefore, every accumulation point of the sequence must be a minimizer for . Since there exists a unique minimizer of , the set of all accumulation points of must be . This implies that converges to since is compact. Hence, is continuous.
Recall that the mean-field equilibrium policy is given by
[TABLE]
Hence, is also a deterministic policy as is a deterministic function. The function can be given recursively by () in (3) and the policy . Since is continuous for all and is also weakly continuous, we can conclude that the mean-field policy is deterministic and weakly continuous. Hence, Assumption 5-(b) holds.
For instance, we can prove existence of a unique minimizer to (10) and the continuity of for all under the following conditions on the system components. Suppose that , , and is convex. In addition, suppose that and , where denotes the Lebesgue measure. Assume that both and are continuous and bounded, and and are strictly convex in . Then we have
[TABLE]
where is given by
[TABLE]
Similarly, we have
[TABLE]
where is given by
[TABLE]
One can prove that is continuous. Hence, is also continuous by [5, Theorem 16.2]. To show uniqueness of a minimizer to (10), note that
[TABLE]
where
[TABLE]
Hence, for any , (10) can be written as
[TABLE]
Since and are strictly convex in , the last expression is also strictly convex in . Hence, there exists a unique minimizer for (10).\Halmos
For , let . Then, for each , define as
[TABLE]
where . Let . Note that is a weakly continuous stochastic kernel on given . Indeed, and are equivalent because, for all , we have
[TABLE]
Hence, is also a mean-field equilibrium. In the sequel, we use to prove the approximation result. The reason for passing from to is that the latter policy becomes Markov in the equivalent game model that will be introduced in the proof of Theorem 5.5. Thus, we can use the proof technique in our previous paper (Saldi et al. [38]) to show the existence of an approximate Nash equilibrium.
Before stating and proving the main result of this section on the approximate Nash equilibrium property of the mean-field equilibrium for the finite-population case (Theorem 5.5), we discuss (and establish under some conditions) the uniqueness of the mean-field equilibrium. This entails a monotonicity condition similar to the one introduced by Lasry and Lions [32].
- (U1) Uniqueness condition in (10) holds for any mean-field equilibrium.
- (U2) The state transition probability and the observation kernel do not depend on the mean-field term.
- (U3) For any and in , we have the following monotonicity condition:
[TABLE]
We note that Assumption 5 is exactly the discrete-time counterpart of Assumption (U) introduced in Lacker [30] and Carmona and Lacker [12]. Recall that Assumption 5-(U1) is true under the strict convexity assumptions introduced in Remark 5.2.
Theorem 5.3
Under Assumption 5, there exists at most one mean-field equilibrium. Furthermore, if Assumption 3 also holds, then there exists a unique mean-field equilibrium.
Proof 5.4
Proof. Suppose that and are two distinct mean-field equilibria. Note that, under Assumption 5-(U1), the policies in the mean-field equilibria are deterministic; that is, and , and they are the unique optimal deterministic control policies given the measure flows and . In addition, by Assumption 5-(U2), the transition probability in the fully observed reduction of the POMDP in the mean-field equilibrium does not depend on the mean-field term. Hence, by Assumption 5-(U3), we have the following inequality
[TABLE]
Since and , we have
[TABLE]
Recall that and are the unique optimal deterministic control policies given the measure flows and . Therefore, , and so, since the transition probability in the fully observed problem does not depend on the mean-field term. This completes the proof.\Halmos
The following theorem is the main result of this section, which states that the policy , where is repeated times, is an -Nash equilibrium for sufficiently large .
Theorem 5.5
For any , there exists such that for , the policy is an -Nash equilibrium for the game with agents.
Proof of Theorem 5.5
Note that the policy in the mean-field equilibrium is not necessarily Markovian, which makes the joint process of the state, observation, and mean-field term non-Markov. To prove Theorem 5.5, we will first construct an equivalent game model whose states are the state of the original model plus the current and past observations. In this new model, the mean-field equilibrium policy automatically becomes Markov. Then, we will use the proof technique in our previous paper Saldi et al. [38] to show the existence of an approximate Nash equilibrium.
This new game model is specified by
[TABLE]
where, for each ,
[TABLE]
and are the Polish state and action spaces at time , respectively. The stochastic kernel is defined as:
[TABLE]
where , (), , and is the marginal of on . Indeed, is the controlled transition probability of the next state-observation pair, current observation, and past observations $\bigl(x(t+1),y(t+1),y(t),\ldots,y(0)\bigr)$ given the current state-observation pair and past observations $\bigl(x(t),y(t),y(t-1),\ldots,y(0)\bigr)$ in the original mean-field game. For each , the one-stage cost function is defined as:
[TABLE]
Finally, the initial measure is given by , where .
Suppose that Assumptions 3 and 5 hold. For each , let denote the bounded Lipschitz metric on \footnote{The product metric on is assumed to be the sum of the metrics of the components in the product space.}, and define the following moduli of continuity:
[TABLE]
Then, for each , the following are satisfied:
- (I) The one-stage cost function is bounded and continuous.
- (II) The stochastic kernel is weakly continuous.
- (III) and as .
It is straightforward to prove that (I) and (II) hold since is continuous, is weakly continuous, and is continuous in total variation norm. For (III), for each , fix any and , and for any , define where . Then, we have
[TABLE]
Since is arbitrary and , we have as by Assumption 5-(a). Similarly, we can also prove that as .
Recall the set of policies in the original mean-field game. Let be the set of policies in which only use the observations; that is, if for each . Note that is a subset of the set of Markov policies in the new model. For any measure flow , where , we denote by the infinite-horizon discounted-cost of the policy in this new model.
Similar to Section 2, we also define the corresponding agent game as follows. We have the Polish state spaces and action space . For every and every , let and denote the state and the action of Agent at time , and let
[TABLE]
denote the empirical distribution of the state configuration at time . The initial states are independent and identically distributed according to , and, for each , the next-state configuration is generated at random according to the probability laws
[TABLE]
Recall that denotes the set of policies that only use local observations for Agent in the original game. Note that is an admissible class of policies for the new model. Indeed, policies in are Markov for this new model since they partly use the state information. We let denote the set of all policies in for Agent that are weakly continuous; that is, if for all , is continuous when is endowed with the weak topology.
For Agent , the infinite-horizon discounted cost under the initial distribution and -tuple of policies is denoted as .
The following proposition makes the connection between this new model and the original model.
Proposition 5.6
For any , , and , we have . Similarly, for any and measure flow , we have where .
Proof 5.7
Proof. The proof of the proposition is given in Appendix B.\Halmos
By Proposition 5.6, in the remainder of this section we consider the new game model in place of the original one. Then, we use the same technique as in our previous paper Saldi et al. [38] to prove the approximation result since the policy in the mean-field equilibrium is Markov for this new model. However, as the state space in this new model is expanding at each time step, there will be some differences between the current proof and the proof in Saldi et al. [38, Section 4]. Therefore, for the sake of completeness, we give the full details of the proof.
Define the measure flow as follows: , where denotes the probability law of in the original mean-field game under the policy in the mean-field equilibrium. For each , define the stochastic kernel on given as
[TABLE]
Since is assumed to be weakly continuous, is also weakly continuous in . In the sequel, to ease the notation, we will also write as .
Lemma 5.8
Measure flow satisfies
[TABLE]
Proof 5.9
Proof. The proof of the lemma is given in Appendix C.\Halmos
For each , let $\bigl\{s_{i}^{N}(t)\bigr\}_{1\leq i\leq N}$ denote the state configuration at time in the -person game under the policy . Define the empirical distribution
[TABLE]
Proposition 5.10
For all , we have
[TABLE]
weakly in , as .
Proof 5.11
Proof. It is known that weak topology on can be metrized using the following metric:
[TABLE]
where is an appropriate sequence of real continuous and bounded functions on such that for all (see Parthasarathy [35, Theorem 6.6, p. 47]). Define the Wasserstein distance of order 1 on the set of probability measures as follows (see Villani [42, Definition 6.1]):
[TABLE]
Note that since is a Dirac measure, we have
[TABLE]
Since convergence in distance implies weak convergence (see Villani [42, Theorem 6.9]), it suffices to prove that
[TABLE]
for any and for all . We prove this by induction on .
As , the claim is true for . We suppose that the claim holds for and consider . Fix any . Then, we have
[TABLE]
We first prove that the expectation of the second term on the right-hand side (RHS) of (12) converges to $0$ as . To that end, define as
[TABLE]
One can prove that . Indeed, suppose that converges to . Let us define
[TABLE]
Since is weakly continuous, one can prove that converges to continuously; that is, if converges to , then . By Langen [31, Theorem 3.5], we have , and so, . This implies that the expectation of the second term on the RHS of (12) converges to zero as weakly, by the induction hypothesis.
Now, let us write the expectation of the first term on the RHS of (12) as
[TABLE]
Then, by Budhiraja and Majumder [9, Lemma A.2], we have
[TABLE]
Therefore, the expectation of the first term on the RHS of (12) also converges to zero as . Since is arbitrary, this completes the proof.\Halmos
Proposition 5.10 essentially says that in the infinite-population limit, the empirical distribution of the states under the mean-field policy converges to the deterministic measure flow . Since the transition probabilities are continuous in , the evolution of the state of a generic agent in the finite-agent game with sufficiently many agents and the evolution of the state in the mean-field game under policies and , respectively, should therefore be close. Hence, the distributions of the states in each problem should also be close, from which we obtain the following result.
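To illustrate the convergence described above, the following minimal Python sketch simulates a toy two-state chain in which each agent's transition probability depends on the empirical fraction of agents in state 1, and compares the resulting empirical flow with the deterministic mean-field recursion. The kernel `p_one`, the initial fraction `m0`, and all function names are hypothetical and are not taken from the paper; the policy is assumed to be fixed and absorbed into the kernel, and the state is assumed to be fully visible to the simulator.

```python
import random
from typing import List


def p_one(x: int, m: float) -> float:
    """Hypothetical kernel: probability of moving to state 1 from state x
    when a fraction m of the population is currently in state 1."""
    return 0.1 + 0.3 * x + 0.5 * m


def mean_field_flow(m0: float, horizon: int) -> List[float]:
    """Deterministic recursion for the fraction in state 1 (infinite population)."""
    flow = [m0]
    for _ in range(horizon):
        m = flow[-1]
        flow.append((1.0 - m) * p_one(0, m) + m * p_one(1, m))
    return flow


def empirical_flow(n_agents: int, m0: float, horizon: int, seed: int = 0) -> List[float]:
    """Fraction of agents in state 1 in the N-agent system, coupled through
    the empirical distribution of the states."""
    rng = random.Random(seed)
    states = [1 if rng.random() < m0 else 0 for _ in range(n_agents)]
    flow = [sum(states) / n_agents]
    for _ in range(horizon):
        m = flow[-1]
        states = [1 if rng.random() < p_one(x, m) else 0 for x in states]
        flow.append(sum(states) / n_agents)
    return flow


def max_gap(n_agents: int, m0: float = 0.3, horizon: int = 10, seed: int = 0) -> float:
    """Largest deviation over time between the empirical and mean-field flows."""
    mf = mean_field_flow(m0, horizon)
    emp = empirical_flow(n_agents, m0, horizon, seed)
    return max(abs(a - b) for a, b in zip(mf, emp))
```

Calling `max_gap` with increasing `n_agents` typically gives smaller values, in line with the law-of-large-numbers behavior that Proposition 5.10 formalizes; the sketch is only an illustration and does not reproduce the Wasserstein-distance argument of the proof.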
Proposition 5.12
We have
[TABLE]
Proof 5.13
Proof. For each , let us define
[TABLE]
Since for any permutation of we have
[TABLE]
the cost function at time can be written as
[TABLE]
Let be defined as
[TABLE]
One can show that as is weakly continuous. Hence, by Proposition 5.10 we obtain
[TABLE]
Note that by Lemma 5.8, the discounted cost in the mean-field game can be written as
[TABLE]
Therefore, by (13) and the dominated convergence theorem, we obtain
[TABLE]
which completes the proof.\Halmos
In order to prove the approximation result, we must show that if the policy of some agent deviates from the mean-field equilibrium policy, then the corresponding cost of this agent is close to the cost in the mean-field limit as in Proposition 5.12, for sufficiently large. However, note that this agent can choose different policies for each in place of the mean-field equilibrium policy. Since the transition probabilities and the one-stage cost functions are the same for all agents in the game model, it is sufficient to change the policy of Agent for each . To that end, let be an arbitrary sequence of policies for Agent ; that is, for each and , is weakly continuous. For each , let $\bigl\{\tilde{s}_{i}^{N}(t)\bigr\}_{1\leq i\leq N}$ be the collection of states in the -person game under the policy . Define
[TABLE]
The following result states that, in the infinite-population limit, the law of the empirical distribution of the states at each time is insensitive to local deviations from the mean-field equilibrium policy.
Proposition 5.14
For all , we have
[TABLE]
weakly , as .
Proof 5.15
Proof. The proof can be done by slightly modifying the proof of Proposition 5.10, and therefore will not be included here. See the proof of Saldi et al. [38, Proposition 4.6].\Halmos
For each , let denote the state trajectory of the mean-field game under policy ; that is, evolves as follows:
[TABLE]
Recall that the cost function of this mean-field game is given by
[TABLE]
where the action configuration at each time is generated according to the probability law
[TABLE]
Proposition 5.16
For any , we have
[TABLE]
for any sequence such that the family $\bigl\{T_{N}(s,\,\cdot\,):s\in\mathsf{S}_{t},N\geq 1\bigr\}$ is equicontinuous and .
Proof 5.17
Proof. The proof of Proposition 5.16 is given in Appendix D.\Halmos
Using Proposition 5.16, we now prove the following theorem which is a key element in the proof of Theorem 5.5.
Theorem 5.18
Let be an arbitrary sequence of policies for Agent . Then, we have
[TABLE]
where is given in (14).
Proof 5.19
Proof. Fix any and define
[TABLE]
We first prove that $\bigl\{T_{N,t}(s,\,\cdot\,):s\in\mathsf{S}_{t},N\geq 1\bigr\}$ satisfies the hypothesis in Proposition 5.16. For equicontinuity, note that for any and for any , we have
[TABLE]
Since as , the family $\bigl\{T_{N,t}(s,\,\cdot\,):s\in\mathsf{S}_{t},N\geq 1\bigr\}$ is equicontinuous. Moreover, .
Therefore, by Proposition 5.16, we have
[TABLE]
Since is arbitrary, the above result holds for all . Then the theorem follows from the dominated convergence theorem.\Halmos
As a corollary of Propositions 5.6 and 5.12, and Theorem 5.18, we obtain the following result.
Corollary 5.20
We have
[TABLE]
Now, we are ready to prove the main result of this section.
Proof 5.21
Proof of Theorem 5.5. One can prove that for any policy , we have
[TABLE]
for each (see the proof of Saldi et al. [38, Theorem 2.3]). Hence, it is sufficient to consider weakly continuous policies in to establish the existence of -Nash equilibrium in the new model.
We first prove that for sufficiently large , we have
[TABLE]
for each . As indicated earlier, since the transition probabilities and the one-stage cost functions are the same for all agents in the new game, it is sufficient to prove (15) for Agent only. Given , for each , let be such that
[TABLE]
Then, by Corollary 5.20, we have
[TABLE]
Therefore, there exists such that for , we have
[TABLE]
The result then follows from Proposition 5.6.\Halmos
Remark 5.22
We note that, using similar ideas, the finite-horizon cost criterion
[TABLE]
can be handled with the same quantitative results. The only part that requires a verification different from the infinite-horizon case is the following result: and converge continuously to and , respectively. Note that, in the finite-horizon case, for each and , these functions are given by
[TABLE]
and
[TABLE]
Note that the discount factor is absent from the above equations. For , we have and . Since is continuous and weakly converges to , we have that continuously converges to by Bertsekas and Shreve [4, Proposition 7.32]. But this implies that continuously converges to , and so, continuously converges to again by Bertsekas and Shreve [4, Proposition 7.32]. Then, by induction, we can conclude that and continuously converge to and , respectively, for each . Therefore, Theorems 3.3 and 5.5 hold for the finite-horizon cost criterion under the same assumptions. Furthermore, if we start the mean-field game at time with initial measure , then the pair $\bigl(\{\pi_{t}\}_{\tau\leq t\leq T},\{\mu_{t}\}_{\tau\leq t\leq T}\bigr)$ in Theorem 3.3 is still a mean-field equilibrium for the sub-game.\Halmos
6 An Example
In this section, we consider a specific additive noise model to illustrate our results. In this model, the state and observation dynamics of a generic agent for the mean-field game are given respectively by
[TABLE]
where , , , , and . Here, we assume that , , and and are sequences of i.i.d. standard normal random variables independent of each other. The one-stage cost function of a generic agent is given by
[TABLE]
for some measurable function .
This model is the infinite-population limit of the -agent game model with state and observation dynamics
[TABLE]
and the one-stage cost function
[TABLE]
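As a rough illustration of an additive noise model of this kind, the following Python sketch simulates a finite population in which each agent's next state is a drift term plus scaled Gaussian noise, and each observation is a function of the state plus standard Gaussian noise. The specific functions `drift`, `observe_mean`, and `policy`, the constant noise gain, and the coupling through the empirical average of `tanh` are all hypothetical choices made for this sketch; they are not the model components of the paper, and the policy shown uses only the current observation, which is a special case of an observation-based policy.

```python
import math
import random
from typing import Dict, List

NOISE_GAIN = 0.5  # hypothetical constant state-noise gain


def drift(x: float, a: float, mean_term: float) -> float:
    """Hypothetical bounded drift depending on state, action, and a mean-field term."""
    return 0.5 * math.tanh(x) + 0.3 * a + 0.2 * mean_term


def observe_mean(x: float) -> float:
    """Hypothetical observation function depending only on the agent's own state."""
    return math.tanh(x)


def policy(y: float) -> float:
    """Hypothetical observation-based policy with compact action set [-1, 1]."""
    return max(-1.0, min(1.0, -0.5 * y))


def simulate(n_agents: int, horizon: int, seed: int = 0) -> Dict[str, List[List[float]]]:
    """Simulate the N-agent system; returns per-time lists of states,
    observations, and actions."""
    rng = random.Random(seed)
    states = [rng.gauss(0.0, 1.0) for _ in range(n_agents)]
    history: Dict[str, List[List[float]]] = {
        "states": [], "observations": [], "actions": []
    }
    for _ in range(horizon):
        # Noisy local observations: y_i = h(x_i) + v_i with standard normal v_i.
        observations = [observe_mean(x) + rng.gauss(0.0, 1.0) for x in states]
        actions = [policy(y) for y in observations]
        # Mean-field coupling through the empirical distribution of the states.
        mean_term = sum(math.tanh(x) for x in states) / n_agents
        history["states"].append(states)
        history["observations"].append(observations)
        history["actions"].append(actions)
        states = [
            drift(x, a, mean_term) + NOISE_GAIN * rng.gauss(0.0, 1.0)
            for x, a in zip(states, actions)
        ]
    return history
```

Replacing the empirical average `mean_term` by its deterministic limit would give the corresponding single-agent mean-field dynamics under the same hypothetical choices.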
For this model, Assumption 3 holds with and under the following conditions: (i) is compact, (ii) is continuous and bounded, (iii) is continuous, and is bounded and continuous, (iv) for some , (v) is continuous and bounded. Note that is defined as
[TABLE]
Indeed, we have
[TABLE]
where is the standard normal density and is the Lebesgue measure. Hence, Assumption 3-(e) holds. In order to verify Assumption 3-(b), suppose and let . Then, we have
[TABLE]
since and are continuous, where the continuity of follows from Langen [31, Theorem 3.5] and the fact that is bounded and continuous. Therefore, the transition probability is weakly continuous. Thus, Assumption 3-(b) holds. Note that Assumption 3-(f) holds if the initial distribution has a finite second moment. Assumption 3-(a) trivially holds since is bounded and continuous. Finally, we will verify Assumption 3-(c). Suppose . Then, we have $r(dy|x_{n},\mu_{n})=q\bigl(y-H(x_{n},\mu_{n})\bigr)m(dy)$ and $r(dy|x,\mu)=q\bigl(y-H(x,\mu)\bigr)m(dy)$. Since $q\bigl(y-H(x_{n},\mu_{n})\bigr)\rightarrow q\bigl(y-H(x,\mu)\bigr)$ as for all , by Scheffé’s theorem (see, e.g., Billingsley [5, Theorem 16.12]) we have as in total variation norm. Thus, Assumption 3-(c) holds. Therefore, under (i)-(v), there exists a mean-field equilibrium for the mean-field game of this example.
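The total variation convergence obtained from Scheffé’s theorem can be checked numerically for the standard normal density $q$. The sketch below computes the $L^{1}$ distance between the shifted densities $q(y-h_{1})$ and $q(y-h_{2})$ by a trapezoidal rule and compares it with the closed form $2\,\mathrm{erf}\bigl(|h_{1}-h_{2}|/(2\sqrt{2})\bigr)$. Here $h_{1}$ and $h_{2}$ stand in for the values $H(x_{n},\mu_{n})$ and $H(x,\mu)$; the function names and the integration parameters are assumptions of this sketch, and the $L^{1}$ distance equals the total variation norm only under the convention that does not halve it.

```python
import math


def normal_pdf(u: float) -> float:
    """Standard normal density q(u)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def l1_distance_numeric(h1: float, h2: float,
                        half_width: float = 12.0, steps: int = 20000) -> float:
    """Trapezoidal approximation of the integral of |q(y - h1) - q(y - h2)| dy."""
    center = 0.5 * (h1 + h2)
    lo = center - half_width
    dy = 2.0 * half_width / steps
    total = 0.0
    for k in range(steps + 1):
        y = lo + k * dy
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * abs(normal_pdf(y - h1) - normal_pdf(y - h2))
    return total * dy


def l1_distance_closed_form(h1: float, h2: float) -> float:
    """Closed form for two unit-variance normal densities with means h1 and h2."""
    return 2.0 * math.erf(abs(h1 - h2) / (2.0 * math.sqrt(2.0)))
```

As the shift $|h_{1}-h_{2}|$ shrinks to zero, both quantities shrink to zero, which is the behavior used to verify Assumption 3-(c) when $H$ is continuous.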
For the same model, Assumptions 5-(a) and 5-(c) hold under the following conditions: (vi) is (uniformly) Lipschitz in with Lipschitz constant , (vii) is (uniformly) Lipschitz in with Lipschitz constant , (viii) is bounded and , and (ix) is only a function of .
Indeed, we have
[TABLE]
where . Hence, as . For , we have
[TABLE]
For any compact interval , we can upper bound (17) as follows:
[TABLE]
The last two integrals in the last expression go to zero (uniformly in ) as , since and are bounded, and . For any , let so that the sum of these integrals is less than for all . Let denote the Lipschitz seminorm of on . Then, we have
[TABLE]
where . Since is arbitrary, we have as . Thus, Assumption 5-(a) holds. Note that Assumption 5-(c) automatically holds as is only a function of .
To establish Assumption 5-(b), we can impose the following additional assumption, as we did in Remark 5.2. Suppose is the measure-flow in mean-field equilibrium.
- (b’)
For , there exists a unique minimizer of
[TABLE]
for each and for all .
Under assumption (b’), one can prove that Assumption 5-(b) holds. Note that, by Remark 5.2, assumption (b’) is true if, for instance, and are strictly convex in , where is given by
[TABLE]
7 Conclusion
This paper has considered discrete-time partially observed mean-field games subject to infinite-horizon discounted cost, for Polish state, observation, and action spaces. Under mild conditions, the existence of a Nash equilibrium has been established for this game model by converting partially observed Markov decision processes to fully observed Markov decision processes on the belief space and then using the dynamic programming principle. We have also established that the mean-field equilibrium policy, when used by each agent, constitutes an approximate Nash equilibrium for games with sufficiently many agents.
One interesting future direction of research to pursue is to study partially observed team problems of the mean-field type. In this case, one possible approach is to establish the global optimality of person-by-person optimal policies under some convexity assumptions and then use the results developed in this paper. Finally, partially observed mean-field games with average-cost and risk-sensitive optimality criteria are also worth studying. In particular, using the vanishing discount factor approach in MDP theory (i.e., with discount factor ), it might be possible to establish similar results for the average cost case.
Appendix
A Proof of Proposition 4.7
Fix any and . To ease the notation, let us write and . First note that, for all , weakly converges to as .
We will follow the proof technique used in Feinberg et al. [18, Section 5] to prove the result. To this end, we first prove the following lemma.
Lemma 7.1
Fix any . Then, for any , we have
[TABLE]
In particular, if , then the above result implies that converges to in total variation norm.
Proof 7.2
Proof. We have
[TABLE]
The first term in the last expression converges to zero as by Langen [31, Theorem 3.5] since weakly converges to and converges continuously to $0$. For the second term, define . Observe that is an equicontinuous family of functions. Indeed, let in . Then, we have
[TABLE]
Since is also uniformly bounded, the second term in the last expression also goes to zero as since weakly converges to .\Halmos
Let be the weak convergence determining class of functions in ; that is, weakly converges to in if and only if for all .
Now, we prove that for any subsequence of , there exists a further subsequence such that weakly converges to for -almost everywhere. Let us write the subsequence as . Since, by Lemma 7.1
[TABLE]
converges in probability to by Feinberg et al. [18, Theorem 5.2], and so, there is a subsequence of such that converges to -almost everywhere. Similarly, by Lemma 7.1
[TABLE]
and so, converges in -probability to by Feinberg et al. [18, Theorem 5.2]. Therefore, there is a subsequence of such that converges to -almost everywhere. Continuing in this manner, we obtain an array of sequences. Then, by Cantor’s diagonal argument, for all , converges to -almost everywhere as . This implies that weakly converges to -almost everywhere.
Now, we combine this result and convergence of to in total variation norm to complete the proof. By the portmanteau theorem, it is sufficient to prove that for all open in . Suppose to the contrary that there exists an open set such that . Then, there exists a subsequence of such that for all . By the above, there exists a subsequence of such that weakly converges to -almost everywhere. Since is open, we have
[TABLE]
Then by Feinberg et al. [18, Lemma 5.1(i)] and the fact that converges to in total variation norm, we have
[TABLE]
which is a contradiction. Hence, we must have for all open in .
B Proof of Proposition 5.6
Fix any and . For each , let denote the probability law of the states and the actions under in the new game model. Similarly, let denote the probability law of the states , the observations for , and the actions under in the original finite agent game model. We prove that, for each ,
[TABLE]
which implies that for all .
The claim trivially holds for . Suppose that the claim holds for and consider . For , let
[TABLE]
Define the following transition probability on given as:
[TABLE]
where . Define also , , and . Then, we have
[TABLE]
where . Hence, . This implies that for all .
The second part of the proposition can be proved similarly, so we omit the details.
C Proof of Lemma 5.8
The claim trivially holds for . Suppose that the claim holds for and consider . For , let , where and for . Define also , and . Then, we have
[TABLE]
Since is arbitrary, this completes the proof.
D Proof of Proposition 5.16
Fix any sequence satisfying the hypothesis of the proposition. Fix any and suppose that
[TABLE]
for any bounded sequence . Given (19), we prove that
[TABLE]
Indeed, we have
[TABLE]
First, note that since , we have
[TABLE]
by (19).
Now, let us consider the first term in (21). To that end, define $\mathcal{F}\coloneqq\bigl\{T_{N}(s,\,\cdot\,):s\in\mathsf{S}_{t},N\geq 1\bigr\}$. Note that is a uniformly bounded and equicontinuous family of functions on , and therefore
[TABLE]
as weakly. Then, we have
[TABLE]
Hence, (19) implies (20) for any .
Now, we prove that (19) is true for all , which will complete the proof as (19) implies (20). Set and define
[TABLE]
For any and , we have
[TABLE]
Since as by (III), the family is equicontinuous.
We prove (19) by induction on . The claim trivially holds for as for all . Suppose the claim holds for and consider . We can write
[TABLE]
Since the family is equicontinuous and bounded, and (19) implies (20) at time , the last term converges to zero as . This completes the proof.
Acknowledgments.
This research was supported in part by the U.S. Air Force Office of Scientific Research (AFOSR) under MURI grant FA9550-10-1-0573, and in part by the Office of Naval Research (ONR) under MURI grant N00014-16-1-2710 and grant N00014-12-1-0998.
- [1] Adlakha, S., R. Johari, G.Y. Weintraub. 2015. Equilibria of dynamic games with many players: Existence, approximation, and market structure. Journal of Economic Theory 156 269–316.
- [2] Aliprantis, C.D., K.C. Border. 2006. Infinite Dimensional Analysis. 3rd ed. Springer, Berlin.
- [3] Bensoussan, A., J. Frehse, P. Yam. 2013. Mean Field Games and Mean Field Type Control Theory. Springer, New York.
- [4] Bertsekas, D.P., S.E. Shreve. 1978. Stochastic Optimal Control: The Discrete Time Case. Academic Press, New York.
- [5] Billingsley, P. 1995. Probability and Measure. 3rd ed. Wiley.
- [6] Billingsley, P. 1999. Convergence of Probability Measures. 2nd ed. Wiley, New York.
- [7] Biswas, A. 2015. Mean field games with ergodic cost for discrete time Markov processes. arXiv:1510.08968.
- [8] Bogachev, V.I. 2007. Measure Theory: Volume II. Springer.
