Optimal Policies for Convex Symmetric Stochastic Dynamic Teams and their Mean-field Limit

Sina Sanjari; Serdar Yüksel

arXiv:1903.11476·math.OC·November 25, 2020·SIAM J. Control. Optim.

TL;DR

This paper investigates convex stochastic dynamic teams with symmetric information, establishing symmetry in optimal policies, convergence to mean-field limits, and linearity of optimal policies in LQG problems under certain conditions.

Contribution

It introduces exchangeable and symmetric information structures, characterizes symmetric optimal policies, and proves convergence to mean-field solutions, extending results to infinite horizon LQG teams.

Findings

01

Optimal policies exhibit symmetry in convex exchangeable teams.

02

Optimal policies for finite teams converge to mean-field optimal policies.

03

Linear optimal policies are established for symmetric LQG teams in infinite horizon settings.

This research was supported by the Natural Sciences and Engineering Research Council (NSERC) of Canada. A summary of some of the results here is included in [37] presented at the 2019 IEEE Conference on Decision and Control (CDC). The authors are with the Department of Mathematics and Statistics, Queen’s University, Kingston, ON, Canada. Email: {16sss3,[email protected]}

Abstract

This paper studies convex stochastic dynamic team problems with finite and infinite time horizons under decentralized information structures. First, we introduce two notions called exchangeable teams and symmetric information structures. We show that in convex exchangeable team problems an optimal policy exhibits a symmetry structure. We give a characterization for such symmetrically optimal teams for a general class of convex dynamic team problems under a mild conditional independence condition. In addition, through concentration of measure arguments, we establish the convergence of optimal policies for teams with $N$ decision makers to the corresponding optimal policies for symmetric mean-field teams with infinitely many decision makers. As a by-product, we present an existence result for convex mean-field teams; the main contribution of our paper here concerns the information structure of the system, since the related results in the literature have assumed either a classical or a static information structure. We also apply these results to the important special case of Linear Quadratic Gaussian (LQG) team problems: while optimal policies are known to be linear for partially nested LQG team problems with finite time horizons, the linearity of optimal policies has not been established in full generality for infinite horizon problems. We also study average cost finite and infinite horizon dynamic team problems with a symmetric partially nested information structure and obtain globally optimal solutions, for which we establish linearity of optimal policies.

keywords:

Stochastic teams, average cost optimization, decentralized control, mean-field teams

1 Introduction and literature review

Team problems consist of a collection of decision makers or agents acting together to optimize a common cost function, but not necessarily sharing all the available information. The term stochastic teams refers to the class of team problems where there is randomness in the initial states, observations, cost realizations, or the evolution of the dynamics. At each time stage, each agent has only partial access to the global information, which is defined by the information structure (IS) of the problem [45]. If there is a pre-defined order in which the decision makers act, then the team is called a sequential team. For sequential teams, if each agent’s information depends only on primitive random variables, the team is static. If at least one agent’s information is affected by an action of another agent, the team is said to be dynamic. Information structures can be further categorized as classical, partially nested, and non-classical. An IS is classical if the information of decision maker $i$ (DMi) contains all of the information available to DMk for $k<i$. An IS is partially nested if, whenever the action of DMk, for some $k<i$, affects the information of DMi, the information of DMi contains the information of DMk. An IS which is not partially nested is non-classical. A detailed review is presented in [49].
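The classification above can be illustrated with a minimal Python sketch. The encoding is an assumption made only for illustration: each DM's information is a set of labels, and `affects` lists pairs `(k, i)` with `k < i` such that the action of DMk affects the information of DMi. The function name and the labels are hypothetical.

```python
from typing import Dict, Set, Tuple


def classify_information_structure(
    info: Dict[int, Set[str]],
    affects: Set[Tuple[int, int]],
) -> str:
    """Classify a sequential team's information structure (illustrative encoding).

    info[i]  : labels of the data available to DM i.
    affects  : pairs (k, i), k < i, meaning DM k's action affects DM i's information.
    """
    dms = sorted(info)
    # Classical: DM i knows everything that every earlier DM k < i knows.
    if all(info[k] <= info[i] for i in dms for k in dms if k < i):
        return "classical"
    # Partially nested: nesting is required only along the "affects" relation.
    if all(info[k] <= info[i] for (k, i) in affects):
        return "partially nested"
    # Anything that is not partially nested is non-classical.
    return "non-classical"


classical = {1: {"y1"}, 2: {"y1", "y2"}, 3: {"y1", "y2", "y3"}}
nested = {1: {"y1"}, 2: {"y2"}, 3: {"y1", "y3"}}
nonclassical = {1: {"y1"}, 2: {"y2"}}

print(classify_information_structure(classical, {(1, 2), (2, 3)}))
print(classify_information_structure(nested, {(1, 3)}))
print(classify_information_structure(nonclassical, {(1, 2)}))
```

Under this encoding, a team with an empty `affects` set that is not classical is reported as partially nested, since the nesting condition then holds vacuously.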

Obtaining structural results in team problems is important towards establishing both existence and computational/approximation methods for optimal policies. In this paper, we define the notion of exchangeable teams and symmetric information structures, and we show that, for convex exchangeable dynamic teams with finite horizons, optimal policies exhibit a symmetry structure (Theorem 7). For any number of DMs, this symmetry structure is more relaxed when compared with the symmetry results developed earlier, e.g. in [38, 36] which focused on problems under a static information structure, and is applicable for dynamic teams which may not admit a static reduction, as long as convexity in policies holds for the team problem.
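As a hedged illustration of why convexity and exchangeability together can yield symmetry, the standard averaging argument can be sketched as follows. The notation is an assumption introduced here: $\underline{\gamma}^{\sigma}$ denotes the policy profile with the DM indices permuted by $\sigma$, and $S_{N}$ is the set of permutations of $\{1,\dots,N\}$. This covers only the fully exchangeable case; the conditions of Theorem 7 may be weaker.

```latex
% Assumptions: J is convex in the policy profile, and exchangeability gives
% J(\underline{\gamma}^{\sigma}) = J(\underline{\gamma}) for every \sigma \in S_N.
\bar{\underline{\gamma}} := \frac{1}{N!} \sum_{\sigma \in S_N} \underline{\gamma}^{\sigma},
\qquad
J(\bar{\underline{\gamma}})
\leq \frac{1}{N!} \sum_{\sigma \in S_N} J(\underline{\gamma}^{\sigma})
= J(\underline{\gamma}).
```

Every component of $\bar{\underline{\gamma}}$ is the same average over all DMs' policies, so the averaged profile is symmetric and costs no more than the profile it was built from.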

There have been many studies involving decentralized stochastic control with infinitely many decision makers. In particular, when the coupling among the decision makers is only through some aggregate/average effect, such problems can be viewed within the umbrella of mean-field games [26, 21], which were introduced as a limit model for non-cooperative symmetric $N$ -player differential games with a mean-field interaction as $N\to\infty$ . The solution concept in game theory is often Nash equilibrium, and often under various characterizations of it in dynamic Bayesian setups. In the context of decentralized stochastic control or teams, these would correspond to person-by-person optimal solutions, and hence not necessarily globally optimal solutions.

Nonetheless, on the existence as well as uniqueness and non-uniqueness results on equilibria, there have been several studies for mean-field games [26, 5, 14, 28, 22, 7, 17]. There have also been several studies for mean-field games where the limits of sequences of Nash equilibria have been investigated as the number of decision makers $N\to\infty$ (see e.g., [15, 25, 6, 26, 4]). We refer interested readers to [13, 11] for a literature review and a detailed summary of some recent results on mean-field games.

Some notable relevant studies from the mean-field literature are the following. In [15], through a concentration of measure argument, it has been shown that sequences of $\epsilon_{N}$-local (for each player) Nash equilibria for $N$-player games converge to a solution for the mean-field game under exchangeability of the initial states and weak convergence of normalized occupational measures to a deterministic measure [15, Theorem 5.1]. In [23], assumptions on equilibrium policies of the large population mean-field symmetric stochastic differential games have been presented to allow the convergence of asymmetric approximate Nash equilibria to a weak solution of the mean-field game [23, Theorem 2.6].

However, in these studies the information structures are restricted to the following models. In [15], the information structure is assumed to be static, since the strategies of each player are assumed to be adapted to the filtration generated by his/her initial states and Wiener process (also called distributed open-loop controllers in the mean-field games literature [23, 14, 13]); see Remark 2 for details of this discussion. Convergence of Nash equilibria induced by closed-loop controllers to a weak semi-Markov mean-field equilibrium has been established in [25] for finite horizon mean-field game problems, where the classical information structure (i.e., what would be a centralized problem in the team-theoretic setup) has been considered. For infinite horizon problems, in [12], an example of ergodic differential games with mean-field coupling has been constructed such that limits of sequences of expected costs induced by symmetric Nash-equilibrium policies of $N$-player games capture expected costs induced by many more Nash-equilibrium policies, including a mean-field equilibrium and social optima. Thus, [25] considers the classical information structure (a centralized problem), whereas [12] assumes that players have access to the entire history of the states of all players but not their controls (we note that in the team problem setup, when each DM knows the entire history of the states of all DMs, a classical result of Blackwell [9] implies that optimal policies can be realized as in the centralized problem, where the global state alone is a sufficient statistic).
Moreover, under relaxed regularity conditions on dynamics and the cost function, a limit theory has been established for controlled McKean-Vlasov dynamics [24] under the classical information structure, where through a similar analysis as in [15, 23], it has been shown that the empirical measure of pairs of states and $\epsilon_{N}$ -open-loop optimal controls converges weakly as $N\to\infty$ to limit points in the set of pairs of states and optimal controls of the McKean-Vlasov problem.

The above highlights the intricacies due to the information structure: unlike the aforementioned studies, we consider information structures that are not necessarily static or classical. Also, in this paper, we work with global optimality and not only mean-field equilibria, and we show the existence of a globally optimal policy for mean-field team problems. On the other hand, since we work under the convexity assumption, the information structure in our paper does not allow for mean-field coupling in the dynamics. We also note that in prior work [38], we studied static teams where, under convexity and more restrictive symmetry conditions, global optimality of a limit policy of a sequence of $N$-DM optimal policies has been established.

In the context of stochastic teams with a countably infinite number of decision makers, the gap between person-by-person optimality (Nash equilibrium in the game-theoretic context) and global team optimality is significant, since a perturbation of finitely many policies does not change the value of the expected cost. Person-by-person optimality is thus a weak condition in such a setup, and hence the results presented in the aforementioned papers may be inconclusive regarding global optimality of the limit equilibrium. For teams and social optima control problems, the analysis has primarily focused on the LQG model or Markov chains, where the centralized performance has been shown to be achieved asymptotically by decentralized controllers (see e.g., [20, 2, 3]).

We also obtain existence results on optimal policies for the setups considered. Compared to the results on the existence of a globally optimal policy in team problems where (finite) $N$ -DM team problems have been considered [47, 16, 50, 34], we study convex team problems with countably infinite number of decision makers.

Parts of our results in this paper correspond to LQG teams. In [18], it has been shown that, for teams with a finite number of DMs, dynamic teams with a partially nested information structure can be reduced to static ones ([18, 46]), where Radner’s theorem concludes global optimality of linear policies for LQG team problems [31]. However, for average cost infinite horizon, partially nested, LQG dynamic team problems, so far there has been no universal result establishing that a globally optimal policy is linear, time-invariant, and stabilizing, and this has often been imposed a priori. In [33], the problem of designing a linear, time-invariant, stabilizing, state feedback optimal policy for decentralized $\mathcal{H}_{2}$-optimization problems that satisfy the quadratic invariance property has been addressed by reparametrizing the problem as a convex problem (via Youla parameterization). In [32], it has been shown that for sequential team problems involving linear systems, quadratic invariance and the partially nested property are equivalent. For a class of partially ordered (POSET) systems, state space techniques have been utilized to obtain optimal, linear, time-invariant, state feedback controllers for $\mathcal{H}_{2}$-optimization problems with sparsity constraints [41]. A similar result has been established in [42], where linearity and time invariance have been imposed a priori. In [27], $\mathcal{H}_{2}$-optimization output feedback problems with two players have been considered, and optimality results have been established when the optimal policies are restricted to linear, time-invariant, stabilizing policies. However, the results in [27, 33, 41, 42] are inconclusive regarding global optimality. Our contribution here is to consider average cost infinite horizon dynamic team problems without restricting the set of policies to those that are linear, time-invariant, and stabilizing, unlike the results in [27, 33, 41, 42].
We note again that the optimality of linear policies for infinite horizon LQG problems is an open problem in its generality and we provide positive results for a class of such problems.

Contributions. In view of the discussion above, our paper makes the following contributions.

(i)

We define a notion of exchangeable teams and symmetric information structures, and we show that, for convex exchangeable dynamic teams with finite horizons, optimal policies exhibit a symmetry structure (Theorem 7). For any number of DMs, this symmetry structure is more relaxed when compared with the symmetry results developed in [38, 36] and is applicable for dynamic teams which may not admit a static reduction, as long as convexity in policies holds for the team problem.

(ii)

For convex mean-field teams with a symmetric information structure, through concentration of measure arguments, we establish the convergence of optimal policies for teams with $N$ decision makers to the corresponding optimal policies for mean-field teams with infinitely many decision makers (see Theorem 10).

(iii)

We establish an existence result for the class of convex mean-field teams with a symmetric information structure (see Theorem 12) for finite horizon problems, where, as noted in the literature review, related results assumed more restrictive information structures which are either static or classical.

(iv)

We also apply our results to LQG dynamic teams for finite horizon problems (see Section 4). For LQG dynamic teams with a symmetric partially nested information pattern, we obtain an optimal policy for finite horizon problems (see Section 4.1). We also apply convex mean-field results to LQG mean-field teams with a symmetric partially nested information structure (see Section 4.1) and obtain a globally optimal policy. Building on the result above, we also obtain a globally optimal policy for average cost LQG team problems.
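The convergence claimed in contribution (ii) can be illustrated on a toy static quadratic team. This is a sketch under stated assumptions and not the model studied in the paper: the observations $y^{i}$ are i.i.d. with zero mean and unit variance, the policies are restricted to the symmetric linear form $u^{i}=k\,y^{i}$, and the cost couples the DMs only through the average action. The function names are hypothetical.

```python
def cost_symmetric(k: float, n: int) -> float:
    """Expected cost of the toy N-DM team when every DM uses u^i = k * y^i.

    Assumed toy model: y^i are i.i.d. with zero mean and unit variance, and
        c = (1/N) sum_i (u^i - y^i)^2 + ((1/N) sum_i u^i)^2,
    which gives E[c] = (k - 1)^2 + k^2 / N.
    """
    return (k - 1.0) ** 2 + k ** 2 / n


def optimal_gain(n: int) -> float:
    """Minimizer of cost_symmetric(., n), from 2(k - 1) + 2k/N = 0."""
    return n / (n + 1.0)


# In the limit N -> infinity the coupling term vanishes, the cost becomes
# (k - 1)^2, and its minimizer is k = 1.
MEAN_FIELD_GAIN = 1.0

for n in (1, 10, 100, 1000):
    k_n = optimal_gain(n)
    print(n, k_n, cost_symmetric(k_n, n))
```

In this toy model the optimal gain $N/(N+1)$ for $N$ DMs approaches the mean-field gain $1$, and the optimal cost $1/(N+1)$ approaches the mean-field cost $0$.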

The organization of the paper is as follows. We study convex exchangeable dynamic teams with finite horizons in Section 2 and mean-field teams in Section 3. We obtain globally optimal solutions for finite horizon problems with a symmetric partially nested information structure and for LQG mean-field teams in Section 4.1, and we discuss average cost LQG team problems with a symmetric information structure in Section 4.2.

Notation. $\mathbb{R}$ and $\mathbb{N}$ denote the sets of real numbers and natural numbers, respectively. We denote the trace of a matrix $A$ by $Tr(A)$. We write $X \perp\!\!\!\perp Y$ to denote that a random vector $X$ is independent of a random vector $Y$. We denote by $A^{T}$ the transpose of a matrix $A$, and we write $A^{(T)}$ to show the dependence of a matrix $A$ on $T\in\mathbb{N}$. For any random variables $z^{1:N}:=(z^{1},\dots,z^{N})$, we define ${z}^{-i}:=({z}^{1},\dots,{z}^{i-1},{z}^{i+1},\dots,{z}^{N})$, and $\mathcal{M}_{r,q}$ denotes the space of $r\times q$ matrices.

1.1 Preliminaries

In this section, we introduce Witsenhausen’s Intrinsic Model for sequential teams [45] (we generalize this definition to an infinite number of decision makers). Consider sequential systems and assume the action and measurement spaces are standard Borel spaces, that is, Borel subsets of complete, separable metric spaces. The Intrinsic Model for sequential teams is defined as follows.

•

There exists a collection of measurable spaces $\{(\Omega,{\cal F}),\allowbreak(\mathbb{U}^{i},{\cal U}^{i}),(\mathbb{Y}^{i},{\cal Y}^{i}),i\in{\mathcal{N}}\}$ , specifying the system’s distinguishable events, and control and measurement spaces. The set $\mathcal{N}$ denotes the collection of decision makers. The set $\mathcal{N}$ can be a finite set $\{1,2,\dots,N\}$ or a countable set $\mathbb{N}$ . The pair $(\Omega,{\cal F})$ is a measurable space (on which an underlying probability may be defined). The pair $(\mathbb{U}^{i},{\cal U}^{i})$ denotes the Borel space from which the action $u^{i}$ of DMi is selected. The pair $(\mathbb{Y}^{i},{\cal Y}^{i})$ denotes the Borel observation/measurement space.

•

There is a measurement constraint that establishes the connection between the observation variables and the system’s distinguishable events. The $\mathbb{Y}^{i}$-valued observation variables are given by $y^{i}=h^{i}(\omega,{\underline{u}}^{1:i-1})$, where ${\underline{u}}^{1:i-1}=\{u^{k},k\leq i-1\}$ and the $h^{i}$ are measurable functions.

•

The set of admissible control laws $\underline{\gamma}=\{\gamma^{i}\}_{i\in\mathcal{N}}$ , also called designs or policies, are measurable control functions, so that $u^{i}=\gamma^{i}(y^{i})$ . Let $\Gamma^{i}$ denote the set of all admissible policies for DMi and let ${\Gamma}=\prod_{i\in\mathcal{N}}\Gamma^{i}$ .

•

There is a probability measure ${P}$ on $(\Omega,{\cal F})$ describing the probability space on which the system is defined.

Under the intrinsic model, every DM acts separately. However, depending on the information structure, it may be convenient to consider a collection of DMs as a single DM acting at different time instances. In fact, in the classical stochastic control, this is the standard approach.
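The sequential structure of the intrinsic model, $y^{i}=h^{i}(\omega,\underline{u}^{1:i-1})$ followed by $u^{i}=\gamma^{i}(y^{i})$, can be sketched in Python as below. This is a minimal sketch with real-valued observations and actions; the function `rollout` and the two-DM example are hypothetical and serve only to show the order of evaluation.

```python
from typing import Callable, List, Sequence

Observation = float
Action = float


def rollout(
    omega: Sequence[float],
    h: List[Callable[[Sequence[float], List[Action]], Observation]],
    gamma: List[Callable[[Observation], Action]],
) -> List[Action]:
    """Run the DMs in their fixed order: y^i = h^i(omega, u^{1:i-1}), u^i = gamma^i(y^i)."""
    actions: List[Action] = []
    for h_i, gamma_i in zip(h, gamma):
        y_i = h_i(omega, list(actions))  # the observation sees only earlier actions
        actions.append(gamma_i(y_i))     # each DM acts on its own observation only
    return actions


# Toy two-DM dynamic team: DM 2 observes its own noise plus DM 1's action.
h = [lambda w, u: w[0], lambda w, u: w[1] + u[0]]
gamma = [lambda y: 2.0 * y, lambda y: -y]
print(rollout([1.0, 0.5], h, gamma))  # [2.0, -2.5]
```

Because the observation of the second DM depends on the first DM's action, this toy team is dynamic in the sense defined in the introduction.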

2 Finite horizon convex dynamic team problems with a symmetric information structure

In this section, we characterize symmetry in dynamic team problems. According to the discussion above, by considering a collection of DMs as a single DM ( $i=1,\dots,N$ ) acting at different time instances ( $t=0,\dots,T-1$ ), we define a team problem with $(NT)$ -DMs as a team with $N$ -DMs:

(i)

Let the observation and action spaces be Borel subsets of $\mathbb{R}^{n}$ for a positive integer $n$ and be identical for each DM ( $i=1,\dots,N$ ) with $\mathbb{Y}_{i}:={\bf{Y}}=\prod_{t=0}^{T-1}\mathbb{Y}^{t}$ , $\mathbb{U}_{i}:={\bf{U}}=\prod_{t=0}^{T-1}\mathbb{U}^{t}$ , respectively. The set of all admissible policies is denoted by ${\bf{\Gamma}}=\prod_{i=1}^{N}\Gamma_{i}=\prod_{i=1}^{N}\prod_{t=0}^{T-1}\Gamma^{t}$ .

(ii)

For $i=1,\dots,N$ , $y^{i}_{t}:=h_{t}^{i}(x_{0}^{1:N},\zeta^{1:N}_{0:t},u_{0:t-1}^{1:N})$ represents the observation of DMi at time $t$ ( $h_{t}^{i}$ s are Borel measurable functions).

(iii)

Let $(\underline{\zeta}^{1:N}):=(\underline{\zeta}^{1},\dots,\underline{\zeta}^{N})$ where $\underline{\zeta}^{i}:=(x_{0}^{i},\zeta_{0:T-1}^{i})$ denotes all the uncertainty associated with DMi including his/her initial states. We assume that $(\underline{\zeta}^{i})$ takes values in $\Omega_{\zeta}$ (where at each time instance $t$ , it takes values in $\Omega_{\zeta_{t}}$ ). Let $\mu$ denote the law of $\underline{\zeta}^{1:N}$ .

(iv)

Define the expected cost function of $\underline{\gamma}^{1:N}$ as $J_{N}(\underline{\gamma}^{1:N})=E^{\underline{\gamma}^{1:N}}[c(\underline{\zeta}^{1:N},\underline{u}^{1:N})]$ , for some Borel measurable cost function $c:\prod_{i=1}^{N}({\Omega_{\zeta}}\times{\bf{U}})\to\mathbb{R}_{+}$ , where $\underline{\gamma}^{1:N}=(\underline{\gamma}^{1},\underline{\gamma}^{2},\dots,\underline{\gamma}^{N})$ and $\underline{\gamma}^{i}=\gamma^{i}_{0:T-1}$ for $i=1,\dots,N$ .

Now, we present the definition of symmetric information structures (note that symmetric information structures can be classical, partially nested, or non-classical).

Definition 1.

Let the information of DMi acting at time $t$ be described as $I_{t}^{i}:=\{y_{t}^{i}\}$ . The information structure of a sequential $N$ -DM team problem is symmetric if

(i)

$y^{i}_{t}=h_{t}(x_{0}^{i},x_{0}^{-i},\zeta^{i}_{0:t},{\zeta}^{-i}_{0:t},u_{0:t-1}^{i},u_{0:t-1}^{-i})$ where $h_{t}$ is identical for all $i=1,\dots,N$ (note that the function’s arguments depend on $i$ ).

We note that the above definition can be generalized to be applicable for teams with countably infinite DMs and infinite horizon problems.

The symmetric information structure can also be interpreted and defined as a graph, which has often been the common method to describe information structures in control theory, relating DMs and their information through directed edges. Consider $G(V,\mu)$ as a directed graph with node set $V=\{1,\dots,NT\}$ , where $\mu\subset V\times V$ determines the directed edges between nodes; this represents the dependency in the information of nodes, i.e., $(i,j)$ denotes a directed edge from $i$ to $j$ , written $i\to j$ , which means that $u^{i}$ affects $y^{j}$ through the relation $y^{i}=h^{i}(\omega,{\underline{u}}^{1:i-1})$ defined in the intrinsic model (see Section 1.1). We denote by $\downarrow j$ the set of nodes $i$ such that $i\to j$ (ancestors), and $\downarrow\downarrow j=\{\downarrow j\}\cup\{j\}$ . Similarly, we can define descendants by $\uparrow j$ . We can define a collection of DMs as a single DM ( $i=1,\dots,N$ ) acting at different time instances ( $t=0,\dots,T-1$ ) on a graph with a symmetric information structure (two examples are shown in Fig. 2.1, and Fig. 2.2). Assume

(i)

there exists a node $\{i\}$ (root node), $\omega_{0}$ . Each sub-graph represents a single DM acting at time instances $t=0,\dots,T-1$ , and there exists a finite number of sub-graphs $G_{p}(\hat{V},\hat{\mu})$ such that $\cup_{p=1}^{N}G_{p}\cup\{i\}=G$ , where $G_{p}$ s are isomorphic (see e.g., [43]) for all $p=1,\dots,N$ , i.e., for every node with directed edges in each sub-graph there exists a unique node with identical directed edges in the corresponding sub-graphs, where $\hat{V}=\{0,\dots,T-1\}$ , and $G_{p}^{k}$ refers to a node $k$ in $G_{p}$ for all $p=1,\dots,N$ and $k=0,\dots,T-1$ ,

(ii)

sharing of the information is symmetric across sub-graphs, i.e., for $p,s=1,\dots,N$ , and $k,j=0,\dots,T-1$ , and for every edge from a node $G_{p}^{k}$ to a node $G_{s}^{j}$ , there exists an edge from a node $G_{p}^{k}$ to nodes $G_{-p}^{j}$ , where $G_{-p}^{j}$ denotes $(G_{1}^{j},\dots,G_{p-1}^{j},G_{p+1}^{j},\dots,G_{N}^{j})$ , and also there exist edges from nodes $G_{-p}^{k}$ to a node $G_{p}^{j}$ .

Now, we present an exchangeability hypothesis on the cost function. First, we recall the definition of an exchangeable finite set of random variables.

Definition 2.

Random variables $(x^{1},x^{2},\dots,x^{N})$ defined on a common probability space are exchangeable if for any permutation $\sigma$ of the set $\{1,\dots,N\}$ (a bijective mapping $\sigma:\{1,\ldots,N\}\to\{1,\ldots,N\}$ ),

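$$P\left(x^{1}\in A^{1},\dots,x^{N}\in A^{N}\right)=P\left((x^{\sigma})^{1}\in A^{1},\dots,(x^{\sigma})^{N}\in A^{N}\right)$$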

for any measurable $\{A^{1},\dots,A^{N}\}$ and $(x^{\sigma})^{i}:=x^{\sigma(i)}$ for all $i\in\{1,\dots,N\}$ .
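As a small illustration of Definition 2, the sketch below checks exchangeability exactly for a finite-valued toy model. The model $x^{i}=z+w^{i}$ with fair coins $z$ and $w^{i}$ is an assumption made here for illustration, and the function names `joint_pmf` and `is_exchangeable` are hypothetical. For finitely supported laws it suffices to compare point masses instead of all measurable product sets.

```python
from fractions import Fraction
from itertools import permutations, product


def joint_pmf(n: int) -> dict:
    """Exact joint law of (x^1..x^n) for the assumed toy model x^i = z + w^i,
    where z and w^1..w^n are independent fair coins with values in {0, 1}."""
    pmf: dict = {}
    half = Fraction(1, 2)
    for z in (0, 1):
        for w in product((0, 1), repeat=n):
            x = tuple(z + w_i for w_i in w)
            pmf[x] = pmf.get(x, Fraction(0)) + half ** (n + 1)
    return pmf


def is_exchangeable(pmf: dict, n: int) -> bool:
    """Check that every point has the same mass as each of its coordinate permutations."""
    for sigma in permutations(range(n)):
        for x, p in pmf.items():
            x_sigma = tuple(x[sigma[i]] for i in range(n))
            if pmf.get(x_sigma, Fraction(0)) != p:
                return False
    return True


print(is_exchangeable(joint_pmf(3), 3))
```

Because all coordinates share the common term $z$, this toy vector is exchangeable without being independent, which is the situation the later assumptions on initial states are designed to cover.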

Assumption 2.1.

For any permutation $\sigma$ of the set $\{1,\dots,N\}$ , we have for all $\omega_{0}$

[TABLE]

where $(\underline{\zeta}^{\sigma})^{1:N}=(\underline{\zeta}^{\sigma(1)},\dots,\underline{\zeta}^{\sigma(N)})$ and $(\underline{u}^{\sigma})^{1:N}=(\underline{u}^{\sigma(1)},\dots,\underline{u}^{\sigma(N)})$ .

Here, we recall some definitions and results from [50, Section 3.3] on convexity of static and dynamic team problems required to follow the result in this paper.

Definition 3.

[50, Section 3.3] An $N$ -DM team problem (static or dynamic) is convex in policies if for any two team policies ${\underline{\gamma}}_{T}^{1:N}$ and $\tilde{\underline{\gamma}}_{T}^{1:N}$ in the set $\{\underline{\gamma}_{T}^{1:N}\in{\bf\Gamma}:J(\underline{\gamma}_{T}^{1:N})<\infty\},$ and for any $\alpha\in(0,1)$ , we have

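$$J\left(\alpha{\underline{\gamma}}_{T}^{1:N}+(1-\alpha)\tilde{\underline{\gamma}}_{T}^{1:N}\right)\leq\alpha J({\underline{\gamma}}_{T}^{1:N})+(1-\alpha)J(\tilde{\underline{\gamma}}_{T}^{1:N}).$$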

The above definition can also be applied to infinite-horizon problems and/or teams with a countably infinite number of DMs. We recall sufficient conditions for convexity of static and dynamic team problems following [50, Section 3.3].

Theorem 4.

[50, Section 3.3] Consider a sequential team problem, and assume action spaces are convex, and $J(\underline{\gamma})<\infty$ for all $\underline{\gamma}\in{\bf\Gamma}$ (or alternatively, restrict the set to those policies leading to a finite cost). Then

(i)

for static team problems convexity of the cost function in actions is sufficient for convexity of the team problem in policies,

(ii)

for dynamic team problems with a static reduction, convexity of the team problem in policies is equivalent to the convexity of its static reduction.

(iii)

in particular, for partially nested dynamic teams with a static reduction (more generally, for stochastically partially nested team problems [50, Section 3.3]), if the cost function is convex in actions, then the reduced team problem with an equivalent information structure is convex on ${\bf\Gamma}$ .

The conditions above, however, are only sufficient conditions [50, Example 1]. We note that, as a corollary of (ii) above, for the LQG setup, under partial nestedness, convexity in policies holds as a consequence of Radner’s theorem; we will study this case in Section 4. On the other hand, not all LQG problems are convex: the celebrated counterexample of Witsenhausen [44] demonstrates that under non-classical information structures, even LQG problems may not be convex and optimal policies may not be linear.

2.1 Optimality of symmetric policies for convex dynamic teams with a symmetric information structure

In the following, we define notions of exchangeable and symmetrically optimal teams analogous to [38, 36] for dynamic teams.

Definition 5.

(Exchangeable teams)

An $N$ -DM team is exchangeable if the value of the expected cost function is invariant under every permutation of policies of DMs, i.e., $J_{T}(\underline{\gamma}_{T}^{1},\underline{\gamma}_{T}^{2},\dots,\underline{\gamma}_{T}^{N})=J_{T}((\underline{\gamma}_{T}^{\sigma})^{1},\dots,(\underline{\gamma}_{T}^{\sigma})^{N})$ for every permutation $\sigma$ of $\{1,\dots,N\}$ .

Definition 6.

(Symmetrically optimal teams)

A team is symmetrically optimal if, for every given policy $\underline{\gamma}_{T}=(\underline{\gamma}_{T}^{1},\dots,\underline{\gamma}_{T}^{N})$ , there exists an identically symmetric policy (i.e., each DM has the same policy, $\underline{\tilde{\gamma}}_{T}=(\underline{\tilde{\gamma}}_{T}^{1},\dots,\underline{\tilde{\gamma}}_{T}^{N})$ , and $\underline{\tilde{\gamma}}_{T}^{i}=\underline{\tilde{\gamma}}_{T}^{j}$ for all $i,j=1,\dots,N$ ) which performs at least as well as the given policy.

Remark 1.

The concepts of exchangeable and symmetrically optimal dynamic teams in this paper are generalizations of those for static teams in [38, 36]. However, here, the value of the expected cost function may not be invariant under exchanging $\gamma^{i}_{t}$ with $\gamma^{j}_{k}$ for $k\not=t$ , $k,t=0,\dots,T-1$ , and for $i,j=1,\dots,N$ .

Here, we give a characterization for exchangeable and symmetrically optimal dynamic teams.

Theorem 7.

Consider dynamic team problems with a symmetric information structure under Assumption 2.1. If

(a)

action spaces $\mathbb{U}_{t}$ are convex for all $t=0,\dots,T-1$ ,

(b)

$(\underline{\zeta}^{1},\dots,\underline{\zeta}^{N})$ are exchangeable,

(c)

for all policies $\gamma\in\bf{\Gamma}$ , and for all $A=A^{1}\times\dots\times A^{N}$ where $A^{i}\in\mathcal{Y}^{i}$ ,

[TABLE]

where $y_{\downarrow t}^{\downarrow\downarrow i}:=\{y_{p}^{j}|u^{j}_{p}~{}\text{affects}~{}y^{i}_{t}~{}\forall~{}p=0,\dots,t-1~{}\text{and}~{}\forall j=1,\dots,N\}$ and $({\gamma}^{\downarrow\downarrow i}_{\downarrow t}({y}^{\downarrow\downarrow i}_{\downarrow t}))$ can be defined similarly,

(i)

then, the team problem is exchangeable.

(ii)

Furthermore, if the team problem is convex in policies (see Theorem 4), then the team is symmetrically optimal.

Proof.

We first show that for any permutation $\sigma\in S$ , $J_{T}((\underline{\gamma}_{T}^{\sigma})^{1},\dots,(\underline{\gamma}_{T}^{\sigma})^{N})=J_{T}(\underline{\gamma}_{T}^{1},\dots,\underline{\gamma}_{T}^{N})$ , i.e., the team is exchangeable. We have,

[TABLE]

where (3) follows from condition (c). Equality (4) follows from exchanging $\underline{y}^{i}$ , $\underline{\zeta}^{i}$ with $(\underline{y}^{\sigma})^{i}$ , $(\underline{\zeta}^{\sigma})^{i}$ by relabeling them, respectively. Since the information structure is symmetric, (1) and condition (b) imply (5). Hence, the team is exchangeable. Let $\underline{\gamma}^{*}_{T}=(\underline{\gamma}_{T}^{1*},\dots,\underline{\gamma}_{T}^{N*})$ be a given policy. Consider $\underline{\tilde{\gamma}}_{T}$ as a convex combination of all possible permutations of policies by averaging them. Since action spaces are convex by condition (a), $\underline{\tilde{\gamma}}_{T}$ is a control policy. Following from convexity of the cost function in policies, we have

[TABLE]

where $|S|$ denotes the cardinality of the set $S$ and the inequality above follows from the hypothesis that the team problem is convex on ${\bf\Gamma}$ and the last equality follows from exchangeability of the team problem. This implies that the team is symmetrically optimal and completes the proof. ∎
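The averaging step of the proof can be checked numerically on a small static example. The sketch below is an illustration only, under assumptions chosen here: three DMs, observations $y^{i}=\zeta^{i}$ with $\zeta^{i}$ i.i.d. uniform on a three-point set, and a toy cost that is convex in actions and invariant under relabeling of DMs. The names `cost`, `expected_cost`, and `symmetrize` are hypothetical. Averaging the policy tuple over all permutations gives every DM the policy $\frac{1}{N}\sum_{j}\gamma^{j}$, which is what `symmetrize` computes.

```python
from itertools import product
from typing import Dict, List, Sequence

Policy = Dict[int, float]  # hypothetical representation: observation value -> action


def cost(zeta: Sequence[int], u: Sequence[float]) -> float:
    """Assumed toy cost: convex in u and invariant under permuting DM indices."""
    n = len(u)
    tracking = sum((u_i - z_i) ** 2 for u_i, z_i in zip(u, zeta))
    coupling = (sum(u) / n) ** 2
    return tracking + coupling


def expected_cost(policies: List[Policy], support=(-1, 0, 2)) -> float:
    """Exact expected cost by enumeration, with zeta^i i.i.d. uniform on `support`."""
    n = len(policies)
    total = 0.0
    for zeta in product(support, repeat=n):
        u = [policies[i][zeta[i]] for i in range(n)]
        total += cost(zeta, u)
    return total / len(support) ** n


def symmetrize(policies: List[Policy]) -> List[Policy]:
    """Average over all permutations: every DM receives (1/N) * sum_j gamma^j."""
    n = len(policies)
    avg = {y: sum(p[y] for p in policies) / n for y in policies[0]}
    return [dict(avg) for _ in range(n)]


policies = [{-1: 0.0, 0: 1.0, 2: 3.0},
            {-1: -2.0, 0: 0.5, 2: 0.0},
            {-1: 1.0, 0: -1.0, 2: 2.0}]
print(expected_cost(policies), expected_cost(symmetrize(policies)))
```

In this toy instance the symmetrized policy can only lower the expected cost, in line with the inequality in the proof, while permuting the original policies leaves the cost unchanged.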

Examples will be given in Section 3 and Section 4.1 where Theorem 7 can be applied. Here, we present the result for a class of problems that admit a static reduction (see [49, Section 3.7], [50, Section 1.2], [19, 46]).

Lemma 8.

Consider a dynamic team problem with a symmetric partially nested information structure (see Definition 1) which admits a static reduction. Under Assumption 2.1, and Assumptions (a), (b), (c) of Theorem 7, if the cost function is jointly convex in $\underline{u}^{1},\dots,\underline{u}^{N}$ ${P}$ -almost surely, then the team is symmetrically optimal.

We note again that here by symmetry, we mean symmetry across the decision makers.

Proof.

The proof follows from Theorem 4(iii) and Theorem 7 since the team is convex on ${\bf\Gamma}$ under the static reduction which is equivalent to the dynamic problem. ∎

Hence, it follows that if a static reduction of an exchangeable, symmetrically optimal, dynamic team exists, then it is exchangeable and symmetrically optimal.

3 Convex mean-field teams with a symmetric information structure

In the following, we establish global optimality results for convex mean-field teams with a symmetric information structure (that is not necessarily partially nested).

Define state dynamics and observations as

[TABLE]

where $f_{t}$ and $h_{t}$ are measurable functions. The information structure of DMi at time $t$ is $I_{t}^{i}=\{{y}^{i}_{t}\}$ , and $\zeta_{t}^{i}:=(w_{t}^{i},v_{t}^{i})$ (with $\zeta_{0}^{i}:=(x_{0}^{i},w_{0}^{i},v_{0}^{i})$ ) denotes uncertainty corresponding to dynamics and observations at time $t$ for DMi which are exogenous random vectors in a standard Borel space. Denote $\mathbb{X}\subseteq\mathbb{R}^{m}$ , $\mathbb{U}\subseteq\mathbb{R}^{m^{\prime}}$ , $\mathbb{Y}\subseteq\mathbb{R}^{m^{\prime\prime}}$ , $\mathbb{W}$ , and $\mathbb{V}$ as the state space, action space, observation space and the spaces of disturbances of dynamics and observations of DMs at each time instance $t=0,\dots,T-1$ , respectively, where $m$ , $m^{\prime}$ , and $m^{\prime\prime}$ are positive integers.

Problem ( $\mathcal{P}_{T}^{N,\text{MF}}$ ):

Consider $N$ -DM teams with the expected cost function of $\underline{\gamma}_{T}^{1:N}$ as

[TABLE]

where $\omega_{0}:(\Omega,\mathcal{F})\to(\Omega_{0},\mathcal{F}_{0})$ is an exogenous random vector in the standard Borel space and $\underline{\gamma}_{T}^{1:N}=\gamma^{1:N}_{0:T-1}$ , and the cost function satisfies Assumption 3.1(c) below.

Problem ( $\mathcal{P}_{T}^{\infty,\text{MF}}$ ):

Consider mean-field teams with the expected cost function of $\underline{\gamma}_{T}$ as

[TABLE]

where $J_{T}^{N}(\cdot)$ is defined in (8), $\underline{\gamma}_{T}^{i}=\gamma^{i}_{0:T-1}$ for $i\in\mathbb{N}$ and $\underline{\gamma}_{T}=\{\underline{\gamma}_{T}^{i}\}_{i\in\mathbb{N}}$ .

Assumption 3.1.

Assume

(a)

function $f_{t}:\mathbb{X}\times\mathbb{U}\times\mathbb{W}\to\mathbb{X}$ is continuous in its first and second arguments for all $w_{t}^{i}$ and for each $i\in\mathbb{N}$ and uniformly bounded,

(b)

function $h_{t}:\prod_{k=0}^{t}\mathbb{X}\times\prod_{k=0}^{t-1}\mathbb{U}\times\prod_{k=0}^{t}\mathbb{V}\to\mathbb{Y}$ is continuous in states and actions for all $v_{0:t}^{i}$ and for each $i\in\mathbb{N}$ , and

(c)

the cost function in (8), $c:\Omega_{0}\times\mathbb{X}\times\mathbb{U}\times\mathbb{U}\times\mathbb{X}\to\mathbb{R}_{+}$ , is continuous in its second, third, fourth, and fifth arguments for all $\omega_{0}$ .
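To make the mean-field coupling concrete, the following sketch simulates one sample of a normalized team cost. It is an illustration under assumptions that are not taken from the paper's displayed equations: the stage cost is evaluated as $c(\omega_{0},x_{t}^{i},u_{t}^{i},\frac{1}{N}\sum_{p}u_{t}^{p},\frac{1}{N}\sum_{p}x_{t}^{p})$, which is inferred only from the signature in Assumption 3.1(c), the observation map is the special case of a noisy current state, and `dynamics`, `stage_cost`, `sample_cost`, and `clipped_linear` are hypothetical names and choices.

```python
import math
import random
from typing import Callable, List


def dynamics(x: float, u: float, w: float) -> float:
    """Assumed f_t: bounded and continuous in (x, u), in the spirit of Assumption 3.1(a)."""
    return math.tanh(x + u + w)


def stage_cost(omega0: float, x: float, u: float, u_bar: float, x_bar: float) -> float:
    """Assumed c: non-negative and continuous in its last four arguments."""
    return (x - x_bar) ** 2 + (u - u_bar) ** 2 + u ** 2 + abs(omega0) * x ** 2


def sample_cost(n: int, horizon: int, policy: Callable[[int, float], float],
                rng: random.Random) -> float:
    """One sample of (1/N) * sum_t sum_i c(omega0, x_t^i, u_t^i, mean u_t, mean x_t)."""
    omega0 = rng.gauss(0.0, 1.0)  # common randomness shared by all DMs
    x: List[float] = [omega0 + rng.gauss(0.0, 1.0) for _ in range(n)]  # i.i.d. given omega0
    total = 0.0
    for t in range(horizon):
        y = [x_i + rng.gauss(0.0, 0.1) for x_i in x]  # assumed h_t: noisy local state
        u = [policy(t, y_i) for y_i in y]  # every DM applies the same policy
        u_bar, x_bar = sum(u) / n, sum(x) / n
        total += sum(stage_cost(omega0, x_i, u_i, u_bar, x_bar)
                     for x_i, u_i in zip(x, u)) / n
        x = [dynamics(x_i, u_i, rng.gauss(0.0, 0.1)) for x_i, u_i in zip(x, u)]
    return total


def clipped_linear(t: int, y: float) -> float:
    """Hypothetical policy with values in the compact convex action set [-1, 1]."""
    return max(-1.0, min(1.0, -0.5 * y))


print(sample_cost(n=100, horizon=5, policy=clipped_linear, rng=random.Random(0)))
```

Each DM interacts with the others only through the two empirical averages, which is why the behaviour of these averages as $N$ grows governs the limiting problem.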

3.1 Mean-field optimal policies as limits of optimal $N$ -DM teams

In the following, we first establish global optimality results under Assumption 3.2 (see Theorem 10), then we establish the result under a more relaxed assumption, Assumption 3.3 (see Theorem 11):

Assumption 3.2.

Assume

(i)

$(x_{0}^{1},x_{0}^{2},\dots)$ are i.i.d. random vectors conditioned on $\omega_{0}$ ,

(ii)

for $t=0,\dots,T-1$ , $\{w^{i}_{t}\}_{i\in\mathbb{N}}$ are i.i.d. random vectors, and for $i\in\mathbb{N}$ , $\{w^{i}_{t}\}_{t=0}^{T-1}$ are mutually independent, and independent of $\omega_{0}$ and $(x_{0}^{1},x_{0}^{2},\dots)$ . For $t=0,\dots,T-1$ , $\{v^{i}_{t}\}_{i\in\mathbb{N}}$ are i.i.d. random vectors, and for $i\in\mathbb{N}$ , $\{v^{i}_{t}\}_{t=0}^{T-1}$ are mutually independent, and independent of $\omega_{0}$ , $(x_{0}^{1},x_{0}^{2},\dots)$ , and $w^{i}_{t}$ s for $i\in\mathbb{N}$ and $t=0,\dots,T-1$ .

Assumption 3.3.

Assume that conditioned on $\omega_{0}$ , $(x_{0}^{1},x_{0}^{2},\dots)$ are exchangeable random vectors.

Later on we will establish an existence theorem under Assumption 3.2, and we note that the proof under Assumption 3.2 is more direct. This is why two separate theorems will be presented, and the proof of the latter will be built on that of the former.

Lemma 9.

Consider a team defined as ( $\mathcal{P}_{T}^{N,\text{MF}}$ ) with a symmetric information structure. Assume the team problem is convex in policies. Let the action space be compact and convex for each decision maker. Under Assumption 3.1 and Assumption 3.2, the team is symmetrically optimal.

Proof.

The proof follows from Theorem 7. ∎

Theorem 10.

Consider a team defined as ( $\mathcal{P}_{T}^{\infty,\text{MF}}$ ) with ( $\mathcal{P}_{T}^{N,\text{MF}}$ ) having a symmetric information structure for every $N$ . Assume for every $N$ the team problem is convex in policies. Let the action space be compact and convex for each DM. Under Assumption 3.1, and Assumption 3.2, if there exists a sequence of optimal policies for ( $\mathcal{P}_{T}^{N,\text{MF}}$ ), $\{\underline{\gamma}^{*,N}_{T}\}_{N}$ , which converges (for every DM due to the symmetry) pointwise to $\underline{\gamma}^{*,\infty}_{T}$ as $N\to\infty$ , then $\underline{\gamma}^{*,\infty}_{T}$ (which is identically symmetric) is an optimal policy for ( $\mathcal{P}_{T}^{\infty,\text{MF}}$ ).

Proof.

Following from Lemma 9, one can consider a sequence of $N$ -DM teams which are symmetrically optimal that defines ( $\mathcal{P}_{T}^{N,\text{MF}}$ ) and whose limit is identified with $(\mathcal{P}_{T}^{\infty,\text{MF}})$ . Define

[TABLE]

where $\delta_{Y}(\cdot)$ denotes the Dirac measure for any random vector $Y$ , and $B\in\mathcal{Z}:={\bf{U}}\times{\bf{Y}}\times{\bf{S}}$ , ${\bf{U}}:=(\prod_{t=0}^{T-1}\mathbb{U})$ , ${\bf{Y}}:=(\prod_{t=0}^{T-1}\mathbb{Y})$ , ${\bf{S}}:=(\prod_{t=0}^{T-1}{\mathbb{S}})=\mathbb{X}\times(\prod_{t=0}^{T-1}{\mathbb{W}\times\mathbb{V}})$ , ${\bf X}=(\prod_{t=0}^{T-1}\mathbb{X})$ , $\underline{y}^{i}=(y_{0}^{i},\dots,y_{T-1}^{i})$ , and $\underline{\zeta}^{i}:=(\zeta^{i}_{0},\dots,\zeta_{T-1}^{i})$ .

In the following, first, we show that conditioned on $\omega_{0}$ , $Q_{N}$ converges ${P}$ -almost surely to $Q=\mathcal{L}(\beta^{1}_{\infty}|\omega_{0})$ in the $w$ - $s$ topology (the coarsest topology on $\mathcal{P}({\bf{U}}\times{\bf{Y}}\times{\bf{S}})$ under which $\int f(u,y,\zeta)Q_{N}(du,dy,d\zeta):\mathcal{P}({\bf{U}}\times{\bf{Y}}\times{\bf{S}})\to\mathbb{R}$ is continuous for every measurable and bounded function $f$ which is continuous in $u$ and $y$ but need not be continuous in $\zeta$ (see e.g., [39] and [47, Theorem 5.6])). Then, we show that

[TABLE]

where $\underline{\tilde{\gamma}}_{T}^{*,N}:=(\underline{{\gamma}}_{T}^{*,N},\underline{{\gamma}}_{T}^{*,N},\dots,\underline{{\gamma}}_{T}^{*,N})$ and $\underline{\tilde{\gamma}}^{*,\infty}_{T}:=(\underline{{\gamma}}^{*,\infty}_{T},\underline{{\gamma}}^{*,\infty}_{T},\dots)$ .

(Step 1):

In this step, we show that conditioned on $\omega_{0}$ , $Q_{N}$ converges ${P}$ -almost surely to $Q$ in $w$ - $s$ topology. First, we show that for every continuous and bounded function $g$ in actions and observations, for every $\omega_{0}$ on a set of ${P}$ -measure one,

[TABLE]

Following from symmetry of the information structure and Lemma 9, every DM applies an identical optimal policy $\underline{\gamma}_{T}^{*,N}$ and since functions $f_{t}$ and $h_{t}$ are identical for each DM, conditioned on $\omega_{0}$ , $(\underline{\gamma}^{*,N}_{T}(\underline{y}^{i}),\underline{y}^{i},\underline{\zeta}^{i})$ and $(\underline{\gamma}^{*,\infty}_{T}(\underline{y}^{i}),\underline{y}^{i},\underline{\zeta}^{i})$ are i.i.d. random vectors. For every $\epsilon>0$ and for every function $g$ continuous and bounded in actions and observations, we have $P$ -almost surely

[TABLE]

where (12) follows from Markov’s inequality, the triangle inequality and the definition of the empirical measure, and (13) follows from the fact that $(\underline{\gamma}^{*,N}_{T}(\underline{y}^{i}),\underline{y}^{i},\underline{\zeta}^{i})$ and $(\underline{\gamma}^{*,\infty}_{T}(\underline{y}^{i}),\underline{y}^{i},\underline{\zeta}^{i})$ are i.i.d. random vectors. Since $g$ is bounded and continuous, the dominated convergence theorem implies (14). Hence, for every subsequence, there exists a subsubsequence such that $P$ -almost surely ${{P}\left(\{\omega\in\Omega|\lim\limits_{N\to\infty}\left(\int gdQ_{N}-\int gd\tilde{Q}_{N}\right)=0\}\big{|}\omega_{0}\right)}=1$ .

Now, we show that conditioned on $\omega_{0}$ , $\{\tilde{Q}_{N}\}_{N}$ converges weakly to $Q$ ${P}$ -almost surely. Since conditioned on $\omega_{0}$ , $(\underline{\gamma}^{*,\infty}_{T}(\underline{y}^{i}),\underline{y}^{i},\underline{\zeta}^{i})$ are i.i.d. random vectors, the strong law of large numbers implies that $P$ -almost surely

[TABLE]

hence, ${{P}\left(\{\omega\in\Omega|\lim\limits_{N\rightarrow\infty}\left(\int gd\tilde{Q}_{N}-\int gdQ\right)=0\}\big{|}\omega_{0}\right)}=1$ $P$ -almost surely.

Hence, by choosing a suitable subsequence, for every $\omega_{0}\in\Omega_{0}$ on a set of ${P}$ -measure one, for every function $g$ continuous and bounded in actions and observations and measurable and bounded in uncertainty and initial states

[TABLE]

hence, conditioned on $\omega_{0}$ , $Q_{N}$ converges weakly to $Q$ ${P}$ -almost surely. We note that the convergence is weak, but since $\underline{\zeta}^{i}$ s are exogenous with a fixed probability measure, the convergence is also in the $w$ - $s$ topology.
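The law-of-large-numbers argument behind Step 1 can be illustrated numerically. The sketch below is a toy under assumptions made here (scalar initial state uniform on $[0,1)$, bounded observation noise, the identity policy, and no common randomness $\omega_{0}$); the names `g` and `empirical_integral` are hypothetical. The test function is continuous in the action and the observation but only measurable in the exogenous uncertainty, which is the class of functions used for the $w$ - $s$ topology.

```python
import random
from typing import Tuple


def g(u: float, y: float, zeta: Tuple[float, float]) -> float:
    """Test function: continuous in (u, y), only measurable in zeta (jump at x0 = 0.5).
    It is bounded on the support of the toy model, where u lies in [-0.1, 1.1]."""
    x0, _ = zeta
    return u if x0 > 0.5 else 0.0


def empirical_integral(n: int, rng: random.Random) -> float:
    """Integral of g under the empirical measure of (gamma(y^i), y^i, zeta^i), i = 1..n."""
    total = 0.0
    for _ in range(n):
        x0 = rng.random()            # exogenous initial state, uniform on [0, 1)
        v = rng.uniform(-0.1, 0.1)   # exogenous observation noise, zero mean
        y = x0 + v                   # assumed observation map
        u = y                        # assumed identical policy gamma(y) = y
        total += g(u, y, (x0, v))
    return total / n


for n in (100, 10_000, 100_000):
    print(n, empirical_integral(n, random.Random(0)))
```

For this toy model the limit is $E[(x_{0}+v)1_{\{x_{0}>0.5\}}]=3/8$, and the empirical integrals approach it as $n$ grows even though $g$ is discontinuous in the exogenous component.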

(Step 2):

Following from (6) and (7), we have

[TABLE]

where following from Assumption 3.1, $\tilde{f}_{t-1}$ and $\tilde{h}_{t}$ are continuous in actions. Hence, under Assumption 3.1(c), we have

[TABLE]

where (18) follows from (16) for some function $\tilde{c}:\Omega_{0}\times{\bf S}\times{\bf U}\times{\bf U}\times{\bf X}\to\mathbb{R}_{+}$ which is continuous in its last three arguments and a function $\Lambda:{\bf U}\times{\bf S}\to{\bf X}$ which is continuous in actions. Hence, by induction and by rewriting observations as functions of the policies of the past DMs ( $\gamma_{\downarrow t}^{*,N}(y_{\downarrow t}^{i})$ ), since $\underline{\gamma}_{T}^{*,N}$ converges to $\underline{\gamma}_{T}^{*,\infty}$ , the cost induced by $\underline{\gamma}_{T}^{*,N}$ also converges to the cost induced by $\underline{\gamma}_{T}^{*,\infty}$ ${P}$ -almost surely.

(Step 3):

We have

[TABLE]

where (19) follows from (18). Inequality (20) follows from (10) and replacing limsup with liminf, and (21) follows from Fatou’s lemma. In the following, we justify (22). Since conditioned on $\omega_{0}$ , $Q_{N}$ converges weakly to $Q$ ${P}$ -almost surely, we have $Q_{N}(du\times{\bf{Y}}\times{\bf{S}})$ converges weakly to $Q(du\times{\bf{Y}}\times{\bf{S}})$ ${P}$ -almost surely conditioned on $\omega_{0}$ , hence, the compactness of ${\bf{U}}$ implies that conditioned on $\omega_{0}$ , ${P}$ -almost surely

[TABLE]

Since conditioned on $\omega_{0}$ , $Q_{N}$ converges weakly to $Q$ ${P}$ -almost surely, we have $Q_{N}(du\times{\bf{Y}}\times d\zeta)$ converges ${P}$ -almost surely to $Q(du\times{\bf{Y}}\times d\zeta)$ in $w$ - $s$ topology conditioned on $\omega_{0}$ . Following from (16), since $f_{t}$ s are bounded and continuous in actions, $\Lambda$ is bounded and continuous in actions, hence, this implies that conditioned on $\omega_{0}$ , ${P}$ -almost surely

[TABLE]

Since the cost function $\tilde{c}$ is continuous in its last three arguments, ${P}$ -almost surely

[TABLE]

Define non-negative bounded functions

[TABLE]

where the sequence $\{G_{M}\}_{M}$ converges as $M\to\infty$ to

[TABLE]

We have ${P}$ -almost surely

[TABLE]

where (25) is true since

[TABLE]

Equality (26) follows from the generalized convergence theorem in [40, Theorem 3.5] since $G_{N}^{M}$ is bounded and continuously converges to $G^{M}$ , i.e., ${P}$ -almost surely

[TABLE]

when $u_{N}\to u$ as $N\to\infty$ . The monotone convergence theorem implies (27). Hence, (22) holds which implies $\limsup\limits_{N\rightarrow\infty}J_{T}^{N}(\underline{\tilde{\gamma}}_{T}^{*,N})=J_{T}^{\infty}(\underline{\tilde{\gamma}}_{T}^{*,\infty})$ , and this completes the proof following from [38, Theorem 5]. Here, for completeness we present the argument, which is similar to the proof of [38, Theorem 5], for dynamic teams,

[TABLE]

where (29) is true since the restriction of $\underline{\gamma}_{T}$ to the first $N$ components is $\underline{\gamma}_{T}^{1:N}$ . This implies that $\underline{\tilde{\gamma}}_{T}^{*,\infty}$ is globally optimal. ∎

Remark 2.

On the connection between finitely many DMs and infinitely many DMs, we note a closely related work on mean-field games by Fischer [15] where the information structure is assumed to be static since the policy of each player is assumed to be adapted to the filtration generated by his/her initial states and Wiener process (such policies are called open-loop distributed controllers in the mean-field games literature, a terminology that is somewhat non-standard in the control literature [23],[14, pages 72-76]). This means that the information of each DM is not affected by any of the actions of the other DMs. For dynamic teams, there are two difficulties: (1) obtaining variational equations is challenging since fixing the policies of the other DMs and perturbing only DMi’s policy perturbs the observations of the other DMs and hence the controls $u^{-i*}=(\gamma^{1*}(y^{1}),\dots,\gamma^{i-1*}(y^{i-1}),\gamma^{i+1*}(y^{i+1}),\dots,\gamma^{N*}(y^{N}))$ ; (2) solutions of variational equations which give person-by-person optimal policies are inconclusive for global optimality due to the lack of convexity in general.

Remark 3.

We also note additional related works by Lacker [24, 25] where either convergence of open-loop controllers, or convergence of Nash equilibria induced by closed-loop controllers (where controls are measurable path-dependent functions of states, $u_{t}^{i}=\phi(t,x_{0:t})$ , where $x_{0:t}=(x_{0:t}^{1},\dots,x_{0:t}^{N})$ and $\phi$ is a measurable function) or Markovian controllers ( $u_{t}^{i}=\phi(t,x_{t})$ , where $x_{t}=(x_{t}^{1},\dots,x_{t}^{N})$ ) has been considered. In [25], the information structure is classical (a centralized problem) since players have access to all the information available to previous DMs.

Remark 4.

In Lemma 9 and Theorem 10, we considered a non-classical information structure for teams defined as ( $\mathcal{P}_{T}^{\infty,\text{MF}}$ ) with a convex expected cost in policies. For teams defined as ( $\mathcal{P}_{T}^{\infty,\text{MF}}$ ) with a symmetric partially nested information structure which admit a static reduction, the above result holds and, similar to the proof of Theorem 10, it can be proven under the assumption that the cost function is convex in actions (since convexity of the cost function in actions is a sufficient condition for convexity of the expected cost function in policies for this class of problems [50, Theorem 3.7]).

Remark 5.

Assumptions that action spaces are compact and $f_{t}$ s are bounded can be relaxed by assuming that

(A1)

$\sup\limits_{N\geq 1}E[|\underline{\gamma}^{*,N}_{T}(\underline{y}^{1})|^{1+\delta}]<\infty$ for some $\delta>0$ ,

(A2)

$\sup\limits_{N\geq 1}E[|\Lambda(\underline{\gamma}^{*,N}_{T}(\underline{y}^{1}),\underline{\zeta}^{1})|^{1+\tilde{\delta}}]<\infty$ for some $\tilde{\delta}>0$ .

That is because, following from the pointwise convergence of $\underline{\gamma}^{*,N}_{T}$ and continuity of $\Lambda$ in actions, the above uniform integrability assumption justifies exchanging the limit and the expectation required to establish the convergence in (23) and (24) using a similar analysis of (14) and an argument of (15) based on the strong law of large numbers. This result is particularly important for LQG models (we use this remark in Section 4).

Theorem 11.

Consider a team defined as ( $\mathcal{P}_{T}^{\infty,\text{MF}}$ ) with ( $\mathcal{P}_{T}^{N,\text{MF}}$ ) having a symmetric information structure for every $N$ . Assume for every $N$ the team problem is convex in policies. Let the action space be compact and convex for each DM, and assume Assumption 3.1, Assumption 3.2(ii), and Assumption 3.3 hold. If there exists a sequence of optimal policies for ( $\mathcal{P}_{T}^{N,\text{MF}}$ ), $\{\underline{\gamma}^{*,N}_{T}\}_{N}$ , which converges (for every DM due to the symmetry) pointwise to $\underline{\gamma}^{*,\infty}_{T}$ as $N\to\infty$ , then $\underline{\gamma}^{*,\infty}_{T}$ (which is identically symmetric) is an optimal policy for ( $\mathcal{P}_{T}^{\infty,\text{MF}}$ ).

Proof.

Under Assumption 3.2(ii) and Assumption 3.3, for every $A^{i}\in\mathcal{B}({\bf{S}})$ , and $A^{i}=B^{i}\times\prod_{t=0}^{T-1}(D_{t}^{i}\times E_{t}^{i})$ (where $B^{i}\in\mathcal{B}(\mathbb{X})$ , $D_{t}^{i}\in\mathcal{B}(\mathbb{W})$ , and $E_{t}^{i}\in\mathcal{B}(\mathbb{V})$ ), for all $N\in\mathbb{N}$ , and permutations $\sigma$ , we have $P$ -almost surely

[TABLE]

where (30) follows from Assumption 3.2(ii), and (31) follows from Assumption 3.3. Hence, $(\underline{\zeta}^{1},\underline{\zeta}^{2},\dots)$ are exchangeable conditioned on $\omega_{0}$ .

Hence, following from a similar argument as the proof of Theorem 7 (by considering $\omega_{0}$ in the cost function and the law of total expectation (by first conditioning on $\omega_{0}$ )), under Assumption 3.2(ii) and Assumption 3.3, one can consider a sequence of $N$ -DM teams which are symmetrically optimal that defines ( $\mathcal{P}_{T}^{N,\text{MF}}$ ) and whose limit is identified with $(\mathcal{P}_{T}^{\infty,\text{MF}})$ . Since initial states are not necessarily independent conditioned on $\omega_{0}$ , we cannot establish that $(\underline{\gamma}^{*,\infty}_{T}(\underline{y}^{i}),\underline{y}^{i},\underline{\zeta}^{i})$ are i.i.d. random vectors conditioned on $\omega_{0}$ , a property which was used in (15) in (Step 1) of the proof of Theorem 10 to show that $Q_{N}$ converges weakly to $Q$ ${P}$ -almost surely.

However, we note that since $(\underline{\zeta}^{1},\underline{\zeta}^{2},\dots)$ are exchangeable conditioned on $\omega_{0}$ , for every $A^{i}\in\mathcal{B}({\bf{S}})$ and $C\in\mathcal{B}(\Omega_{0})$ , and for all $N\in\mathbb{N}$ , and permutations $\sigma$ , we have

[TABLE]

Let $\alpha^{i}:=(\omega_{0},\underline{\zeta}^{i})$ . Hence, (32) implies that $(\alpha^{1},\alpha^{2},\dots)$ is exchangeable. Following from [1, Proposition 3.8(a)], there exists a random vector ${z}\in[0,1]$ such that, $({\underline{\zeta}}^{1},{\underline{\zeta}}^{2},\dots)$ are i.i.d. random vectors conditioned on $(\omega_{0},{z})$ .

Let $\tilde{\omega}_{0}:=(\omega_{0},z)$ . Hence, under Assumption 3.2(ii) and Assumption 3.3, conditioned on $\tilde{\omega}_{0}$ , $(\underline{\zeta}^{1},\underline{\zeta}^{2},\dots)$ are i.i.d. random vectors. Following from standard stochastic realization results [10, Lemma 3.1], we can represent any stochastic kernel in a functional form, with almost sure equivalence, $\underline{\zeta}^{i}=g(\tilde{\omega}_{0},\theta^{i})$ for some independent $\theta^{i}$ and measurable $g$ (note that following from exchangeability, $g$ is identical for all $i\in\mathbb{N}$ and $(\theta^{1},\theta^{2},\dots)$ are i.i.d. random vectors).
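The functional representation $\underline{\zeta}^{i}=g(\tilde{\omega}_{0},\theta^{i})$ can be illustrated with a minimal toy example. The sketch below assumes, purely for illustration, that $\omega_{0}$ is absent, that the latent variable $z$ is uniform on $[0,1)$, and that $g(z,\theta)=1_{\{\theta<z\}}$ with $\theta^{i}$ i.i.d. uniform; the function names are hypothetical. The resulting sequence is exchangeable and i.i.d. given $z$, but not independent once $z$ is integrated out.

```python
import random
from typing import List


def g(z: float, theta: float) -> int:
    """Assumed common map: zeta^i = g(z, theta^i) = 1 if theta^i < z else 0."""
    return 1 if theta < z else 0


def sample_sequence(n: int, rng: random.Random) -> List[int]:
    """Draw (zeta^1..zeta^n): i.i.d. Bernoulli(z) given the latent z, exchangeable overall."""
    z = rng.random()  # latent variable playing the role of z in the text
    return [g(z, rng.random()) for _ in range(n)]  # theta^i i.i.d. uniform, independent of z


def estimate_pair_probability(trials: int, rng: random.Random) -> float:
    """Monte Carlo estimate of P(zeta^1 = 1, zeta^2 = 1), which equals E[z^2] = 1/3 here."""
    hits = sum(1 for _ in range(trials) if sample_sequence(2, rng) == [1, 1])
    return hits / trials


print(estimate_pair_probability(50_000, random.Random(0)))
```

Since $P(\zeta^{1}=1)=1/2$ in this toy model, the joint probability $1/3$ differs from the product $1/4$, so a law of large numbers applies only after conditioning on the latent variable, which mirrors the role of $\tilde{\omega}_{0}$ in the proof.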

Since conditioned on $\tilde{\omega}_{0}$ , $(\underline{\zeta}^{1},\underline{\zeta}^{2},\dots)$ are i.i.d. random vectors, $(\underline{\gamma}^{*,\infty}_{T}(\underline{y}^{i}),\underline{y}^{i},\underline{\zeta}^{i})$ are i.i.d. random vectors conditioned on $\tilde{\omega}_{0}$ , hence for every $\tilde{\omega}_{0}$ on a set of $P$ -measure one, we have for every continuous and bounded function $g$ in actions and observations, by the strong law of large numbers,

[TABLE]

Hence, following from an identical analysis as that of (Step 1) of the proof of Theorem 10, conditioned on $\tilde{\omega}_{0}$ , $Q_{N}$ converges weakly to $Q$ , for every $\tilde{\omega}_{0}$ on a set of ${P}$ -measure one.

Following from the representation $\underline{\zeta}^{i}=g(\tilde{\omega}_{0},\theta^{i})$ , we have

[TABLE]

where (33) follows from the fact that for all $N\in\mathbb{N}$ , and for every $A^{i}\in\mathcal{B}({\bf{S}})$

[TABLE]

and, with a slight abuse of notation, we use the same symbol, $\tilde{c}$ , for the cost function after the transformation in (33). The rest of the proof is identical to that of Theorem 10. ∎

3.2 An existence theorem on globally optimal policies for dynamic mean-field team problems with a symmetric information structure

An implication of Theorem 10 is the following existence result on globally optimal policies for mean-field team problems. In particular, we will establish the existence of a converging subsequence, in an appropriate sense, for a sequence of optimal policies for $N$ -DM teams with an increasing number of DMs. For the following theorem, we do not establish the pointwise convergence; but by Theorem 10, if a sequence of optimal policies for ( $\mathcal{P}_{T}^{N,MF}$ ), $\{\underline{\gamma}^{*,N}_{T}\}_{N}$ , converges pointwise, a globally optimal policy exists. To this end, we allow decision makers to apply randomized policies. For each decision maker (DMi for $i\in\mathbb{N}$ ), a probability measure $P\in\mathcal{P}(\Omega_{0}\times\mathbb{X}\times\prod_{t=0}^{T-1}(\mathbb{W}\times\mathbb{V})\times\prod_{t=0}^{T-1}(\mathbb{U}\times\mathbb{Y}))$ is a policy induced by a randomized policy if and only if for every $t=0,\dots,T-1$ and for all continuous and bounded functions $g$

[TABLE]

for a stochastic kernel $\Pi^{i}_{k}$ on $\mathbb{U}$ given $\mathbb{Y}$ , where $p_{k}^{i}$ is the transition kernel characterizing the observations of DMi at time $t$ ,

[TABLE]

and $\mu^{i}$ is a fixed probability measure on initial states and disturbances of DMi conditioned on $\omega_{0}$ . This equivalency follows from the fact that continuous and bounded functions form a separating class [8, page 12] and [48, Theorem 2.2].

First, we present an absolute continuity assumption on observations of DMs.

Assumption 3.4.

For every DMi and $t=0,\dots,T-1$ , there exists a function $\psi_{t}^{i}:\mathbb{Y}\times\Omega_{0}\times\mathbb{X}\times\prod_{k=0}^{t-1}(\mathbb{W}\times\mathbb{V})\times\prod_{k=0}^{t-1}(\mathbb{Y}\times\mathbb{U})\to\mathbb{R}_{+}$ continuous in actions, and a probability measure $\nu_{t}^{i}$ on $\mathbb{Y}$ such that for all Borel sets $A=A^{1}\times\dots\times A^{N}$ ,

[TABLE]

This assumption allows us to obtain an independent measurements reduction (see [47, Section 2.2]). For example, if $v_{t}^{i}$ for all $i\in\mathbb{N}$ and $t=0,\dots,T-1$ are i.i.d. with a probability measure admitting a density function, so that the observation of each DMi at time $t$ is $y_{t}^{i}=\tilde{h}_{t}(x_{t}^{i},u_{\downarrow t}^{i})+v_{t}^{i}$ , where $\tilde{h}_{t}$ is continuous, then Assumption 3.4 holds [16, Lemma 5.1].

Theorem 12.

Consider a team defined as ( $\mathcal{P}_{T}^{\infty,\text{MF}}$ ) with ( $\mathcal{P}_{T}^{N,\text{MF}}$ ) having a symmetric information structure for every $N$ . Assume for every $N$ , the team problem is convex in policies and the action space is convex. Assume further that without any loss, the optimal policies can be restricted to those with $E(\phi_{i}({u}^{i}))\leq K$ for some finite $K$ , where $\phi_{i}:{\mathbb{U}}\to\mathbb{R}_{+}$ is lower semi-continuous (moment condition). Under Assumption 3.1 and Assumption 3.2 if either

(i)

Assumption 3.4 holds (with no further assumptions on the information structure of each DMi for $i\in\mathbb{N}$ through time $t=0,\dots,T-1$ ), or

(ii)

for each DMi for $i\in\mathbb{N}$ through time $t=0,\dots,T-1$ , there exists a static reduction with the classical information structure (i.e., under a static reduction, the information structure is expanding such that $\sigma(y_{t}^{i})\subset\sigma(y_{t+1}^{i})$ where $\sigma$ denotes the $\sigma$ -field),

then there exists an optimal policy for ( $\mathcal{P}_{T}^{\infty,MF}$ ).

Since the space of deterministic policies (where $\Pi_{k}^{i}$ in (34) are indicator functions) is not closed under the weak convergence topology (e.g., as an implication of [50, Theorem 2.7]), we allow for randomization in the policies. Therefore, the limit policy is not necessarily deterministic according to the above result; however, it is identical for each DM.
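A standard toy example, not taken from the paper, shows how a sequence of deterministic policies can approach a randomized one. Assume a hypothetical observation $y$ uniform on $[0,1]$ and the deterministic policies $u_{n}(y)=\lfloor 2^{n}y\rfloor \bmod 2$. The joint law of $(y,u_{n})$ is known to converge weakly to the product of the uniform law and a fair coin flip, that is, to a policy that randomizes independently of $y$. The sketch below checks one test function, $g(y,u)=yu$, in exact rational arithmetic.

```python
from fractions import Fraction


def expect_y_times_u(n):
    """Exact E[y * u_n(y)] for y ~ Uniform[0, 1] and u_n(y) = floor(2^n y) mod 2.

    The policy u_n is a hypothetical illustration: it equals 1 on the dyadic
    cells [k / 2^n, (k + 1) / 2^n) with k odd, and 0 elsewhere.
    """
    cells = 2 ** n
    total = Fraction(0)
    for k in range(1, cells, 2):  # odd cells, where u_n = 1
        lo, hi = Fraction(k, cells), Fraction(k + 1, cells)
        total += (hi * hi - lo * lo) / 2  # integral of y over the cell
    return total


if __name__ == "__main__":
    # Under the randomized limit (u independent fair coin), E[y * u] = 1/2 * 1/2 = 1/4.
    for n in (1, 4, 8):
        print(n, expect_y_times_u(n), float(expect_y_times_u(n)))
```

The computed values equal $1/4+2^{-(n+2)}$ and so approach $1/4$, the value under the independent coin flip. A single test function does not prove weak convergence; it only illustrates the effect.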

Proof.

We use individually randomized policies and we show that for every sequence of $N$ -DM optimal policies, there exists a subsequence which converges to an optimal independently randomized policy for the mean-field limit under an appropriate topology defined by the product topology where each coordinate is endowed with the weak convergence topology. In (Step 1), we show that for each finite $N$ -DM team problem, optimal policies are deterministic and symmetric and we consider the independently randomized policies induced by such policies $\{P_{N}\}_{N}$ (where $P_{N}\in\mathcal{P}({\bf{Y}}\times{\bf{U}})$ for each DM satisfying (34)), as our sequence to be studied. We also define the sequence of empirical measures induced by these policies, $Q_{N}$ , as (10).

In (Step 2), we show that for every sequence of policies satisfying a moment condition there exists a subsequence such that policies $\{P_{n}\}_{n}$ for each DM and a subsequence of empirical measures $\{Q_{n}\}_{n}$ induced by these policies (where $n\in\mathbb{I}$ is the index set of a convergent subsequence) converge weakly to a limit $P$ -almost surely, that is, for a set of ${P}$ -measure one, for every bounded function $g$ which is continuous in actions and observations and measurable in uncertainties,

[TABLE]

To this end, we first show that for each DM a sequence $\{P_{n}\}_{n}$ is tight, then we show that the sequence of empirical measures $\{Q_{n}\}_{n}$ induced by these policies converges weakly to a limit $P$ -almost surely.

In (Step 3), we show that the set of policies for each DM is closed under the weak convergence topology, hence, the limit policy satisfies the required measurability/conditional independence constraints (that is, the limit policy satisfies (34)). In (Step 4), we use the lower semi-continuity argument to show that the expected cost function under the induced limit policy is less than or equal to the expected cost achieved by the sequence of $N$ -DM optimal policies.

(Step 1):

Under Assumption 3.1, Assumption 3.1(c) and Assumption 3.2, and by condition (i) using [47, Theorem 5.2], or condition (ii) using [47, Theorem 5.6], there exists a deterministic optimal policy for each finite $N$ -DM team problem. Action spaces are convex and the team problem is convex in policies, hence, using Lemma 9, one can consider a sequence of $N$ -DM teams which are symmetrically optimal that defines ( $\mathcal{P}_{T}^{N,MF}$ ) and whose limit is identified with ( $\mathcal{P}_{T}^{\infty,MF}$ ). Hence, for each $N$ -DM team problem, we consider symmetric randomized optimal policies.

(Step 2):

In the following, we first show that the set of policies $P_{N}\in\mathcal{P}({\bf{Y}}\times{\bf{U}})$ for each DM satisfying (34) and the moment condition is tight; then, by symmetry, we show that $\{Q_{N}\}_{N}$ induced by this set of policies is also tight. We use the fact that conditioned on $\omega_{0}$ , $(\underline{\gamma}^{*,N}_{T}(\underline{y}^{i}),\underline{y}^{i},\underline{\zeta}^{i})$ are i.i.d. random vectors (this follows from symmetry of the information structure and Lemma 9 since every DM applies the identical policy $\underline{\gamma}_{T}^{*,N}$ ) and also the fact that the space of control policies is tight under the weak convergence for each DM (see e.g., [47, proof of Theorem 4.7]).

Since actions of DMs do not affect the observations of others, the policy spaces are decoupled from the actions of other decision makers. Since we can restrict the search for optimal policies over those satisfying the moment condition, the fact that $\nu\to\int\nu(dx)g(x)$ is lower semi-continuous for a continuous function $g$ [47, proof of Theorem 4.7] implies that the marginals on $\bf{U}$ satisfying the moment condition are tight under the weak convergence topology. Hence, the collection of all probability measures with these tight marginals is also tight (see e.g., [48, Proof of Theorem 2.4]). This implies that the sequence of randomized policies satisfying the moment condition is tight.
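The way a moment condition yields tightness can be made concrete with Markov's inequality. The sketch below assumes, purely for illustration, scalar actions and the specific choice $\phi(u)=u^{2}$ (the paper only requires a lower semi-continuous $\phi_{i}$): every law with $E[u^{2}]\leq K$ puts mass at least $1-\epsilon$ on the compact interval $[-M,M]$ with $M=\sqrt{K/\epsilon}$, and $M$ does not depend on the particular law.

```python
import math


def tightness_radius(moment_bound, eps):
    """Radius M such that P(|u| > M) <= eps for every law with E[u^2] <= moment_bound.

    Follows from Markov's inequality: P(|u| > M) <= E[u^2] / M^2 <= moment_bound / M^2.
    The quadratic moment function is an illustrative assumption.
    """
    if moment_bound <= 0 or not 0 < eps < 1:
        raise ValueError("need moment_bound > 0 and 0 < eps < 1")
    return math.sqrt(moment_bound / eps)


if __name__ == "__main__":
    print(tightness_radius(4.0, 0.01))
```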

Since every DM applies an identical policy and since observations are conditionally i.i.d., a countably infinite product of the spaces of policies of DMs is tight (where each coordinate is tight in the weak convergence topology). Hence, there exists a subsequence of policies $\tilde{P}_{n}\in\mathcal{P}(\prod_{i}({\bf{Y}}\times{\bf{U}}))$ (as a product of policies of DMs) that converges weakly to a limit $\tilde{P}$ (each coordinate converges weakly) ${P}$ -almost surely. Furthermore, since every DM applies an identical policy, conditioned on $\omega_{0}$ , actions induced by an identical randomized policy, observations and disturbances are i.i.d. across DMs. Hence, following from a similar argument as (Step 1) of the proof of Theorem 10, a subsequence of empirical measures $\{Q_{n}\}_{n\in\mathbb{I}}$ converges ${P}$ -almost surely to $Q$ in $w$ - $s$ topology. We note that the convergence is under the weak convergence topology, but since $\underline{\zeta}^{i}$ s are exogenous with a fixed marginal, the convergence is also in the $w$ - $s$ topology.

(Step 3):

In this step, we show that each coordinate of the space of policies (space of policies for each DM) is closed under the weak convergence topology. This in particular implies that the space of policies is closed under the product topology and using (Step 1), we can conclude that the space of control policies is compact under the product topology where each coordinate is weakly compact.

Assume $P_{n}$ is a policy for DMi induced by a randomized policy converging weakly to $P_{\infty}$ . In fact, either of conditions (i) or (ii) leads to the closedness of the set of policies (see (34)) induced by $P_{n}$ . If Assumption 3.4 holds, then by the discussion in the proof of [47, Theorem 5.2], each coordinate of policy spaces corresponding to DMi acting through time is closed under the weak convergence topology. Also, if condition (ii) holds, then [47, Theorem 5.6] leads to the same conclusion. Hence, each coordinate of the space of policies (corresponding to DMi) is closed under the weak convergence topology (since each coordinate of the space of policies is a finite product of spaces of policies for each DM at time instances $t=0,\dots,T-1$ ). Hence, following from (Step 2), there exists a subsequence $\{Q_{n}\}_{n\in\mathbb{I}}$ that converges weakly to $Q$ ${P}$ -almost surely, where $Q$ is induced by a randomized policy in the set of policies satisfying (34), and the limit policy is admissible and satisfies the required measurability/conditional independence constraints.

Let for every $t=0,\dots,T-1$ , $P^{*,\omega_{0}}_{n}$ be a probability measure on actions, observations and uncertainties induced by optimal randomized policies for each DM (which is identical because of symmetry) for $N$ -DM teams conditioned on $\omega_{0}$ , i.e., a probability measure that satisfies

[TABLE]

for all bounded functions $g$ which are continuous in actions and observations and measurable in other arguments. We denote $\underline{u}^{i,*}_{n}:=({u}^{i,*}_{n,0},\dots,{u}^{i,*}_{n,T-1})$ as the action of DMi through time induced by $\Pi^{*,n}_{t}$ . Similarly, we denote $P^{*,\omega_{0}}$ as a probability measure induced by the limit policy, i.e., a probability measure satisfying (35) induced by $\Pi_{k}^{*,\infty}$ where $\underline{u}^{i,*}_{\infty}:=({u}^{i,*}_{\infty,0},\dots,{u}^{i,*}_{\infty,T-1})$ is the action of DMi through time induced by $\Pi_{k}^{*,\infty}$ .

(Step 4):

Now, we show that the expected cost function under the limit randomized policy is less than or equal to the expected cost achieved by $\limsup\limits_{n\rightarrow\infty}J_{T}^{n}(\underline{\tilde{\gamma}}_{T}^{*,n})$ . Since the cost function is continuous in states and actions, under the reduction (conditions (i) or (ii)), we have ${P}$ -almost surely

[TABLE]

where (36) is true following from (7) and Assumption 3.1 for some functions $\bar{c}:\Omega_{0}\times{\bf S}\times{\bf U}\times{\bf U}\times{\bf X}\to\mathbb{R}_{+}$ continuous in states and actions and a function $\Lambda:{\bf U}\times{\bf S}\to{\bf X}$ continuous in actions and

[TABLE]

We have

[TABLE]

where (37) follows from the definition of empirical measures and by integrating over the set $(\prod_{i=n_{l}+1}^{\infty}{\bf{Y}\times{\bf{S}}})$ and since ${P}$ -almost surely

[TABLE]

Inequality (38) is true since limsup is the greatest convergent subsequential limit for a bounded sequence and (39) follows from the dominated convergence theorem. We note that $\{Q_{n}\}_{n}$ is induced by $\underline{u}^{i,*}_{n}$ for each DM. Since $\{Q_{n}\}_{n\in I}$ converges weakly to $Q$ ${P}$ -almost surely, by the moment condition and Remark 5, a similar argument as (Step 3) of the proof of Theorem 10 implies that ${P}$ -almost surely

[TABLE]

Hence, (40) follows from [40, Theorem 3.5] since

[TABLE]

is bounded and non-negative, and continuously converges in $u$ ${P}$ -almost surely (see (28)). That is because, conditioned on $\omega_{0}$ , $\underline{y}^{i}$ are i.i.d. random vectors (thanks to the symmetry), the space of policies is compact under the product topology (with the weak convergence topology for each coordinate (for each DM)), $\prod_{i=1}^{\infty}{{\underline{\psi}}}(\underline{y}^{i},\omega_{0},\underline{\zeta}^{i},\underline{u}^{i,*}_{n})$ converges in the product topology, and the cost function and $\underline{\psi}$ are continuous. Finally, (41) follows from the monotone convergence theorem. Hence, the proof is completed.

∎

Remark 6.

For the existence result, to show that the set of policies induced by independently randomized policies for each DM through time $t=0,\dots,T-1$ (see (34)) is closed under the weak convergence topology, we utilized the results in [47, Section 5.2], which are more general than those in [16, 50]. We note that the extension of the existence results in [47, Section 5.2] to our setup is not immediate since the conclusion of (Step 3) cannot be established rigorously without considering the technical steps involving infinite dimensions and limit arguments.

4 Symmetric LQG dynamic teams

In this section, we consider an LQG setup where the results of Section 2 and Section 3 can be applied. We first consider $N$ -DM LQG problems where we use Theorem 7 to show that the globally optimal policies are symmetric. Then, based on symmetry, we calculate $N$ -DM optimal policies for such problems. Next, using Theorem 10 and Theorem 11, we show the convergence of $N$ -DM optimal policies to optimal policies of LQG mean-field teams with a countably infinite number of DMs. Finally, we consider infinite horizon problems where we use symmetry and convergence results to obtain globally optimal policies for such problems.

4.1 Symmetric partially nested LQG dynamic teams on a graph

In the following, we consider decentralized problems where Theorem 7 can be utilized and the optimal policy can be obtained. First, we formulate LQG problems with a symmetric partially nested information structure. Consider the following dynamics. Let $i=1,2$ , and

[TABLE]

Problem ( $\mathcal{P}_{T}$ ):

Consider the expected cost function of $(\underline{\gamma}_{T}^{1},\underline{\gamma}_{T}^{2})$ as

[TABLE]

where $\underline{\gamma}_{T}^{i}=(\gamma^{i}_{0:T-1})$ , and $R,\tilde{R}>0$ and $Q\geq 0$ . Let

[TABLE]

where ${\zeta}^{i}_{t}=(w_{t}^{i},v_{t}^{i})$ with ${\zeta}^{i}_{0}=(x_{0}^{i},w_{0}^{i},v_{0}^{i})$ . Let $n,m,s\in\mathbb{N}$ and $\mathbb{X}=\mathbb{R}^{n}$ , $\mathbb{Y}=\mathbb{R}^{s}$ , $\mathbb{U}=\mathbb{R}^{m}$ , $w_{t}^{i}\in\mathbb{R}^{n}$ , $v_{t}^{i}\in\mathbb{R}^{n}$ , $A\in\mathcal{M}_{n,n}$ , $B\in\mathcal{M}_{n,m}$ , $R\in\mathcal{M}_{m,m}$ , $Q\in\mathcal{M}_{n,n}$ , $\tilde{R}\in\mathcal{M}_{m,m}$ , $H_{t}\in\mathcal{M}_{s,n(2t+1)}$ , and $D_{tj}\in\mathcal{M}_{s,m}$ . Let the information structure of each DM be $I_{t}^{i}=\{{y}^{i}_{t},{y}_{\downarrow t}^{i}\}$ .

In the following, we show that the above dynamic teams are symmetrically optimal under sufficient conditions on the observations and initial states.

Corollary 13.

For a fixed $T$ , consider a finite horizon team problem defined above as ( $\mathcal{P}_{T}$ ). If $x_{0}^{1}$ and $x_{0}^{2}$ are exchangeable zero mean Gaussian random vectors and $w^{i}_{t}$ s and $v^{i}_{t}$ s are i.i.d. Gaussian random vectors for $i=1,2$ and independent for all $t=0,\dots,T-1$ and also independent of initial states, then the dynamic team is symmetrically optimal.

Proof.

Since the dynamic team is LQG with a partially nested information structure, a static reduction exists and the expected cost is convex in policies under static reductions (see [18] and Theorem 4(iii)). Assumption 2.1 is satisfied following from (43). We need to show assumptions of Theorem 7 hold. Following from the hypothesis on disturbances and initial states, Assumption (b) holds. Assumption (c) holds following from Assumption 7 and since given $(x_{0}^{1},x_{0}^{2})$ , $(y_{0:T-1}^{1},y_{0:T-1}^{2})$ are independent. Hence, Theorem 7 completes the proof. ∎

Here, we consider a class of LQG dynamic teams with a tree information structure where we utilize Corollary 13 and we obtain an explicit recursion for the optimal policy.

Problem ( $\mathcal{P}_{T}^{\text{tree}}$ ):

Consider a finite horizon expected cost (43) with $I_{t}^{i}=\{x_{[0:t]}^{i},u_{[0:t-1]}^{i}\}$ .

We note that this problem is a special case of ( $\mathcal{P}_{T}$ ) since we assumed that $y_{t}^{i}=x_{t}^{i}$ for all DMi and $t=0,\dots,T-1$ and also all DMs have a total recall ( $I_{t}^{i}=\{x_{[0:t]}^{i},u_{[0:t-1]}^{i}\}$ ). For this problem, we calculate an explicit recursion for optimal policies using the symmetry established in Corollary 13.

Theorem 14.

For a fixed $T$ , consider a finite horizon team problem defined as ( $\mathcal{P}_{T}^{\text{tree}}$ ). If $(x_{0}^{1},x_{0}^{2})$ are exchangeable with an identical zero mean Gaussian distribution and $w^{i}_{t}$ s are i.i.d. zero mean Gaussian random vectors for $i=1,2$ and independent for all $t=0,\dots,T-1$ and independent of initial states, then

[TABLE]

where

[TABLE]

where $\Sigma=E[x_{0}^{1}(x_{0}^{2})^{T}](E[x_{0}^{2}(x_{0}^{2})^{T}])^{-1}$ , $P_{T}^{(T)}=0$ , $G_{0}^{(T)}=I$ . Moreover, the optimal cost is

[TABLE]

Proof.

Following from [18] and Radner’s theorem [31], person-by-person optimality implies global optimality due to the uniqueness of the person-by-person optimal policy. That is because the information structure is partially nested, and LQG dynamic teams can be reduced to a static one using Ho-Chu’s static reduction [18]. Hence, we only need to show that the policy satisfying (45) and (46) is person-by-person optimal. We show that for DM1, $J(\underline{\gamma}^{*}_{T},\underline{\gamma}^{*}_{T})\leq J((\underline{\gamma}^{-t*}_{T},\beta),\underline{\gamma}^{*}_{T})$ for all $\beta\in\Gamma^{t}$ where $(\underline{\gamma}^{-t*}_{T},\beta)=(\gamma_{0:t-1}^{*},\beta,\gamma_{t+1:T-1}^{*})$ . This implies that $(\underline{\gamma}^{*}_{T},\underline{\gamma}^{*}_{T})$ is person-by-person optimal thanks to Corollary 13 since the dynamic team is symmetrically optimal (by exchanging policies $(\underline{\gamma}^{-t*}_{T},\beta)$ with $\underline{\gamma}^{*}_{T}$ which implies $J(\underline{\gamma}^{*}_{T},\underline{\gamma}^{*}_{T})\leq J(\underline{\gamma}^{*}_{T},(\underline{\gamma}^{-t*}_{T},\beta))$ for all $\beta\in\Gamma^{t}$ and this implies that $(\underline{\gamma}^{*}_{T},\underline{\gamma}^{*}_{T})$ is the fixed point of the equation). The proof is completed by induction. Due to space constraints, we have removed the calculation. ∎

Remark 7.

The optimal policies (45) and (46) contain two parts which can be interpreted as follows: the first part, $k_{t}^{(T)}x_{t}^{i}$ , is equivalent to the optimal policy of the branch (DM) by ignoring the other branch in the optimization problem (in this case, this is equivalent to the centralized policies since the information structure of each branch (DM) is centralized). The second part corresponds to the correlation term between branches (DMs).
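The first part of the policy described in Remark 7 can be sketched with the textbook backward Riccati recursion for a single DM. The following is a scalar illustration under stated assumptions: the stage cost is taken to be $q x_{t}^{2}+r u_{t}^{2}$ with terminal value $P_{T}^{(T)}=0$, and the function name and parameters are hypothetical. It is the standard centralized LQR recursion and is not claimed to reproduce the paper's recursions (47) and (48) term by term; the correlation term of the policy is not modeled.

```python
def lqr_gains(a, b, q, r, T):
    """Scalar finite-horizon LQR for x_{t+1} = a x_t + b u_t + w_t.

    Returns (gains, P0), where gains[t] is the feedback gain in u_t = gains[t] * x_t
    and P0 is the Riccati value at time 0, starting from the terminal value P_T = 0.
    Assumed stage cost: q * x_t^2 + r * u_t^2 with q >= 0 and r > 0.
    """
    P = 0.0  # terminal condition P_T = 0
    gains = []
    for _ in range(T):
        k = -(a * b * P) / (r + b * b * P)  # gain at time t, built from P_{t+1}
        P = q + a * a * P + a * b * P * k   # P_t = q + a^2 P - (a b P)^2 / (r + b^2 P)
        gains.append(k)
    gains.reverse()  # the loop ran backward in time
    return gains, P


if __name__ == "__main__":
    gains, P0 = lqr_gains(a=1.0, b=1.0, q=1.0, r=1.0, T=10)
    print(gains[0], gains[-1], P0)
```

With zero terminal cost the last gain vanishes, and for long horizons the early gains approach the stationary gain.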

In the following, we generalize the result of Theorem 14 to $N$ -DM LQG dynamic teams. Assume that the dynamics for $i=1,2,...,N$ are defined as (42).

Problem ( $\mathcal{P}_{T}^{N,\text{tree}}$ ):

Consider the expected cost function of $\underline{\gamma}_{T}^{1:N}$ as

[TABLE]

where $\underline{\gamma}_{T}^{i}=\gamma^{i}_{0:T-1}$ for $i=1,\dots,N$ and $R,\tilde{R}>0$ and $Q\geq 0$ . Let $I_{t}^{i}=\{x_{[0:t]}^{i},u_{[0:t-1]}^{i}\}$ .

Corollary 15.

For a fixed $T$ and $N$ , consider a finite horizon team problem defined as ( $\mathcal{P}_{T}^{N,\text{tree}}$ ). If $(x_{0}^{1:N})$ are exchangeable with an identical zero mean Gaussian distribution, and $w^{i}_{t}$ s for $i=1,\dots,N$ are i.i.d. zero mean Gaussian random vectors, independent for $t=0,\dots,T-1$ , and independent of initial states, then

[TABLE]

where $K_{t}^{(T)}$ and $P_{t}^{(T)}$ satisfy (47) and (48), and $L_{t}^{(N),(T)}$ is a function of $K_{0:t}^{(T)}$ .

Proof.

The proof is similar to the one of Theorem 14. ∎

Now we consider a more general setup where, using Corollary 13, we establish a structural result for the case where the information structure of each decision maker over time satisfies a structure which is identical for all DMs and is partially nested. An example of such a graph structure has been depicted in Fig. 2.

Problem ( $\mathcal{P}_{T}^{N}$ ):

Consider a finite horizon expected cost of $\underline{\gamma}_{T}^{1:N}$ as (51) with the information structure $I_{t}^{i}=\{{y}^{i}_{t},{y}_{\downarrow t}^{i}\}$ where ${y}^{i}_{t}$ is defined in (44) and dynamics is defined in (42).

Theorem 16.

For a fixed $T$ and $N$ , consider a finite horizon team problem defined as ( $\mathcal{P}_{T}^{N}$ ). If $(x_{0}^{1},\dots,x_{0}^{N})$ are exchangeable with an identical zero mean Gaussian distribution and $w^{i}_{t}$ s for $i=1,\dots,N$ are i.i.d. zero mean Gaussian random vectors, independent for all $t=0,\dots,T-1$ and independent of initial states, then

[TABLE]

where $K_{t}^{(T)}$ are obtained by considering only one DM and ignoring other DMs.

Proof.

The proof is similar to that of Theorem 14 by [18] and Corollary 13. ∎

Remark 8.

A related work is [30], where structural results for optimal policy have been obtained for finite horizon LQG problems on graphs. In our analysis above, the structural result for the optimal policy is obtained without assuming that decision makers who have no common ancestors and no common descendants have either uncorrelated noise or are decoupled through the cost function. Instead, exchangeable partially nested LQG teams with correlated initial states and disturbances are considered. Moreover, here, the graph structures may not be trees in general, as opposed to [30] where a multi-tree structure has been imposed on a graph.

Now, we present results for LQG teams with a mean-field coupling through the cost function. First, using Corollary 15, we obtain globally optimal policies for $N$ -DM teams with a mean-field coupling and correlated initial states and disturbances. Next, as an implication of Theorem 10, we show the convergence of optimal policies for LQG $N$ -DM mean-field teams on a tree to the corresponding optimal policy of mean-field teams. Let $I_{t}^{i}=\{x_{[0:t]}^{i},u_{[0:t-1]}^{i}\}$ for $i\in\mathbb{N}$ , and dynamics be as (42).

Problem ( $\mathcal{P}_{T,\text{LQG}}^{N,\text{MF}}$ ):

Consider the expected cost function of $\underline{\gamma}_{T}^{N}$ as

[TABLE]

where $R,\tilde{R}>0$ and $Q,\tilde{Q}\geq 0$ .

Problem ( $\mathcal{P}_{T,\text{LQG}}^{\infty,\text{MF}}$ ):

Consider the expected cost function of $\underline{\gamma}_{T}$ as

[TABLE]

Corollary 17.

For a fixed $T$ and $N$ , consider a finite horizon team problem defined as ( $\mathcal{P}_{T,\text{LQG}}^{N,\text{MF}}$ ). If $(x_{0}^{1:N})$ are exchangeable zero mean Gaussian random vectors, and $w^{i}_{t}$ s are i.i.d. zero mean Gaussian random vectors for $i=1,\dots,N$ , independent for $t=0,\dots,T-1$ , and independent of initial states, then

[TABLE]

where $K_{t}^{(T)}$ and $P_{t}^{(T)}$ satisfy (47) and (48), and $L_{t}^{(N),(T)}$ is a function of $K_{0:t}^{(T)}$ .

Proof.

The proof is similar to the one in Theorem 14. ∎

Corollary 18.

For a fixed $T$ , consider a finite horizon team problem defined as ( $\mathcal{P}_{T,\text{LQG}}^{\infty,\text{MF}}$ ). Assume $\{x_{0}^{i}\}_{i\in\mathbb{N}}$ are exchangeable random vectors with zero mean Gaussian distribution, and $w^{i}_{t}$ s are i.i.d. zero mean Gaussian random vectors for $i\in\mathbb{N}$ , independent for $t=0,\dots,T-1$ , and independent of initial states. If $L_{t}^{(N),(T)}$ in (54) converges pointwise as $N\to\infty$ to $L_{t}^{(\infty),(T)}$ , then

[TABLE]

where $K_{t}^{(T)}$ and $P_{t}^{(T)}$ satisfy (47) and (48), and $\Sigma=E[x_{0}^{1}(x_{0}^{2})^{T}](E[x_{0}^{2}(x_{0}^{2})^{T}])^{-1}$ .

Proof.

Following from [1, page 9], since $\{x_{0}^{i}\}_{i\in\mathbb{N}}$ are exchangeable Gaussian random vectors, we can describe them explicitly as $x_{0}^{i}=\omega_{0}+\theta^{i}$ where $(\theta^{1},\theta^{2},\dots)$ are i.i.d. mean zero Gaussian and independent of mean zero Gaussian random vector $\omega_{0}$ . Now, we invoke Theorem 10 (or Theorem 11) and Corollary 17 and we use Remark 5 to complete the proof. ∎
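The representation $x_{0}^{i}=\omega_{0}+\theta^{i}$ makes the matrix $\Sigma$ easy to compute in the scalar case. In the sketch below, the variances and the sample size are hypothetical choices. Since $E[x_{0}^{1}x_{0}^{2}]=\mathrm{Var}(\omega_{0})$ and $E[(x_{0}^{2})^{2}]=\mathrm{Var}(\omega_{0})+\mathrm{Var}(\theta^{2})$, one gets $\Sigma=\mathrm{Var}(\omega_{0})/(\mathrm{Var}(\omega_{0})+\mathrm{Var}(\theta^{2}))$, and a Monte Carlo estimate can be compared against this value.

```python
import random


def exact_sigma(var_common, var_idio):
    """Sigma = E[x1 x2] / E[x2^2] for scalar x^i = omega0 + theta^i (all zero mean)."""
    return var_common / (var_common + var_idio)


def estimate_sigma(var_common, var_idio, n, seed=0):
    """Monte Carlo estimate of Sigma from n draws of the exchangeable pair (x1, x2)."""
    rng = random.Random(seed)
    sd_common, sd_idio = var_common ** 0.5, var_idio ** 0.5
    cross = second = 0.0
    for _ in range(n):
        omega0 = rng.gauss(0.0, sd_common)       # common component
        x1 = omega0 + rng.gauss(0.0, sd_idio)    # idiosyncratic parts are i.i.d.
        x2 = omega0 + rng.gauss(0.0, sd_idio)
        cross += x1 * x2
        second += x2 * x2
    return cross / second


if __name__ == "__main__":
    print(exact_sigma(1.0, 3.0), estimate_sigma(1.0, 3.0, 200_000))
```

When the common component vanishes, $\Sigma=0$ and the initial states are independent.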

4.2 Average cost infinite horizon problems for partially nested dynamic teams

In the following, we consider average cost problems with a symmetric partially nested information structure. We note that the optimality of linear policies for infinite horizon LQG problems is an open problem in its generality. In this subsection, we provide a positive result for a class of such problems.

Now, we consider an infinite horizon team problem and we use the result in Section 4.1.

Problem ( $\mathcal{P}_{\infty}^{\text{tree}}$ ):

Consider the expected cost function of $(\underline{\gamma}^{1},\underline{\gamma}^{2})$ as

[TABLE]

where the cost function is defined as (43) and $I_{t}^{i}=\{x_{[0:t]}^{i},u_{[0:t-1]}^{i}\}$ .

First, we introduce a lemma essential for Theorem 20.

Lemma 19.

Consider the sequence $\{a^{i}_{T}\}_{i=1}^{T}$ . Assume $\lim\limits_{T\rightarrow\infty}a_{T}^{i}=a$ for $i=0,\dots,T-1$ . If for every fixed $T\in\mathbb{N}$ , $a_{T}^{l}=a_{T+1}^{l+1}$ for all $l=0,\dots,T-1$ , then $\lim\limits_{T\rightarrow\infty}\frac{1}{T}\sum_{i=1}^{T}a_{T}^{i}=a$ .

Proof.

We have

[TABLE]

where the second equality follows from $a_{T}^{l}=a_{T+1}^{l+1}$ and the last equality follows from the Cesàro mean argument. ∎
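A small numerical check can make the role of the shift property in Lemma 19 visible. The array below is a hypothetical example, $a_{T}^{i}=a+1/(T-i+1)$ for $i=1,\dots,T$, chosen because it satisfies $a_{T}^{i}=a_{T+1}^{i+1}$ and $a_{T}^{i}\to a$ for each fixed $i$. Entries with $i$ close to $T$ stay far from $a$, yet the row average still converges, since the shift property turns it into an ordinary Cesàro mean.

```python
def a_entry(T, i, limit=2.0):
    """Hypothetical array a_T^i = limit + 1 / (T - i + 1), for i = 1, ..., T.

    It depends on (T, i) only through T - i, so a_T^i = a_{T+1}^{i+1}.
    """
    return limit + 1.0 / (T - i + 1)


def row_average(T, limit=2.0):
    """(1 / T) * sum_{i=1}^{T} a_T^i, the quantity whose limit Lemma 19 identifies."""
    return sum(a_entry(T, i, limit) for i in range(1, T + 1)) / T


if __name__ == "__main__":
    for T in (10, 100, 10_000):
        print(T, a_entry(T, T), row_average(T))
```

Here the last entry of every row equals $a+1$, while the averages decrease toward $a$ at the rate of the harmonic sum divided by $T$.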

Theorem 20.

Consider average cost infinite horizon team problems defined as ( $\mathcal{P}_{\infty}^{tree}$ ). Assume $(A,B)$ are stabilizable and $(A,Q^{\frac{1}{2}})$ are detectable. Assume $x_{0}^{1}$ and $x_{0}^{2}$ are exchangeable with an identical zero mean Gaussian distribution and $w^{i}_{t}$ s are i.i.d. zero mean Gaussian random variables for $i=1,2$ and for all $t=0,\dots,T-1$ and independent of initial states. If $L_{t}^{(T)}$ in (49) converges pointwise to $L_{t}^{(\infty)}$ as $T\to\infty$ , then the pointwise limit of the sequence of optimal policies for ( $\mathcal{P}_{T}^{\text{tree}}$ ) is team optimal for ( $\mathcal{P}_{\infty}^{tree}$ ) and stabilizes the closed-loop system,

[TABLE]

where $K,P,L_{t}^{(\infty)}$ and $G_{t}^{(\infty)}$ are the pointwise limit of the ones for ( $\mathcal{P}_{T}^{tree}$ ) as $T\to\infty$ .

Proof.

We show $\limsup\limits_{T\rightarrow\infty}J_{T}(\underline{\gamma}_{T}^{*})=J(\underline{\gamma}_{\infty}^{*})$ and invoke [38, Theorem 5] or [29, Theorem 1] to complete the proof. From (50), we have

[TABLE]

where (56) is zero since $P_{0}^{(T)}$ converges to $P$ using Lemma 19 since $P_{t+1}^{(T+1)}=P_{t}^{(T)}$ . Expression (58) converges to zero since $L_{t}^{(T)}$ in (49) converges pointwise to $L_{t}^{(\infty)}$ as $T\to\infty$ , we have $\sum_{s=t+1}^{\infty}B^{T}(A^{T})^{s-t}PBL_{s}^{(\infty)}<\infty$ , and this implies that $\lim\limits_{s\rightarrow\infty}L_{s}^{(\infty)}=0$ . Hence, we have for every $\epsilon>0$ , there exists $\hat{N}>T$ such that for every $t>\hat{N}$ , $|Tr[L_{t}^{(\infty)}(L_{t}^{(\infty)})^{T}]|<\epsilon$ . We define $L_{t}^{(T)}=0$ for $t>T$ . Expression (57) is equal to zero following from Lemma 19 and the fact that $|Tr[L_{t}^{(\infty)}(L_{t}^{(\infty)})^{T}]|<\epsilon$ for every $t>\hat{N}$ . Hence, equality (59) is true and global optimality follows from [38, Theorem 5]. The closed loop system is stable since $\limsup\limits_{t\rightarrow\infty}E(||x_{t}^{1}||^{2})<\infty$ following from $||A+BK||<1$ (all the eigenvalues of $A+BK$ are inside of the unit circle), and since $||L_{t}^{(\infty)}||$ is uniformly bounded. ∎
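Two ingredients of the proof, the shift property $P_{t+1}^{(T+1)}=P_{t}^{(T)}$ and the stability of the limiting closed loop, can be checked numerically in a scalar setting. The sketch below uses the textbook Riccati recursion with an assumed stage cost $q x^{2}+r u^{2}$ and hypothetical parameter values (an open-loop unstable $a=1.2$); it only concerns the gain $K$ and does not model the terms $L_{t}^{(T)}$ or $G_{t}^{(T)}$.

```python
def riccati_values(a, b, q, r, T):
    """Return [P_0^{(T)}, ..., P_T^{(T)}] for the scalar Riccati recursion with P_T^{(T)} = 0.

    Assumed dynamics x_{t+1} = a x_t + b u_t + w_t and stage cost q x^2 + r u^2.
    """
    values = [0.0]  # terminal value P_T^{(T)}
    for _ in range(T):
        P = values[-1]
        values.append(q + a * a * P - (a * b * P) ** 2 / (r + b * b * P))
    values.reverse()  # index t now runs forward in time
    return values


def gain_from(a, b, r, P):
    """Feedback gain K = -(a b P) / (r + b^2 P) associated with a Riccati value P."""
    return -(a * b * P) / (r + b * b * P)


if __name__ == "__main__":
    a, b, q, r = 1.2, 1.0, 1.0, 1.0
    short, long_ = riccati_values(a, b, q, r, 60), riccati_values(a, b, q, r, 61)
    K = gain_from(a, b, r, short[0])
    print(long_[1:] == short, short[0], abs(a + b * K))
```

Because each value depends only on the number of remaining steps, the horizon-$(T+1)$ sequence shifted by one step coincides with the horizon-$T$ sequence, and $P_{0}^{(T)}$ settles to a stationary value whose gain places $a+bK$ strictly inside the unit interval.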

5 Conclusion

In this paper, we studied dynamic teams with symmetric information structures. We presented a characterization for symmetrically optimal teams for convex exchangeable team problems. For mean-field teams with symmetric information structures, we showed the convergence of optimal policies for mean-field teams with $N$ decision makers to the corresponding optimal policy of mean-field teams. We obtained globally optimal solutions for LQG dynamic team problems with symmetric partially nested information structures. Moreover, we obtained globally optimal policies for average cost infinite horizon problems of LQG dynamic teams.

In this paper since we worked under the convexity assumption, the information structure does not allow for the mean-field coupling in the dynamics. In our recent work [35], we relaxed the convexity assumption to arrive at complementary existence and structural results.

Bibliography


[1] D. J. Aldous, I. A. Ibragimov, and J. Jacod. École d’Été de Probabilités de Saint-Flour XIII, 1983, volume 1117. Springer, 1985.
[2] J. Arabneydi and A. Aghdam. A certainty equivalence result in team-optimal control of mean-field coupled Markov chains. In IEEE 56th Annual Conference on Decision and Control (CDC), pages 3125–3130, 2017.
[3] J. Arabneydi and A. Mahajan. Team-optimal solution of finite number of mean-field coupled LQG subsystems. In IEEE 54th Annual Conference on Decision and Control (CDC), pages 5308–5313, 2015.
[4] A. Arapostathis, A. Biswas, and J. Carroll. On solutions of mean field games with ergodic cost. Journal de Mathématiques Pures et Appliquées, 107(2):205–251, 2017.
[5] M. Bardi and M. Fischer. On non-uniqueness and uniqueness of solutions in finite-horizon mean field games. ESAIM: Control, Optimisation and Calculus of Variations, 25:44, 2019.
[6] M. Bardi and F. S. Priuli. Linear-quadratic N-person and mean-field games with ergodic cost. SIAM Journal on Control and Optimization, 52(5):3022–3052, 2014.
[7] E. Bayraktar and X. Zhang. On non-uniqueness in mean field games. Proceedings of the American Mathematical Society, 148:4091–4106, 2020.
[8] P. Billingsley. Convergence of Probability Measures. Wiley, New York, 1968.
