Mean Field Linear Quadratic Control: FBSDE and Riccati Equation Approaches

Bingchang Wang and Huanshui Zhang

arXiv:1904.07522·math.OC·April 17, 2019


TL;DR

This paper develops a comprehensive framework for mean field linear quadratic control problems, deriving decentralized control laws via FBSDE and Riccati equations, and establishing their social optimality and Nash equilibrium properties.

Contribution

It introduces a novel approach combining FBSDE and Riccati equations to design decentralized controls for mean field LQ control and game problems, linking open-loop and feedback solutions.

Findings

01 Decentralized controls are asymptotically socially optimal.

02 Decentralized controls form an asymptotic Nash equilibrium.

03 Proposed controls are equivalent to previous feedback strategies.


Equations

$J_{\rm soc}=\sum_{i=1}^{N}J_{i}(u).$

(P1) $\inf_{u\in L^{2}_{{\cal F}_{t}}(0,T;\mathbb{R}^{r})}J_{\rm soc}^{\rm F}(u),$

$\sum_{i=1}^{N}\mathbb{E}\int_{0}^{T}e^{-\rho t}\Big\{\big\|y_{i}-\Gamma y^{(N)}\big\|^{2}_{Q}+\|u_{i}\|^{2}_{R}\Big\}dt\geq 0,$

$dy_{i}=[Ay_{i}+Gy^{(N)}+Bu_{i}]dt,\quad y_{i}(0)=0,\quad i=1,2,\cdots,N.$

$\left\{\begin{aligned} dx_{i}=&\big(Ax_{i}-BR^{-1}B^{T}p_{i}+Gx^{(N)}+f\big)dt+\sigma dW_{i},\\ dp_{i}=&-\big[(A-\rho I)^{T}p_{i}+G^{T}p^{(N)}\big]dt-\big(Qx_{i}-Q_{\Gamma}x^{(N)}-\bar{\eta}\big)dt+\sum_{j=1}^{N}\beta_{i}^{j}dW_{j},\\ x_{i}(0)=&\ x_{i0},\quad p_{i}(T)=0,\quad i=1,\cdots,N,\end{aligned}\right.$

$dp_{i}=\alpha_{i}dt+\beta_{i}^{i}dW_{i}+\sum_{j\neq i}\beta_{i}^{j}dW_{j},\quad p_{i}(T)=0,\quad i=1,\cdots,N,$

$dx_{i}^{\theta}=\Big(Ax_{i}^{\theta}+B(\check{u}_{i}+\theta u_{i})+f+\frac{G}{N}\sum_{j=1}^{N}x^{\theta}_{j}\Big)dt+\sigma dW_{i},\quad x_{i}^{\theta}(0)=x_{i0},\quad i=1,2,\cdots,N.$

$\check{J}_{\rm soc}^{\rm F}(\check{u}+\theta u)-\check{J}_{\rm soc}^{\rm F}(\check{u})=2\theta I_{1}+\theta^{2}I_{2}$

$I_{1}=\cdots+\sum_{i=1}^{N}\mathbb{E}\int_{0}^{T}e^{-\rho t}\Big\langle Q\big(\check{x}_{i}-(\Gamma\check{x}^{(N)}+\eta)\big)-\Gamma^{T}Q\big((I-\Gamma)\check{x}^{(N)}-\eta\big)+\alpha_{i}+(A-\rho I)^{T}p_{i}+G^{T}p^{(N)},y_{i}\Big\rangle dt.$

$\left\{\begin{aligned} d\check{x}_{i}=&\big(A\check{x}_{i}-BR^{-1}B^{T}\check{p}_{i}+G\check{x}^{(N)}+f\big)dt+\sigma dW_{i},\\ d\check{p}_{i}=&-\big[(A-\rho I)^{T}\check{p}_{i}+G^{T}\check{p}^{(N)}+Q\check{x}_{i}-Q_{\Gamma}\check{x}^{(N)}-\bar{\eta}\big]dt+\sum_{j=1}^{N}\beta_{i}^{j}dW_{j},\\ \check{x}_{i}(0)=&\ x_{i0},\quad\check{p}_{i}(T)=0,\quad i=1,\cdots,N,\end{aligned}\right.$

$\left\{\begin{aligned} dx^{(N)}=&\Big[(A+G)x^{(N)}-BR^{-1}B^{T}p^{(N)}+f\Big]dt+\frac{1}{N}\sum_{i=1}^{N}\sigma dW_{i},\\ dp^{(N)}=&-\Big[(A+G-\rho I)^{T}p^{(N)}+(I-\Gamma)^{T}Q(I-\Gamma)x^{(N)}-\bar{\eta}\Big]dt+\frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{N}\beta_{i}^{j}dW_{j},\\ x^{(N)}(0)=&\ \frac{1}{N}\sum_{i=1}^{N}x_{i0},\quad p^{(N)}(T)=0.\end{aligned}\right.$


Bingchang Wang, Member, IEEE, and Huanshui Zhang, Senior Member, IEEE

This work was supported by the National Natural Science Foundation of China under Grants 61773241, 61573221 and 61633014. Bingchang Wang is with the School of Control Science and Engineering, Shandong University, Jinan 250061, P. R. China (e-mail: [email protected]). Huanshui Zhang is with the School of Control Science and Engineering, Shandong University, Jinan 250061, P. R. China (e-mail: [email protected]).

Abstract

This paper studies social optima and Nash games for mean field linear quadratic control systems, where subsystems are coupled via dynamics and individual costs. For the social control problem, we first obtain a set of forward-backward stochastic differential equations (FBSDE) from variational analysis, and construct a feedback-type control by decoupling the FBSDE. By using solutions of two Riccati equations, we design a set of decentralized control laws, which is further proved to be asymptotically socially optimal. Two equivalent conditions are given for uniform stabilization of the systems in different cases. For the game problem, we first design a set of decentralized control laws from variational analysis, and then show that this set of decentralized control laws constitutes an asymptotic Nash equilibrium by exploiting the stabilizing solution of a nonsymmetric Riccati equation.

It is verified that the proposed decentralized control laws are equivalent to the feedback strategies of mean field control in previous works. This may illustrate the relationship between open-loop and feedback solutions of mean field control (games).

Index Terms:

Mean field game, variational analysis, social optimality, forward-backward stochastic differential equation, Riccati equation

I Introduction

Mean field games have drawn increasing attention in many fields including system control, applied mathematics and economics [7, 8, 12]. The mean field game involves a very large population of small interacting players with the feature that while the influence of each one is negligible, the impact of the overall population is significant. By combining mean field approximations and individual’s best response, the dimensionality difficulty is overcome. Mean field games and control have found wide applications, including smart grids [27, 10], finance, economics [13, 9, 32], and social sciences [5], etc.

By now, mean field games have been intensively studied in the LQ (linear-quadratic) framework [18, 19, 25, 33, 6, 29]. Huang et al. developed the Nash certainty equivalence (NCE) principle based on the fixed-point method and designed an $\epsilon$-Nash equilibrium for mean field LQ games with discounted costs by the NCE approach [18, 19]. The NCE approach was then applied to the cases with long run average costs [25] and with Markov jump parameters [33], respectively. Bensoussan et al. employed the adjoint equation approach and the fixed-point theorem to obtain a sufficient condition for the unique existence of the equilibrium strategy over a finite horizon [6]. For other aspects of mean field games, readers are referred to [21, 23, 39, 11] for nonlinear mean field games, [37] for oblivious equilibrium in dynamic games, [17, 34, 35] for mean field games with major players, and [16, 29] for robust mean field games.

Besides noncooperative games, social optima in mean field models have also attracted much interest. Social optimum control refers to the setting in which all the players cooperate to optimize the common social cost, that is, the sum of individual costs; it is usually regarded as a type of team decision problem [30, 14]. Huang et al. considered social optima in mean field LQ control, and provided an asymptotic team-optimal solution [20]. Wang and Zhang [36] investigated a mean field social optimal problem where the Markov jump parameter appears as a common source of randomness. For further literature, see [22] for social optima in mixed games, and [3] for team-optimal control with finite population and partial information.

Most previous results on mean field games and control were given by virtue of the fixed-point analysis. However, the fixed-point method is sometimes conservative, particularly for general systems. In this paper, we break away from the fixed-point method and solve the problem by tackling forward-backward stochastic differential equations (FBSDE). In recent years, some substantial progress for the optimal LQ control has been made by solving the FBSDE. See [40, 42, 43, 31] for details.

This paper investigates social optima and Nash games for linear quadratic mean field systems, where subsystems (agents) are coupled via dynamics and individual costs. For the finite-horizon social control problem, we first obtain a set of forward-backward stochastic differential equations (FBSDE) by examining the variation of the social cost, and give centralized feedback-type control laws by decoupling the FBSDE. With mean field approximations, we design a set of decentralized control laws, which is further shown to have asymptotic social optimality. For the infinite-horizon case, we design a set of decentralized control laws by using solutions of two Riccati equations, which is shown to be asymptotically socially optimal. Some equivalent conditions are further given for uniform stabilization of the multiagent systems when the state weight $Q$ is positive semidefinite or only symmetric. For the problem of mean field games, we first design a set of decentralized control laws by variational analysis, whose control gain satisfies a nonsymmetric Riccati equation. With the help of the stabilizing solution of the nonsymmetric Riccati equation, we show that the set of decentralized control laws is an asymptotic Nash equilibrium. It is verified that the proposed decentralized control laws are an equivalent representation of the feedback strategies in previous works on mean field control and games. Finally, some numerical examples are given to illustrate the effectiveness of the proposed control laws.

The main contributions of the paper are summarized as follows.

(i) For the social control problem, we first obtain necessary and sufficient existence conditions of finite-horizon centralized optimal control by variational analysis, and then design a feedback-type decentralized control by tackling FBSDE with mean field approximations.

(ii) In the case $Q\geq 0$ , the necessary and sufficient conditions are given for uniform stabilization of the systems with the help of the system’s observability and detectability.

(iii) In the case that $Q$ is only symmetric, the necessary and sufficient conditions are given for uniform stabilization of the systems using the Hamiltonian matrices.

(iv) For the game problem, we show that the decentralized control laws constitute an $\varepsilon$ -Nash equilibrium by exploiting the stabilizing solution of a nonsymmetric Riccati equation.

(v) It is under nonconservative assumptions that we obtain the asymptotically optimal decentralized control, and such control laws are shown to be equivalent to the feedback strategies given by the fixed-point method in previous works [19, 20].

The organization of the paper is as follows. In Section II, the socially optimal control problem is investigated. We first construct asymptotically optimal decentralized control laws by tackling FBSDE for the finite-horizon case, then design asymptotically optimal control for the infinite-horizon case and further give two equivalent conditions of uniform stabilization for different cases. In Section III, we design a decentralized $\varepsilon$ -Nash equilibrium for the finite-horizon and infinite-horizon cases, respectively. The proposed decentralized control laws are compared with the feedback strategies of previous works in Section IV. In Section V, some numerical examples are given to show the effectiveness of the proposed control laws. Section VI concludes the paper.

The following notation will be used throughout this paper. $\|\cdot\|$ denotes the Euclidean vector norm or matrix spectral norm. For a vector $z$ and a matrix $Q$, $\|z\|_{Q}^{2}=z^{T}Qz$, and $Q>0$ ($Q\geq 0$) means that $Q$ is positive definite (positive semidefinite). For two vectors $x,y$, $\langle x,y\rangle=x^{T}y$. $C([0,T],\mathbb{R}^{n})$ is the space of all $\mathbb{R}^{n}$-valued continuous functions defined on $[0,T]$, and $C_{\rho/2}([0,\infty),\mathbb{R}^{n})$ is a subspace of $C([0,\infty),\mathbb{R}^{n})$ which is given by $\{f|\int_{0}^{\infty}e^{-\rho t}\|f(t)\|^{2}dt<\infty\}$. $L^{2}_{\cal F}(0,T;\mathbb{R}^{k})$ is the space of all $\mathcal{F}$-adapted $\mathbb{R}^{k}$-valued processes $x(\cdot)$ such that $\mathbb{E}\int_{0}^{T}\|x(t)\|^{2}dt<\infty$. For two sequences $\{a_{n},n=0,1,\cdots\}$ and $\{b_{n},n=0,1,\cdots\}$, $a_{n}=O(b_{n})$ denotes that $\limsup_{n\to\infty}|{a_{n}}/{b_{n}}|\leq C$, and $a_{n}=o(b_{n})$ denotes $\limsup_{n\to\infty}|{a_{n}}/{b_{n}}|=0$. For convenience of presentation, we use $C,C_{1},C_{2},\cdots$ to denote generic positive constants, which may vary from place to place.

II Mean Field LQ Social Control

Consider a large-population system with $N$ agents. Agent $i$ evolves by the following stochastic differential equation:

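$dx_{i}(t)=\big[Ax_{i}(t)+Bu_{i}(t)+Gx^{(N)}(t)+f(t)\big]dt+\sigma(t)dW_{i}(t),\quad 1\leq i\leq N,\qquad(1)$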

where $x_{i}\in\mathbb{R}^{n}$ and $u_{i}\in\mathbb{R}^{r}$ are the state and input of the $i$ th agent. $x^{(N)}(t)=\frac{1}{N}\sum_{j=1}^{N}x_{j}(t)$ , $f,\sigma\in C_{\rho/2}([0,\infty),\mathbb{R}^{n})$ . $\{W_{i}(t),1\leq i\leq N\}$ are a sequence of independent $1$ -dimensional Brownian motions on a complete filtered probability space $(\Omega,\mathcal{F},\{\mathcal{F}_{t}\}_{0\leq t\leq T},\mathbb{P})$ . The cost function of agent $i$ is given by

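$J_{i}(u)=\mathbb{E}\int_{0}^{\infty}e^{-\rho t}\Big\{\big\|x_{i}(t)-\Gamma x^{(N)}(t)-\eta(t)\big\|^{2}_{Q}+\|u_{i}(t)\|^{2}_{R}\Big\}dt,\qquad(2)$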

where $Q$ , $R$ are symmetric matrices with appropriate dimensions, and $R>0$ . Denote $u=\{u_{1},\ldots,u_{i},\ldots,u_{N}\}$ . The decentralized control set is given by

[TABLE]

For comparison, define the centralized control sets as

[TABLE]

and ${\cal U}_{c}=\big\{(u_{1},\cdots,u_{N})\ \big|\ u_{i}\ \hbox{is adapted to}\ {\cal U}_{c,i}\big\}$, where ${\mathcal{F}}_{t}=\sigma\{\bigcup_{i=1}^{N}{\mathcal{F}}_{t}^{i}\}$ and ${\mathcal{F}}_{t}^{i}=\sigma(x_{i}(0),W_{i}(s),0\leq s\leq t)$, $i=1,\cdots,N$.

In this section, we mainly study the following problem.

(PS). Seek a set of decentralized control laws to optimize social cost for the system (1)-(2), i.e., $\inf_{u_{i}\in{\cal U}_{d,i}}J_{\rm soc},$ where

$J_{\rm soc}=\sum_{i=1}^{N}J_{i}(u).$

Assume

A1) $x_{i}(0),i=1,...,N$ are mutually independent and have the same mathematical expectation. $x_{i}(0)=x_{i0}$ , $\mathbb{E}x_{i}(0)=\bar{x}_{0}$ , $i=1,\cdots,N$ . There exists a constant $C_{0}$ (independent of $N$ ) such that $\max_{1\leq i\leq N}\mathbb{E}\|x_{i}(0)\|^{2}<C_{0}$ . Furthermore, $\{x_{i}(0),i=1,...,N\}$ and $\{W_{i},i=1,...,N\}$ are independent of each other.
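To make the setup concrete, the following is a minimal sketch of how the coupled population could be simulated in the scalar case with an Euler-Maruyama scheme. It assumes the agent dynamics have the linear form $dx_{i}=(ax_{i}+bu_{i}+gx^{(N)}+f)dt+\sigma dW_{i}$ with constant $f$ and $\sigma$; the function name `simulate_population` and the `controls` callback are hypothetical names introduced only for this illustration.

```python
import math
import random


def simulate_population(n_agents, a, b, g, f, sigma, x0, controls, horizon, dt, seed=0):
    """Euler-Maruyama sketch of dx_i = (a x_i + b u_i + g x_mean + f) dt + sigma dW_i.

    `controls` is a hypothetical callback (i, t, x_i) -> u_i, so each agent
    only uses its own state (a decentralized law).
    Returns the final states and the final population average x^(N).
    """
    rng = random.Random(seed)
    x = list(x0)
    steps = int(round(horizon / dt))
    for k in range(steps):
        t = k * dt
        x_mean = sum(x) / n_agents  # mean field term x^(N)(t)
        new_x = []
        for i in range(n_agents):
            u = controls(i, t, x[i])
            drift = a * x[i] + b * u + g * x_mean + f
            noise = sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
            new_x.append(x[i] + drift * dt + noise)
        x = new_x
    return x, sum(x) / n_agents


# Noise-free run with zero control: every agent follows dx/dt = (a + g) x.
states, mean_state = simulate_population(
    n_agents=5, a=-1.0, b=1.0, g=0.5, f=0.0, sigma=0.0,
    x0=[1.0] * 5, controls=lambda i, t, x: 0.0, horizon=1.0, dt=0.01,
)
```

With identical initial states and no noise, the population average coincides with each agent's state, which is a convenient sanity check before adding noise or feedback.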

II-A The finite-horizon problem

For the convenience of design, we first consider the following finite-horizon problem.

(P1) $\inf_{u\in L^{2}_{{\cal F}_{t}}(0,T;\mathbb{R}^{r})}J_{\rm soc}^{\rm F}(u),$

where $J_{\rm soc}^{\rm F}(u)=\sum_{i=1}^{N}J_{i}^{\rm F}(u)$ and

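$J_{i}^{\rm F}(u)=\mathbb{E}\int_{0}^{T}e^{-\rho t}\Big\{\big\|x_{i}(t)-\Gamma x^{(N)}(t)-\eta(t)\big\|^{2}_{Q}+\|u_{i}(t)\|^{2}_{R}\Big\}dt.$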

We first give an equivalent condition for the convexity of Problem (P1).

Proposition II.1

Problem (P1) is convex in $u$ if and only if for any $u_{i}\in L^{2}_{{\cal F}_{t}}(0,T;\mathbb{R}^{r})$ , $i=1,\cdots,N$ ,

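$\sum_{i=1}^{N}\mathbb{E}\int_{0}^{T}e^{-\rho t}\Big\{\big\|y_{i}-\Gamma y^{(N)}\big\|^{2}_{Q}+\|u_{i}\|^{2}_{R}\Big\}dt\geq 0,$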

where $y^{(N)}=\sum_{j=1}^{N}y_{j}/N$ and $y_{i}$ satisfies

$dy_{i}=[Ay_{i}+Gy^{(N)}+Bu_{i}]dt,\quad y_{i}(0)=0,\quad i=1,2,\cdots,N.\qquad(4)$

Proof. Let $x_{i}$ and $\acute{x}_{i}$ be the state processes of agent $i$ with the control $v$ and $\acute{v}$ , respectively. Take any $\lambda_{1}\in[0,1]$ and let $\lambda_{2}=1-\lambda_{1}$ . Then

[TABLE]

Denote $u=v-\acute{v}$, and $y_{i}=x_{i}-\acute{x}_{i}$. Thus, $y_{i}$ satisfies (4). By the definition of convexity, the proposition follows. $\hfill\Box$
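The step behind this proof can be sketched as follows. Since the state depends affinely on the control, the social cost splits into a homogeneous quadratic part $q$, a linear part and a constant; the symbol $q$ is introduced here only for illustration. The linear and constant parts cancel in the convexity gap, so only $q$ evaluated at the difference of the two controls remains.

```latex
% Sketch, with q(u) := \sum_{i=1}^{N} E \int_0^T e^{-\rho t}
%   \{ \|y_i - \Gamma y^{(N)}\|_Q^2 + \|u_i\|_R^2 \} dt, where y_i solves (4).
\begin{aligned}
&\lambda_1 J_{\rm soc}^{\rm F}(v) + \lambda_2 J_{\rm soc}^{\rm F}(\acute{v})
  - J_{\rm soc}^{\rm F}(\lambda_1 v + \lambda_2 \acute{v}) \\
&\quad = \lambda_1 q(v) + \lambda_2 q(\acute{v}) - q(\lambda_1 v + \lambda_2 \acute{v}) \\
&\quad = \lambda_1 \lambda_2\, q(v - \acute{v}), \qquad \lambda_1 + \lambda_2 = 1 .
\end{aligned}
```

Hence the gap is nonnegative for all $v,\acute{v}$ and all $\lambda_{1}\in[0,1]$ exactly when $q(u)\geq 0$ for every $u$, which is the condition stated in the proposition.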

By examining the variation of ${J}_{\rm soc}^{\rm F}$ , we obtain the necessary and sufficient conditions for the existence of centralized optimal control of (P1).

Theorem II.1

Suppose $R>0$ . Then (P1) has a set of optimal control laws if and only if Problem (P1) is convex in $u$ and the following equation system admits a set of solutions $(x_{i},p_{i},\beta_{i}^{j},i,j=1,\cdots,N)$ :

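$\left\{\begin{aligned} dx_{i}=&\big(Ax_{i}-BR^{-1}B^{T}p_{i}+Gx^{(N)}+f\big)dt+\sigma dW_{i},\\ dp_{i}=&-\big[(A-\rho I)^{T}p_{i}+G^{T}p^{(N)}\big]dt-\big(Qx_{i}-Q_{\Gamma}x^{(N)}-\bar{\eta}\big)dt+\sum_{j=1}^{N}\beta_{i}^{j}dW_{j},\\ x_{i}(0)=&\ x_{i0},\quad p_{i}(T)=0,\quad i=1,\cdots,N,\end{aligned}\right.\qquad(5)$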

where $Q_{\Gamma}\stackrel{{\scriptstyle\Delta}}{{=}}\Gamma^{T}Q+Q\Gamma-\Gamma^{T}Q\Gamma$ , $\bar{\eta}\stackrel{{\scriptstyle\Delta}}{{=}}Q\eta-\Gamma^{T}Q\eta$ , $p^{(N)}=\frac{1}{N}\sum_{i=1}^{N}p_{i}$ , and furthermore the optimal control is given by $\check{u}_{i}=-{R^{-1}}B^{T}p_{i}$ .
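The definitions of $Q_{\Gamma}$ and $\bar{\eta}$ imply the identities $Q-Q_{\Gamma}=(I-\Gamma)^{T}Q(I-\Gamma)$ and $\bar{\eta}=(I-\Gamma)^{T}Q\eta$, which are used when the FBSDE is averaged over the population. The sketch below checks them numerically for an arbitrary $2\times 2$ example; the helper names and the sample matrices are assumptions made for this illustration.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def combine(a, b, sign=1.0):
    """Entrywise a + sign * b."""
    return [[a[i][j] + sign * b[i][j] for j in range(len(a[0]))] for i in range(len(a))]


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def q_gamma(Q, Gamma):
    """Q_Gamma = Gamma^T Q + Q Gamma - Gamma^T Q Gamma."""
    Gt = transpose(Gamma)
    return combine(combine(matmul(Gt, Q), matmul(Q, Gamma)),
                   matmul(matmul(Gt, Q), Gamma), -1.0)


def q_hat(Q, Gamma):
    """(I - Gamma)^T Q (I - Gamma)."""
    D = combine(identity(len(Q)), Gamma, -1.0)
    return matmul(matmul(transpose(D), Q), D)


def eta_bar(Q, Gamma, eta):
    """eta_bar = Q eta - Gamma^T Q eta, with eta given as a column [[e1], [e2]]."""
    Q_eta = matmul(Q, eta)
    return combine(Q_eta, matmul(transpose(Gamma), Q_eta), -1.0)


# Arbitrary sample data (not taken from the paper).
Q = [[2.0, 0.5], [0.5, 1.0]]
Gamma = [[0.3, 0.1], [0.0, 0.4]]
eta = [[1.0], [-2.0]]

gap = combine(combine(Q, q_gamma(Q, Gamma), -1.0), q_hat(Q, Gamma), -1.0)
max_gap = max(abs(v) for row in gap for v in row)
```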

Proof. Suppose that $\check{u}_{i}=-R^{-1}B^{T}p_{i},$ where $p_{i},i=1,\cdots,N$ are a set of solutions to the equation system

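$dp_{i}=\alpha_{i}dt+\beta_{i}^{i}dW_{i}+\sum_{j\neq i}\beta_{i}^{j}dW_{j},\quad p_{i}(T)=0,\quad i=1,\cdots,N,$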

where $\alpha_{i}$, $i=1,\cdots,N$ are to be determined. Denote by $\check{x}_{i}$ the state of agent $i$ under the control $\check{u}_{i}$. For any $u_{i}\in L^{2}_{{\cal F}_{t}}(0,T;\mathbb{R}^{r})$ and $\theta\in\mathbb{R}$, let $u_{i}^{\theta}=\check{u}_{i}+\theta u_{i}$. Denote by $x_{i}^{\theta}$ the solution of the following perturbed state equation:

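$dx_{i}^{\theta}=\Big(Ax_{i}^{\theta}+B(\check{u}_{i}+\theta u_{i})+f+\frac{G}{N}\sum_{j=1}^{N}x^{\theta}_{j}\Big)dt+\sigma dW_{i},\quad x_{i}^{\theta}(0)=x_{i0},\quad i=1,2,\cdots,N.$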

Let $y_{i}=(x_{i}^{\theta}-\check{x}_{i})/\theta$ . It can be verified that $y_{i}$ satisfies (4). Then by Itô’s formula, for any $i=1,\cdots,N$ ,

[TABLE]

which implies

[TABLE]

From (3), we have

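$\check{J}_{\rm soc}^{\rm F}(\check{u}+\theta u)-\check{J}_{\rm soc}^{\rm F}(\check{u})=2\theta I_{1}+\theta^{2}I_{2},$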

where $\check{u}=(\check{u}_{1},\cdots,\check{u}_{N})$ , and

[TABLE]

Note that

[TABLE]

From (7), one can obtain that

[TABLE]

From (11), $\check{u}$ is a minimizer to Problem (P1) if and only if $I_{2}\geq 0$ and $I_{1}=0$ . By Proposition II.1, $I_{2}\geq 0$ if and only if (P1) is convex. $I_{1}=0$ is equivalent to

[TABLE]
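As a hedged sketch of this step: assuming the part of $I_{1}$ that multiplies $y_{i}$ has the integrand shown in the first line below, requiring it to vanish for arbitrary perturbations determines the drift $\alpha_{i}$ of the adjoint process. The grouping uses only the definitions of $Q_{\Gamma}$ and $\bar{\eta}$ from Theorem II.1.

```latex
\begin{aligned}
0 &= Q\big(\check{x}_i-\Gamma\check{x}^{(N)}-\eta\big)
     -\Gamma^{T}Q\big((I-\Gamma)\check{x}^{(N)}-\eta\big)
     +\alpha_i+(A-\rho I)^{T}p_i+G^{T}p^{(N)} \\
  &= Q\check{x}_i-\big(Q\Gamma+\Gamma^{T}Q-\Gamma^{T}Q\Gamma\big)\check{x}^{(N)}
     -\big(Q\eta-\Gamma^{T}Q\eta\big)
     +\alpha_i+(A-\rho I)^{T}p_i+G^{T}p^{(N)}, \\
\alpha_i &= -\big[(A-\rho I)^{T}p_i+G^{T}p^{(N)}
     +Q\check{x}_i-Q_{\Gamma}\check{x}^{(N)}-\bar{\eta}\big].
\end{aligned}
```

Substituting this $\alpha_{i}$ into the equation for $dp_{i}$ reproduces the backward equation in (5).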

Thus, we have the following optimality system:

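$\left\{\begin{aligned} d\check{x}_{i}=&\big(A\check{x}_{i}-BR^{-1}B^{T}\check{p}_{i}+G\check{x}^{(N)}+f\big)dt+\sigma dW_{i},\\ d\check{p}_{i}=&-\big[(A-\rho I)^{T}\check{p}_{i}+G^{T}\check{p}^{(N)}+Q\check{x}_{i}-Q_{\Gamma}\check{x}^{(N)}-\bar{\eta}\big]dt+\sum_{j=1}^{N}\beta_{i}^{j}dW_{j},\\ \check{x}_{i}(0)=&\ x_{i0},\quad\check{p}_{i}(T)=0,\quad i=1,\cdots,N,\end{aligned}\right.$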

such that $\check{u}_{i}=-{R^{-1}}B^{T}\check{p}_{i}$. This implies that the equation system (5) admits a solution $(\check{x}_{i},\check{p}_{i},\check{\beta}_{i}^{j},i,j=1,\cdots,N)$.

On the other hand, suppose that the equation system (5) admits a solution $(\check{x}_{i},\check{p}_{i},\check{\beta}_{i}^{j},i,j=1,\cdots,N)$, and let $\check{u}_{i}=-R^{-1}B^{T}\check{p}_{i}$. If (P1) is convex, then $\check{u}$ is a minimizer of Problem (P1).

$\hfill\Box$

It follows from (5) that

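$\left\{\begin{aligned} dx^{(N)}=&\Big[(A+G)x^{(N)}-BR^{-1}B^{T}p^{(N)}+f\Big]dt+\frac{1}{N}\sum_{i=1}^{N}\sigma dW_{i},\\ dp^{(N)}=&-\Big[(A+G-\rho I)^{T}p^{(N)}+(I-\Gamma)^{T}Q(I-\Gamma)x^{(N)}-\bar{\eta}\Big]dt+\frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{N}\beta_{i}^{j}dW_{j},\\ x^{(N)}(0)=&\ \frac{1}{N}\sum_{i=1}^{N}x_{i0},\quad p^{(N)}(T)=0.\end{aligned}\right.\qquad(18)$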

Let $p_{i}=Px_{i}+Kx^{(N)}+s$ . Then by (5), (18) and Itô’s formula,

[TABLE]

This implies that $\beta_{i}^{i}=\frac{1}{N}K\sigma+P\sigma$ , $\beta_{i}^{j}=\frac{1}{N}K\sigma,\ j\not=i$ ,

[TABLE]

Then $\check{u}_{i}=-{R^{-1}}B^{T}(Px_{i}+Kx^{(N)}+s).$
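The decoupling step can be sketched as a coefficient-matching computation. Applying Itô's formula to the ansatz and comparing with the backward equation in (5) gives the noise coefficients directly, and collecting the terms in $x_{i}$ gives a Riccati equation for $P$. The last line is what this matching yields under the stated ansatz; it is presumably the equation the paper labels (19), but that identification is an assumption here.

```latex
% Ansatz: p_i = P x_i + K x^{(N)} + s, with P, K, s deterministic.
\begin{aligned}
dp_i &= \dot{P}x_i\,dt + P\,dx_i + \dot{K}x^{(N)}dt + K\,dx^{(N)} + \dot{s}\,dt, \\
\text{martingale part:}\quad
 & P\sigma\,dW_i + \frac{1}{N}K\sigma\sum_{j=1}^{N}dW_j
   = \Big(P+\frac{1}{N}K\Big)\sigma\,dW_i + \sum_{j\neq i}\frac{1}{N}K\sigma\,dW_j, \\
\text{terms in } x_i:\quad
 & \dot{P} + PA - PBR^{-1}B^{T}P = -(A-\rho I)^{T}P - Q \\
 &\Longleftrightarrow\ \rho P = \dot{P} + A^{T}P + PA - PBR^{-1}B^{T}P + Q,
   \qquad P(T)=0 .
\end{aligned}
```

The terminal condition $P(T)=0$ follows from $p_{i}(T)=0$. The terms in $x^{(N)}$ and the remaining deterministic terms give the equations for $K$ and $s$ in the same way.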

Theorem II.2

Assume that A1) holds and $Q\geq 0$ . Then Problem (P1) has an optimal control

[TABLE]

where $P,K$ and $s$ are determined by (19)-(22).

Proof. Denote $\Pi=P+K$ . Then from (20) and (22), $\Pi$ satisfies

[TABLE]

where $\hat{Q}\stackrel{{\scriptstyle\Delta}}{{=}}(I-\Gamma)^{T}Q(I-\Gamma)$ . Note that $Q\geq 0$ and $R>0$ . By [2, 41], (19) and (23) admit unique solutions $P\geq 0$ and $\Pi\geq 0$ , respectively, which implies that (20) and (22) have unique solutions $K$ and $s$ , respectively. Then by [26, 42], the FBSDE (5) admits a unique solution. By Theorem II.1, Problem (P1) has an optimal control given by $\check{u}_{i}=-{R^{-1}}B^{T}(Px_{i}+Kx^{(N)}+s),$ where $P,K$ and $s$ are determined by (19)-(22).
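For intuition, the two Riccati equations can be integrated numerically in the scalar case. The sketch below assumes they take the scalar form $\rho p=\dot{p}+2ap-(b^{2}/r)p^{2}+q$ with $p(T)=0$, where $\Pi$ uses $a+g$ in place of $a$ and $(1-\gamma)^{2}q$ in place of $q$. It uses a plain backward Euler sweep, and the function name `riccati_backward` and the parameter values are illustrative assumptions.

```python
def riccati_backward(a, b, q, r, rho, horizon, steps):
    """Integrate rho*p = p' + 2*a*p - (b^2/r)*p^2 + q backward from p(T) = 0.

    Returns a list `path` with path[k] approximating p(k * horizon / steps).
    """
    dt = horizon / steps
    p = 0.0                      # terminal condition p(T) = 0
    path = [p]
    for _ in range(steps):
        p_dot = rho * p - 2.0 * a * p + (b * b / r) * p * p - q
        p = p - dt * p_dot       # step from t to t - dt
        path.append(p)
    path.reverse()
    return path


# Illustrative parameters (not from the paper).
a, b, g, q, r, rho, gamma = 1.0, 1.0, -0.5, 1.0, 1.0, 0.5, 0.5
P_path = riccati_backward(a, b, q, r, rho, horizon=20.0, steps=20000)
Pi_path = riccati_backward(a + g, b, (1.0 - gamma) ** 2 * q, r, rho, horizon=20.0, steps=20000)
K_path = [pi - p for pi, p in zip(Pi_path, P_path)]  # K = Pi - P
```

For these parameters the stationary equation $p^{2}-1.5p-1=0$ has the positive root $p=2$, and on a long horizon the value at $t=0$ settles near that root while the whole path stays nonnegative.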

$\Box$

As an approximation to ${x}^{(N)}$ in (18), we obtain

[TABLE]

Then, by Theorem II.2, the decentralized control law for agent $i$ may be taken as

[TABLE]

where $P,K$ , and $s$ are determined by (19)-(22), and $\bar{x}$ and $\hat{x}_{i}$ satisfy (24) and

[TABLE]

Remark II.1

In previous works [20, 36], the mean field term $x^{(N)}$ in cost functions (dynamics) is first substituted by a deterministic function $\bar{x}$ . By solving an optimal tracking problem subject to consistency requirements, a fixed-point equation is obtained. The decentralized control is constructed by handling the fixed-point equation. Here, we firstly obtain the centralized open-loop solution by variational analysis. By tackling the coupled FBSDEs combined with mean field approximations, the decentralized control laws are designed. Note that in this case $s$ and $\bar{x}$ are fully decoupled and no fixed-point equation is needed.

Theorem II.3

Let A1) hold and $Q\geq 0$ . The set of decentralized control laws $\{\hat{u}_{1},\cdots,\hat{u}_{N}\}$ given by (25) has asymptotic social optimality, i.e.,

[TABLE]

Proof. See Appendix A. $\hfill\Box$

II-B The infinite-horizon problem

Based on the analysis in Section II-A, we may design the following decentralized control laws for Problem (PS):

[TABLE]

where $P$ and $\Pi$ are determined by

[TABLE]

and $s,\bar{x}\in C_{\rho/2}([0,\infty),\mathbb{R}^{n})$ are determined by

[TABLE]

Here the existence conditions of $P,\Pi,s$ and $\bar{x}$ need to be investigated further.

We introduce some assumptions:

A2) The system $(A-\frac{\rho}{2}I,B)$ is stabilizable, and $(A+G-\frac{\rho}{2}I,B)$ is stabilizable.

A3) $Q\geq 0$, $(A-\frac{\rho}{2}I,\sqrt{Q})$ is observable, and $(A+G-\frac{\rho}{2}I,\sqrt{Q}(I-\Gamma))$ is observable.

Assumptions A2) and A3) are basic in the study of the LQ optimal control problem. We will show that under some conditions, A2) is also necessary for uniform stabilization of multiagent systems. In many cases, A3) may be weakened to the following assumption.

A3′) $Q\geq 0$, $(A-\frac{\rho}{2}I,\sqrt{Q})$ is detectable, and $(A+G-\frac{\rho}{2}I,\sqrt{Q}(I-\Gamma))$ is detectable.

Lemma II.1

Under A2)-A3), (28) and (29) admit unique solutions $P>0,\Pi>0$ , respectively, and (30)-(31) admits a set of unique solutions $s,\bar{x}\in C_{\rho/2}([0,\infty),\mathbb{R}^{n})$ .

Proof. From A2)-A3) and [2], (28) and (29) admit unique solutions $P>0,\Pi>0$ such that $A-BR^{-1}B^{T}P-\frac{\rho}{2}I$ and $A+G-BR^{-1}B^{T}\Pi-\frac{\rho}{2}I$ are Hurwitz, respectively. From an argument in [34, Appendix A], we obtain $s\in C_{\rho/2}([0,\infty),\mathbb{R}^{n})$ if and only if

[TABLE]

Under this initial condition, we have

[TABLE]

It is straightforward that $\bar{x}\in C_{\rho/2}([0,\infty),\mathbb{R}^{n})$ . $\hfill\Box$

We further introduce the following assumption.

A4) $\bar{A}+G-\frac{\rho}{2}I$ is Hurwitz, where $\bar{A}\stackrel{{\scriptstyle\Delta}}{{=}}A-BR^{-1}B^{T}P$ .

Lemma II.2

Let A1)-A4) hold. Then for (PS),

[TABLE]

Proof. See Appendix B. $\hfill\Box$

We now show that the decentralized control laws (25) uniformly stabilize the systems (1).

Theorem II.4

Let A1)-A4) hold. Then for any $N$ ,

[TABLE]

Proof. See Appendix B. $\hfill\Box$
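The mean field approximation behind these results can be illustrated with a small scalar simulation. The sketch below assumes $f=0$ and $\eta=0$ (so that $s=0$), a control of the form $\hat{u}_{i}=-(b/r)(P\hat{x}_{i}+(\Pi-P)\bar{x})$, and the limiting dynamics $\dot{\bar{x}}=(a+g-b^{2}\Pi/r)\bar{x}$. These forms are inferred from the finite-horizon design and are assumptions of this sketch, as are all function names and parameter values.

```python
import math
import random


def stabilizing_root(a, b, q, r, rho):
    """Largest root of (b^2/r) p^2 - (2a - rho) p - q = 0 (assumes b != 0)."""
    half = a - rho / 2.0
    return (r / (b * b)) * (half + math.sqrt(half * half + b * b * q / r))


def simulate_decentralized(n, a, b, g, q, r, gamma, rho, xbar0, x0, sigma,
                           horizon, dt, seed=0):
    """Run n agents under the assumed decentralized law; return (x^(N)(T), xbar(T))."""
    P = stabilizing_root(a, b, q, r, rho)
    Pi = stabilizing_root(a + g, b, (1.0 - gamma) ** 2 * q, r, rho)
    K = Pi - P
    rng = random.Random(seed)
    x = list(x0)
    xbar = xbar0
    for _ in range(int(round(horizon / dt))):
        x_mean = sum(x) / n
        new_x = []
        for xi in x:
            u = -(b / r) * (P * xi + K * xbar)   # uses own state and xbar only
            noise = sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
            new_x.append(xi + (a * xi + b * u + g * x_mean) * dt + noise)
        x = new_x
        xbar = xbar + (a + g - b * b * Pi / r) * xbar * dt
    return sum(x) / n, xbar


params = dict(a=1.0, b=1.0, g=-0.5, q=1.0, r=1.0, gamma=0.5, rho=0.5,
              xbar0=1.0, horizon=5.0, dt=0.01)
mean_det, xbar_det = simulate_decentralized(n=10, x0=[1.0] * 10, sigma=0.0, **params)
mean_sto, xbar_sto = simulate_decentralized(n=400, x0=[1.0] * 400, sigma=0.5, **params)
```

Without noise and with identical initial states the population average tracks $\bar{x}$ up to rounding; with noise the gap is random and, in this linear setting, its standard deviation shrinks like $1/\sqrt{N}$.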

We now give two equivalent conditions for uniform stabilization of multiagent systems.

Theorem II.5

Let A3) hold. Then for (PS) the following statements are equivalent:

(i) For any initial condition $(\hat{x}_{1}(0),\cdots,\hat{x}_{N}(0))$ satisfying A1),

[TABLE]

(ii) (28) and (29) admit unique solutions $P>0,\Pi>0$ , respectively, and $\bar{A}+G-\frac{\rho}{2}I$ is Hurwitz.

(iii) A2) and A4) hold.

Proof. See Appendix C. $\hfill\Box$

For the case $G=0$ , we have a simplified version of Theorem II.5.

Corollary II.1

Assume that A3) holds and $G=0$ . Then for (PS) the following statements are equivalent:

(i) For any $(\hat{x}_{1}(0),\cdots,\hat{x}_{N}(0))$ satisfying A1),

[TABLE]

(ii) (28) and (29) admit unique solutions $P>0,\Pi>0$ , respectively.

(iii) A2) holds.

When A3) is weakened to A3′), we have the following equivalent conditions of uniform stabilization of the systems.

Theorem II.6

Let A3′) hold. Then for (PS) the following statements are equivalent:

(i) For any initial condition $(\hat{x}_{1}(0),\cdots,\hat{x}_{N}(0))$ satisfying A1),

[TABLE]

(ii) (28) and (29) admit unique solutions $P\geq 0,\Pi\geq 0$ , respectively, and $\bar{A}+G-\frac{\rho}{2}I$ is Hurwitz.

(iii) A2) and A4) hold.

Proof. See Appendix C. $\hfill\Box$

Remark II.2

In [43], some similar results were given for the stabilization of mean field systems. However, only the limiting problem was considered in their work and the mean field term in dynamics and costs is $\mathbb{E}x(t)$ instead of $x^{(N)}$ . Here we study large-population multiagent systems and the number of agents is large but not infinite. The errors of mean field approximations are further analyzed. To obtain asymptotic optimality, an additional assumption A4) is needed later.

For the more general case that $Q$ is only symmetric, we have the following equivalent conditions for uniform stabilization of multiagent systems.

Denote

[TABLE]

Theorem II.7

Assume that both $M_{1}$ and $M_{2}$ have no eigenvalues on the imaginary axis. Then for (PS) the following statements are equivalent:

(i) For any $(x_{1}(0),\cdots,x_{N}(0))$ satisfying A1),

[TABLE]

(ii) (28) and (29) admit $\rho$-stabilizing solutions, respectively, and $\bar{A}+G-\frac{\rho}{2}I$ is Hurwitz. (Footnote 1: For a Riccati equation (28), $P$ is called a $\rho$-stabilizing solution if $P$ satisfies (28) and all the eigenvalues of $A-BR^{-1}B^{T}P$ are in left half-plane.)

(iii) A2) and A4) hold.

Remark II.3

$M_{1}$ and $M_{2}$ are Hamiltonian matrices. The Hamiltonian matrix plays a significant role in studying general algebraic Riccati equations. See more details of the properties of Hamiltonian matrices in [1, 28].

To show Theorem II.7, we need two lemmas. The first lemma is a result from [28, Theorem 6].

Lemma II.3

Equations (28) and (29) admit $\rho$ -stabilizing solutions if and only if A2) holds and both $M_{1}$ and $M_{2}$ have no eigenvalues on the imaginary axis.

Lemma II.4

Let A1) hold. Assume that (28) and (29) admit $\rho$ -stabilizing solutions, respectively, and $\bar{A}+G-\frac{\rho}{2}I$ is Hurwitz. Then

[TABLE]

Proof. From the definition of $\rho$ -stabilizing solutions, $A-BR^{-1}B^{T}P-\frac{\rho}{2}I$ and $A+G-BR^{-1}B^{T}\Pi-\frac{\rho}{2}I$ are Hurwitz. By the argument in the proof of Theorem II.4, the lemma follows. $\hfill\Box$

Proof of Theorem II.7. By using Lemmas II.3 and II.4 together with an argument similar to that in the proof of Theorem II.4, the theorem follows. $\hfill\Box$

Example II.1

Consider a scalar system with $A=a$ , $B=b$ , $G=g$ , $Q=q$ , $\Gamma=\gamma$ , $R=r>0$ . Then

[TABLE]

By direct computations, neither $M_{1}$ nor $M_{2}$ has eigenvalues on the imaginary axis if and only if

[TABLE]

Note that if $q>0$ (or $a-{\rho}/{2}<0$ , $q=0$ ), i.e., $(a-{\rho}/{2},\sqrt{q})$ is observable (detectable), then (35) holds, and if $(1-\gamma)^{2}q>0$ ( $a+g-{\rho}/{2}<0,$ $q=0$ ), i.e., $(a+g-{\rho}/{2},\sqrt{q}(1-\gamma))$ is observable (detectable), then (36) holds.

For this model, the Riccati equation (28) is written as

[TABLE]

Let $\Delta=4[(a-{\rho}/{2})^{2}+{b^{2}q}/{r}]$ . If (35) holds then $\Delta>0$ , which implies (37) admits two solutions. If $q>0$ then (37) has a unique positive solution such that $a-b^{2}p/r-{\rho}/{2}=-\sqrt{\Delta}/2<0$ . If $q=0$ and $a-\rho/2<0$ then (37) has a unique non-negative solution $p=0$ such that $a-b^{2}p/r-{\rho}/{2}=a-{\rho}/2<0$ .

Assume that (35) and (36) hold. By Theorem II.7, the system is uniformly stable if and only if $(a-\rho/2,b)$ is stabilizable (i.e., $b\not=0$ or $a-\rho/2<0$ ), and $a-b^{2}p/r-{\rho}/{2}+g<0$ . Note that $a-b^{2}p/r-{\rho}/{2}<0$ . When $g\leq 0$ , we have $a-b^{2}p/r-{\rho}/{2}+g<0$ .
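The scalar computations in this example can be checked with a few lines of code. The sketch below assumes $b\neq 0$ and that the Riccati equation (37) reads $\rho p=2ap-b^{2}p^{2}/r+q$, which is the form consistent with the discriminant $\Delta$ above. The function name `scalar_example` and the test values are hypothetical.

```python
import math


def scalar_example(a, b, g, q, r, rho):
    """Return (p, closed_loop, uniformly_stable) for the scalar model.

    p is the larger root of (b^2/r) p^2 - (2a - rho) p - q = 0,
    closed_loop = a - b^2 p / r - rho/2 (equal to -sqrt(Delta)/2), and
    uniformly_stable checks a - b^2 p / r - rho/2 + g < 0.
    """
    if b == 0 or r <= 0:
        raise ValueError("this sketch assumes b != 0 and r > 0")
    half = a - rho / 2.0
    disc = half * half + b * b * q / r       # equals Delta / 4
    if disc <= 0:
        raise ValueError("Delta must be positive for two real roots")
    p = (r / (b * b)) * (half + math.sqrt(disc))
    closed_loop = a - b * b * p / r - rho / 2.0
    uniformly_stable = closed_loop + g < 0   # b != 0, so (a - rho/2, b) is stabilizable
    return p, closed_loop, uniformly_stable


p, closed_loop, stable = scalar_example(a=1.0, b=1.0, g=-0.5, q=1.0, r=1.0, rho=0.5)
residual = 0.5 * p - (2.0 * 1.0 * p - p * p + 1.0)   # rho*p - (2ap - b^2 p^2/r + q)
```

Here $\Delta=6.25$, so the closed-loop value is $-\sqrt{\Delta}/2=-1.25$. A sufficiently large positive coupling, for example $g=2$, violates the last condition even though the Riccati solution itself is unchanged.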

Example II.2

We further consider the model in Example II.1 for the case that $a+g=\rho/2$ and $\gamma=1$ (i.e., (36) does not hold). In this case, the Riccati equation (29) admits a unique solution $\Pi=0$ . (30) becomes $\rho s=\dot{s}+\frac{\rho}{2}s$ and has a unique solution $s=0$ in $C_{\rho/2}([0,\infty),\mathbb{R})$ . Thus, $\bar{x}$ satisfies

[TABLE]

Assume that $f$ is a constant. Then (38) does not admit a solution in $C_{\rho/2}([0,\infty),\mathbb{R})$ unless $\bar{x}(0)=-{2f}/{\rho}$ .
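A quick numerical check makes this claim concrete. Assuming (38) reduces to $\dot{\bar{x}}=\frac{\rho}{2}\bar{x}+f$ (which is what $\Pi=0$, $s=0$ and $a+g=\rho/2$ suggest), the solution is $\bar{x}(t)=(\bar{x}(0)+2f/\rho)e^{\rho t/2}-2f/\rho$. The sketch below approximates the discounted energy $\int_{0}^{T}e^{-\rho t}\bar{x}(t)^{2}dt$ with the trapezoidal rule; the function names and the values $f=1$, $\rho=0.5$ are illustrative assumptions.

```python
import math


def xbar(t, x0, f, rho):
    """Closed-form solution of xbar' = (rho/2) xbar + f with xbar(0) = x0."""
    return (x0 + 2.0 * f / rho) * math.exp(rho * t / 2.0) - 2.0 * f / rho


def discounted_energy(x0, f, rho, horizon, steps=20000):
    """Trapezoidal approximation of the integral of e^{-rho t} xbar(t)^2 over [0, horizon]."""
    h = horizon / steps
    total = 0.0
    for k in range(steps + 1):
        t = k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * math.exp(-rho * t) * xbar(t, x0, f, rho) ** 2
    return total * h


f, rho = 1.0, 0.5
# Generic initial value: the energy keeps growing roughly linearly in the horizon.
growth_generic = discounted_energy(0.0, f, rho, 100.0) - discounted_energy(0.0, f, rho, 50.0)
# Special initial value xbar(0) = -2f/rho: the energy has essentially converged.
x_special = -2.0 * f / rho
growth_special = discounted_energy(x_special, f, rho, 100.0) - discounted_energy(x_special, f, rho, 50.0)
```

For a generic initial value the energy grows by about $(\bar{x}(0)+2f/\rho)^{2}$ per unit of horizon, so the integral over $[0,\infty)$ diverges; for $\bar{x}(0)=-2f/\rho$ the trajectory is constant and the energy converges.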

We are in a position to state the asymptotic optimality of the decentralized control.

Theorem II.8

Let A1)-A4) hold. For Problem (PS), the set of decentralized control laws $\{\hat{u}_{1},\cdots,\hat{u}_{N}\}$ given by (27) has asymptotic social optimality, i.e.,

[TABLE]

Proof. We first prove that for $u\in\mathcal{U}_{c}$ , $J_{\rm soc}(u)<\infty$ implies that

[TABLE]

for all $i=1,\cdots,N$ . From $J_{\rm soc}(u)<\infty$ , we have $\mathbb{E}\int_{0}^{\infty}e^{-\rho t}\|u_{i}\|^{2}dt<\infty$ and

[TABLE]

which further implies that

[TABLE]

By (1) we have

[TABLE]

which implies that, for any $r\in[0,1]$ ,

[TABLE]

By $J_{\rm soc}(u)<\infty$ and basic SDE estimates, we can find a constant $C$ such that

[TABLE]

From (41) and (42) we obtain

[TABLE]

which together with A3) implies that

[TABLE]

This and (40) lead to

[TABLE]

By (1), we have

[TABLE]

It follows from (43) that

[TABLE]

From (44) and (45), we obtain that

[TABLE]

This together with A3) implies that

[TABLE]

which gives (39). By Theorem II.4,

[TABLE]

By a similar argument to the proof of Theorem II.3 combined with Lemma II.2, the conclusion follows. $\hfill\Box$

If A3) is replaced by A3′), the decentralized control (27) is still asymptotically socially optimal.

Corollary II.2

Assume that A1)-A2), A3′), A4) hold. The set of decentralized control laws given by (27) is asymptotically socially optimal.

Proof. Without loss of generality, we assume $A+G=\hbox{diag}\{\mathbb{A}_{1},\mathbb{A}_{2}\}$ , where $\mathbb{A}_{1}-(\rho/2)I$ and $-(\mathbb{A}_{2}-(\rho/2)I)$ are Hurwitz (if necessary, we may apply a nonsingular linear transformation as in the proof of Theorem II.6). Write $x^{(N)}=[z_{1}^{T},z_{2}^{T}]^{T}$ and ${\hat{Q}}^{1/2}=[S_{1},S_{2}]$ such that

[TABLE]

and $(\mathbb{A}_{2}-(\rho/2)I,S_{2})$ is observable, which follows from the detectability of $(A+G-(\rho/2)I,\hat{Q}^{1/2})$ . By the proof of Theorem II.4 or [17], $\mathbb{E}\int_{0}^{\infty}e^{-\rho t}\|u^{(N)}\|^{2}dt<\infty$ implies $\mathbb{E}\int_{0}^{\infty}e^{-\rho t}\|z_{1}\|^{2}dt<\infty$ , which together with (41) gives $\mathbb{E}\int_{0}^{\infty}e^{-\rho t}\|S_{2}z_{2}\|^{2}dt<\infty$ . This and the observability of $(\mathbb{A}_{2}-(\rho/2)I,S_{2})$ lead to $\mathbb{E}\int_{0}^{\infty}e^{-\rho t}\|z_{2}\|^{2}dt<\infty$ . Thus, $\mathbb{E}\int_{0}^{\infty}e^{-\rho t}\|x^{(N)}\|^{2}dt<\infty$ . The rest of the proof is similar to that of Theorem II.8. $\hfill\Box$

III Mean Field LQ Games

In this section, we investigate the game problem for LQ mean field systems.

(PG). Seek a set of decentralized control laws to minimize individual cost for each agent in the system (1)-(2).

III-A The finite-horizon problem

We first consider the finite-horizon problem. Suppose that $\bar{x}\in C([0,T],\mathbb{R}^{n})$ is given for approximation of $x^{(N)}$ . Replacing $x^{(N)}$ in (1) and (3) by $\bar{x}$ , we have the following auxiliary optimal control problem.

[TABLE]

where

[TABLE]

By examining the variation of $\bar{J}_{i}^{\rm F}$ , we obtain the unique optimal control of (P2).

Theorem III.1

Assume $Q\geq 0,R>0$ . Then the FBSDE

[TABLE]

admits a unique solution $(\grave{x}_{i},p_{i},q_{i})$ , and the optimal control $\hat{u}_{i}=-R^{-1}B^{T}p_{i}$ .

Proof. Since $Q\geq 0$ and $R>0$ , by [41], (P2) is uniformly convex and hence admits a unique optimal control. By an argument similar to the proof of Theorem II.1, the conclusion follows. $\hfill\Box$

It follows from (46) that

[TABLE]

Replacing $\grave{x}^{(N)}$ by $\bar{x}$ , we have

[TABLE]

Let $\bar{p}=\bar{P}\bar{x}+\hat{s}$ . By Itô’s formula, we obtain

[TABLE]

This implies

[TABLE]

Denote $\tilde{p}_{i}=p_{i}-\bar{p}$ , and $\tilde{x}_{i}=\grave{x}_{i}-\bar{x}$ . Then by (46) and (47) we have

[TABLE]

Let $\tilde{p}_{i}=P\tilde{x}_{i}$ . By Itô’s formula,

[TABLE]

which implies that $q_{i}=P\sigma$ , and

[TABLE]

Assume

A5) Equation (48) admits a solution in $C([0,T],\mathbb{R}^{n})$ .

Since the quadratic right-hand side is locally Lipschitz continuous, (48) admits a unique local solution on a small time interval $[T_{0},T]$ . See [1] for some sufficient conditions for the existence of a solution on $[0,T]$ . We now provide a necessary and sufficient condition that guarantees the global solvability of (48).

Proposition III.1

(48) admits a solution in $C([0,T],\mathbb{R}^{n})$ if and only if for any $t\in[0,T],$

[TABLE]

where

[TABLE]

Proof. Sufficiency is given by [26, Theorem 4.3, p.48]. Necessity follows from Proposition 4.2 and Theorem 3.2 of [26, Chapter 2]. $\hfill\Box$
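The gap between local and global solvability can be illustrated on a scalar Riccati differential equation. The sketch below is only a hypothetical scalar analogue, not equation (48) itself: it integrates $\rho p=\dot{p}+2ap-(b^{2}/r)p^{2}+q$ backward from $p(T)=0$ with a classical Runge-Kutta scheme, where the weight `q` is allowed to be negative to mimic the sign-indefinite terms that a nonsymmetric equation may contain. The function name, the blow-up threshold and the parameter values are all assumptions.

```python
def integrate_backward(a, b, q, r, rho, horizon, dt=1e-3, blowup=1e6):
    """RK4 integration in reversed time tau = T - t of
    dp/dtau = (2a - rho) p - (b^2/r) p^2 + q,  p(tau=0) = 0.
    Returns (p_at_tau_equal_horizon, None) if the solution stays bounded,
    or (None, tau_escape) once |p| exceeds the threshold `blowup`."""
    k = b * b / r
    m = 2.0 * a - rho

    def rhs(p):
        return m * p - k * p * p + q

    p, tau = 0.0, 0.0
    for _ in range(int(round(horizon / dt))):
        k1 = rhs(p)
        k2 = rhs(p + 0.5 * dt * k1)
        k3 = rhs(p + 0.5 * dt * k2)
        k4 = rhs(p + dt * k3)
        p += dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        tau += dt
        if abs(p) > blowup:
            return None, tau  # numerical finite escape time
    return p, None

# q > 0: the solution exists on the whole horizon and settles near the
# positive root of the algebraic equation.
p_ok, esc_ok = integrate_backward(0.2, 1.0, 1.0, 1.0, 0.6, horizon=10.0)
# q < 0 with (2a - rho)^2 + 4 b^2 q / r < 0: finite escape time.
p_bad, esc_bad = integrate_backward(0.2, 1.0, -1.0, 1.0, 0.6, horizon=10.0)
print(p_ok, esc_ok, p_bad, esc_bad)
```

In the second call the quadratic right-hand side has no real root, so the reversed-time solution decreases monotonically and escapes in finite time; a solution on $[0,T]$ then exists only for $T$ below that escape time.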

Let

[TABLE]

where $P,\bar{P}$ and $\hat{s}$ are determined by (50), (48) and (49), respectively, and $\bar{x}$ and $\hat{x}_{i}$ satisfy

[TABLE]

Denote $u_{-i}=(u_{1},\cdots,u_{i-1},u_{i+1},\cdots,u_{N})$ .

Theorem III.2

Let A1), A5) hold and $Q\geq 0$ . The set of decentralized strategies $\{\hat{u}_{1},\cdots,\hat{u}_{N}\}$ given by (51) is an $\varepsilon$ -Nash equilibrium, i.e.,

[TABLE]

where $\varepsilon=O({1}/{\sqrt{N}})$ .

Proof. See Appendix D. $\hfill\Box$

III-B The infinite-horizon problem

For simplicity, we consider the case $G=0$ .

Based on the analysis in Section III-A, we may design the following decentralized control for (PG):

[TABLE]

where $P$ and $\bar{P}$ are determined by

[TABLE]

respectively, and $\hat{s},\bar{x}\in C_{\rho/2}([0,\infty),\mathbb{R}^{n})$ are determined by

[TABLE]

and $\hat{x}_{i}$ satisfies

[TABLE]

Here the existence conditions of $P,\bar{P},s$ and $\bar{x}$ need to be investigated further.

We introduce the following assumptions.

A6) $(A-\frac{\rho}{2}I,B)$ is stabilizable, $Q\geq 0$ and $(A-\frac{\rho}{2}I,\sqrt{Q})$ is detectable.

A7) (58) admits a stabilizing solution.

Lemma III.1

Assume that $M_{3}$ has $n$ stable eigenvalues (with negative real parts) and $n$ unstable eigenvalues, where

[TABLE]

Suppose that

[TABLE]

where $H_{11}$ is Hurwitz and $L_{1}$ is invertible. Then A7) holds.

Proof. Let $\bar{P}=-L_{2}L_{1}^{-1}$ . It follows from (63) that

[TABLE]

By pre-multiplying by $[\bar{P}\ \ I]$ on both sides, we obtain

[TABLE]

which leads to (58). By (64), $A-BR^{-1}B^{T}\bar{P}-\frac{\rho}{2}I=L_{1}H_{11}L_{1}^{-1}$ is Hurwitz. It follows directly that $s,\bar{x}\in C_{\rho/2}([0,\infty),\mathbb{R}^{n})$ . $\hfill\Box$

Remark III.1

The above lemma provides a convenient method to compute the stabilizing solutions of algebraic Riccati equations. Assume there exists an invertible matrix $V=\left[\begin{array}[]{cc}V_{11}&V_{12}\\ V_{21}&V_{22}\end{array}\right]$ such that $V^{-1}M_{3}V=\left[\begin{array}[]{cc}H_{11}&H_{12}\\ 0&H_{22}\end{array}\right],$ where $V_{11}$ is invertible, and $H_{11},-H_{22}$ are Hurwitz. Then $V_{21}V_{11}^{-1}$ is the stabilizing solution of (58). $V$ comprises $2n$ independent vectors, which are called Schur vectors [24].
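The invariant-subspace idea in the remark can be illustrated in the scalar case, where everything is available in closed form. The sketch below does not use the paper's $M_{3}$ (its definition is not reproduced here). Instead it assumes the standard Hamiltonian-type matrix of the symmetric discounted equation $\rho p=2ap-(b^{2}/r)p^{2}+q$, namely $M=\left[\begin{array}{cc}\alpha&-b^{2}/r\\ -q&-\alpha\end{array}\right]$ with $\alpha=a-\rho/2$, and it requires $b\neq 0$. The function name is hypothetical.

```python
import math

def stabilizing_solution_scalar(a, b, q, r, rho):
    """Recover p = v2 / v1 from the eigenvector (v1, v2) of the stable
    eigenvalue of the ASSUMED 2x2 matrix
        M = [[alpha, -b^2/r], [-q, -alpha]],  alpha = a - rho/2.
    Requires b != 0 and alpha^2 + b^2 q / r > 0."""
    alpha = a - rho / 2.0
    k = b * b / r
    # M has trace 0 and determinant -(alpha^2 + k q): eigenvalues +/- lam.
    lam = -math.sqrt(alpha * alpha + k * q)  # stable eigenvalue
    # First row of (M - lam I) v = 0:  (alpha - lam) v1 - k v2 = 0.
    v1, v2 = k, alpha - lam
    p = v2 / v1
    closed_loop = alpha - k * p  # equals lam
    residual = 2.0 * alpha * p - k * p * p + q  # Riccati residual
    return p, lam, closed_loop, residual

p, lam, closed_loop, residual = stabilizing_solution_scalar(
    a=1.0, b=1.0, q=1.0, r=1.0, rho=0.6)
print(p, lam, closed_loop, residual)
```

The closed-loop value coincides with the stable eigenvalue, which is the scalar counterpart of the similarity relation used in the proof of Lemma III.1. For matrix-valued problems the same ratio becomes $V_{21}V_{11}^{-1}$, with the stable subspace obtained from an ordered Schur decomposition.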

Lemma III.2

Assume that A1), A6), A7) hold. Then (59)-(60) admit a set of unique solutions $s,\bar{x}\in C_{\rho/2}([0,\infty),\mathbb{R}^{n})$ , and

[TABLE]

Proof. The lemma follows by an argument similar to that in the proof of Theorem II.6. $\hfill\Box$

Theorem III.3

Let A1), A6), A7) hold. For Problem (PG), the set of decentralized strategies $\{\hat{u}_{1},\cdots,\hat{u}_{N}\}$ given by (56) is an $\varepsilon$ -Nash equilibrium, i.e.,

[TABLE]

where $\varepsilon=O({1}/{\sqrt{N}}).$

Proof. See Appendix D. $\hfill\Box$
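The $1/\sqrt{N}$ rate is what one expects when the state average of $N$ independent processes with a common mean is replaced by that mean. The following Monte Carlo sketch illustrates only this averaging effect, not the full cost comparison in the proof. It draws independent Gaussian samples as a stand-in for the agents' states at a fixed time; the distribution, sample sizes and function name are assumptions.

```python
import math
import random

def rms_gap(n_agents, trials=2000, mean=5.0, std=1.0, seed=1):
    """Root-mean-square distance between the empirical average of
    n_agents i.i.d. N(mean, std^2) samples and the true mean.
    The exact value is std / sqrt(n_agents)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        avg = sum(rng.gauss(mean, std) for _ in range(n_agents)) / n_agents
        total += (avg - mean) ** 2
    return math.sqrt(total / trials)

gap_100 = rms_gap(100)
gap_400 = rms_gap(400)
# Quadrupling N should roughly halve the gap.
print(gap_100, gap_400, gap_100 / gap_400)
```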

IV Comparison of Different Solutions

In this section, we compare the proposed decentralized control laws with the feedback decentralized strategies in previous works [19, 20].

We first introduce a definition from [4].

Definition IV.1

For a control problem with an admissible control set $\mathcal{U}$ , a control law $u\in\mathcal{U}$ is said to be a representation of another control $u^{*}\in\mathcal{U}$ if

(i) they both generate the same unique state trajectory, and

(ii) they both have the same open-loop value on this trajectory.

For Problem (PS), let $f=0$ , and $G=0$ . In [20, Theorem 4.3], the decentralized control laws are given by

[TABLE]

where $P$ is the positive semidefinite solution of (57), and $\bar{s}=\bar{K}{x}^{{\dagger}}+\phi.$ Here $\bar{K}$ satisfies

[TABLE]

and ${x}^{{\dagger}},\phi\in C_{\rho/2}([0,\infty),\mathbb{R}^{n})$ are determined by

[TABLE]

in which $\bar{A}=A-BR^{-1}B^{T}P$ . Comparing this with (29)-(31), one obtains $\bar{K}=\Pi-P$ , $\bar{x}={x}^{{\dagger}}$ and $\phi=s$ . This establishes the equivalence of the two sets of decentralized control laws.

Proposition IV.1

The set of decentralized control laws $\{\hat{u}_{1},\cdots,\hat{u}_{N}\}$ in (27) is a representation of $\{\breve{u}_{1},\cdots,\breve{u}_{N}\}$ given by (65).

For Problem (PG), let $f=0$ , and $G=0$ . In [19], the decentralized strategies are given by

[TABLE]

where $P$ is the positive definite solution of (28), ${s}^{*}$ is determined by the fixed-point equation

[TABLE]

We now show the equivalence of the decentralized open-loop and feedback solutions to mean field games.

Proposition IV.2

The set of decentralized control laws $\{\hat{u}_{1},\cdots,\hat{u}_{N}\}$ in (56) is a representation of $\{{u}_{1}^{*},\cdots,{u}_{N}^{*}\}$ given by (66).

Proof. Let $s^{*}=K^{*}\bar{x}^{*}+\psi$ . From (67), we have

[TABLE]

which gives

[TABLE]

Comparing this with (57)-(59), one obtains $K^{*}=\bar{P}-P$ and $\psi=\hat{s}$ . Thus, we have $u_{i}^{*}\equiv\hat{u}_{i}$ , $i=1,\cdots,N,$ which implies that $\{\hat{u}_{1},\cdots,\hat{u}_{N}\}$ in (56) is a representation of $\{{u}_{1}^{*},\cdots,{u}_{N}^{*}\}$ . $\hfill\Box$

V Numerical Examples

In this section, some numerical examples are given to illustrate the effectiveness of the proposed decentralized control laws.

We first consider a scalar system with $50$ agents in Problem (PS). Take $B=Q=R=1,G=-0.2,f(t)=1,\sigma(t)=0.1,\rho=0.6,\Gamma=-0.2,\eta=5$ in (1)-(2). The initial states of the $50$ agents are drawn independently from a normal distribution $N(5,0.5)$ . Under the control law (27), the state trajectories of the agents for the cases $A=0.2$ and $A=1$ are shown in Figs. 1 and 2, respectively. After the transient phase, the states of the agents behave similarly and roughly reach agreement.

Next, we simulate the scalar case of Problem (PG), where the parameters are the same as above, except $G=0$ . After the control laws (56) are applied, the state trajectories of 50 agents with $A=0.2$ and $A=1$ are shown in Figs. 3 and 4, respectively.

For the case $A=1$ and $G=0$ , the trajectories of $\bar{x}$ and $\hat{x}^{(N)}$ in Problems (PS) and (PG) are shown in Fig. 5. It can be seen that $\bar{x}$ and $\hat{x}^{(N)}$ coincide well, which illustrates the consistency of the mean field approximation. The state average of the agents is clearly lower in Problem (PS) than in Problem (PG).
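The agreement between the state average and its mean field limit can be reproduced qualitatively with a small Euler-Maruyama simulation. This is a sketch under stated assumptions, not the simulation behind the figures. Each agent follows $dx_{i}=(\bar{a}x_{i}+c)dt+\sigma dW_{i}$, where $\bar{a}=a-b^{2}p/r$ is taken from the scalar identity $a-b^{2}p/r-\rho/2=-\sqrt{\Delta}/2$ of Example II.1. The constant `c` is a hypothetical stand-in for the deterministic drift term, whose exact form depends on $s$ and $\bar{x}$, and $N(5,0.5)$ is read as mean $5$ and variance $0.5$.

```python
import math
import random

def closed_loop_drift(a, b, q, r, rho):
    """Scalar a - b^2 p / r for q > 0, from
    a - b^2 p / r - rho/2 = -sqrt(Delta)/2."""
    alpha = a - rho / 2.0
    return rho / 2.0 - math.sqrt(alpha * alpha + b * b * q / r)

def simulate(a_bar, n_agents=50, c=1.0, sigma=0.1, x0_mean=5.0,
             x0_var=0.5, horizon=10.0, dt=0.01, seed=0):
    """Euler-Maruyama for n_agents independent copies of
    dx = (a_bar x + c) dt + sigma dW, together with the mean ODE
    d x_bar / dt = a_bar x_bar + c.  `c` is a HYPOTHETICAL constant.
    Returns (largest |state average - x_bar| over time, final x_bar)."""
    rng = random.Random(seed)
    x = [rng.gauss(x0_mean, math.sqrt(x0_var)) for _ in range(n_agents)]
    x_bar = x0_mean
    sqrt_dt = math.sqrt(dt)
    max_gap = abs(sum(x) / n_agents - x_bar)
    for _ in range(int(round(horizon / dt))):
        x = [xi + (a_bar * xi + c) * dt + sigma * sqrt_dt * rng.gauss(0.0, 1.0)
             for xi in x]
        x_bar += (a_bar * x_bar + c) * dt
        max_gap = max(max_gap, abs(sum(x) / n_agents - x_bar))
    return max_gap, x_bar

a_bar = closed_loop_drift(a=1.0, b=1.0, q=1.0, r=1.0, rho=0.6)
max_gap, x_bar_final = simulate(a_bar)
print(a_bar, max_gap, x_bar_final)
```

Because the agents are driven by independent noises and share the same drift, the gap is dominated by the sampling error of the initial states and shrinks as the number of agents grows.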

Finally, we consider the 2-dimensional case of Problem (PS). Take parameters as follows: $A=\left[\begin{array}[]{cc}0.1&0\\ -1&0.2\\ \end{array}\right]$ , $B=\left[\begin{array}[]{cc}1&0\\ 0&1\\ \end{array}\right]$ , $G=\left[\begin{array}[]{cc}-0.5&0\\ 0&-0.3\\ \end{array}\right]$ , $B=\left[\begin{array}[]{c}1\\ 1\\ \end{array}\right]$ , $Q=\left[\begin{array}[]{cc}1&0\\ 0&1\\ \end{array}\right]$ , $\Gamma=\left[\begin{array}[]{cc}1&0\\ 1&1\\ \end{array}\right]$ , $R=\left[\begin{array}[]{cc}1&0\\ 0&1\\ \end{array}\right]$ , $\eta=\left[\begin{array}[]{c}0\\ 0.5\\ \end{array}\right]$ , $f=[1\ \ 1]^{T}$ and $\sigma=[0.5\ \ 0.5]^{T}$ . Denote $\hat{x}_{i}(t)=[\hat{x}^{1}_{i}(t)\ \hat{x}^{2}_{i}(t)]^{T}$ . Both of $\hat{x}^{1}_{i}(0)$ and $\hat{x}^{2}_{i}(0)$ are taken independently from a normal distribution $N(5,0.5)$ . Under the control laws (27), the trajectories of $\hat{x}^{1}_{i}$ and $\hat{x}^{2}_{i}$ , $i=1,\cdots,N$ are shown in Figs. 6 and 7, respectively.

VI Concluding Remarks

In this paper, we have considered uniform stabilization and asymptotic optimality for mean field LQ multiagent systems. For both the social control and the Nash game problems, we design decentralized open-loop control laws by variational analysis and show that they are asymptotically optimal. Two equivalent conditions are further given for uniform stabilization of the systems in different cases. Finally, we show that these decentralized control laws are equivalent to the feedback strategies in previous works.

An interesting generalization is to consider mean field LQ control systems with partial measurements by using variational analysis. Variational analysis may also be applied to general nonlinear models to construct decentralized control laws for social control and Nash games.

Appendix A Proof of Theorem II.3

To prove Theorem II.3, we need a lemma.

Lemma A.1

Let A1) hold and $Q\geq 0$ . Under the control (25), we have

[TABLE]

Proof. It follows by (26) that

[TABLE]

From this and (24), we have

[TABLE]

which leads to

[TABLE]

By A1), one can obtain

[TABLE]

which completes the proof. $\hfill\Box$

Proof of Theorem II.3. We first prove that for $u\in\mathcal{U}_{c}$ , $J_{\rm soc}^{\rm F}(u)<\infty$ implies that $\mathbb{E}\int_{0}^{T}e^{-\rho t}(\|x_{i}\|^{2}+\|u_{i}\|^{2})dt<\infty,$ for all $i=1,\cdots,N$ . By $J_{\rm soc}^{\rm F}(u)<\infty$ , we have $\mathbb{E}\int_{0}^{T}e^{-\rho t}\|u_{i}\|^{2}dt<\infty.$ This leads to

[TABLE]

where $u^{(N)}=\frac{1}{N}\sum_{i=1}^{N}u_{i}.$ By (1),

[TABLE]

which with A1) implies that

[TABLE]

Note that

[TABLE]

We have

[TABLE]

By (24) and (26), we obtain that

[TABLE]

Let $\tilde{x}_{i}=x_{i}-\hat{x}_{i}$ , $\tilde{u}_{i}=u_{i}-\hat{u}_{i}$ and $\tilde{x}^{(N)}=\frac{1}{N}\sum_{i=1}^{N}\tilde{x}_{i}$ . Then by (1) and (26),

[TABLE]

From (3), we have

[TABLE]

where

[TABLE]

By (A4), $\tilde{J}_{i}^{\rm F}(\tilde{u})\geq 0$ . We now prove $\frac{1}{N}\sum_{i=1}^{N}I_{i}=O(\frac{1}{\sqrt{N}})$ .

[TABLE]

By (19)-(22), (A.6) and Itô’s formula,

[TABLE]

From this and (A.8), we obtain

[TABLE]

By Lemma A.1, (A.4) and (A.5), we obtain

[TABLE]

which implies $|\frac{1}{N}\sum_{i=1}^{N}I_{i}|=O(1/\sqrt{N})$ . $\hfill\Box$

Appendix B Proofs of Lemma II.2 and Theorem II.4

Proof of Lemma II.2. From (A.2), we have

[TABLE]

Thus,

[TABLE]

$\hfill\Box$

Proof of Theorem II.4. By A1)-A4), Lemmas II.1 and II.2, we obtain that $\bar{x}\in C_{\rho/2}([0,\infty),\mathbb{R}^{n})$ and

[TABLE]

which further gives that

[TABLE]

Denote $g\stackrel{{\scriptstyle\Delta}}{{=}}-BR^{-1}B^{T}((\Pi-P)\bar{x}+s)+Gx^{(N)}+f$ . Then $\mathbb{E}\int_{0}^{\infty}e^{-\rho t}\|g(t)\|^{2}dt<\infty$ and

[TABLE]

Note that $\bar{A}-\frac{\rho}{2}I$ is Hurwitz. By Schwarz’s inequality,

[TABLE]

This with (27) completes the proof. $\hfill\Box$

Appendix C Proofs of Theorems II.5 and II.6

Proof of Theorem II.5. (i) $\Rightarrow$ (ii). By (26),

[TABLE]

It follows from A1) that

[TABLE]

By comparing (31) and (C.1), we obtain $\mathbb{E}[\hat{x}_{i}]=\bar{x}$ . Note that $\|\bar{x}\|^{2}=\big{\|}\mathbb{E}\hat{x}_{i}\big{\|}^{2}\leq\mathbb{E}\|\hat{x}_{i}\|^{2}$ . It follows from (34) that

[TABLE]

By (31), we have

[TABLE]

where $h=-BR^{-1}B^{T}s+f$ . By the arbitrariness of $\bar{x}_{0}$ with (C.2) we obtain that $A+G-BR^{-1}B^{T}\Pi-\frac{\rho}{2}I$ is Hurwitz. That is, $(A+G-\frac{\rho}{2}I,B)$ is stabilizable. By [2], (29) admits a unique solution such that $\Pi>0$ . Note that $\mathbb{E}[x^{(N)}]^{2}\leq\frac{1}{N}\sum_{i=1}^{N}\mathbb{E}[\hat{x}_{i}^{2}]$ . Then from (34) we have

[TABLE]

This leads to $\mathbb{E}\int_{0}^{\infty}e^{-\rho t}\|g(t)\|^{2}dt<\infty$ , where $g{=}-BR^{-1}B^{T}((\Pi-P)\bar{x}+s)+G\hat{x}^{(N)}+f$ . By (B.1), we obtain

[TABLE]

By (34) and the arbitrariness of ${x}_{i0}$ we obtain that $\bar{A}-\frac{\rho}{2}I$ is Hurwitz, i.e., $(A-\frac{\rho}{2}I,B)$ is stabilizable. By [2], (28) admits a unique solution such that $P>0$ .

From (C.2) and (C.3),

[TABLE]

On the other hand, (A.2) gives

[TABLE]

By (C.4) and the arbitrariness of ${x}_{i0},i=1,\cdots,N$ , we obtain that $\bar{A}+G-\frac{\rho}{2}I$ is Hurwitz.

(ii) $\Rightarrow$ (iii). Define $V(t)=e^{-\rho t}\bar{y}^{T}(t)\Pi\bar{y}(t)$ , where $\bar{y}$ satisfies

[TABLE]

Denote $V$ by $V^{*}$ when $\bar{u}=\bar{u}^{*}=-{R^{-1}}B^{T}\Pi\bar{y}$ . By (29) we have

[TABLE]

Note that $V^{*}\geq 0$ . Then $\lim_{t\to\infty}V^{*}(t)$ exists, which implies

[TABLE]

Rewrite $\Pi(t)$ in (23) by $\Pi_{T}(t)$ . Then we have $\Pi_{T+t_{0}}(t_{0})=\Pi_{T}(0)$ . By (23),

[TABLE]

This with (C.5) implies

[TABLE]

By A3), there exists $T>0$ such that $\Pi_{T}(0)>0$ (see e.g. [43, 44]). Thus, we have $\lim_{t\to\infty}e^{-\rho t}\big{\|}\bar{y}({t})\big{\|}^{2}=0$ , which implies that $(A+G-\frac{\rho}{2}I,B)$ is stabilizable. Similarly, we can show that $(A-\frac{\rho}{2}I,B)$ is stabilizable.

(iii) $\Rightarrow$ (i). This part has been proved in Theorem II.4. $\hfill\Box$

Proof of Theorem II.6. (iii) $\Rightarrow$ (i). From [2], (28) and (29) admit unique solutions $P\geq 0,\Pi\geq 0$ such that $A-BR^{-1}B^{T}P-\frac{\rho}{2}I$ and $A-BR^{-1}B^{T}\Pi-\frac{\rho}{2}I$ are Hurwitz, respectively. Thus, there exists a unique $s(0)$ such that $s\in C_{\rho/2}([0,\infty),\mathbb{R}^{n})$ . It is straightforward that $\bar{x}\in C_{\rho/2}([0,\infty),\mathbb{R}^{n})$ . By the argument in the proof of Theorem II.4, (i) follows.

(i) $\Rightarrow$ (ii). The proof of this part is similar to that of (i) $\Rightarrow$ (ii) in Theorem II.5.

(ii) $\Rightarrow$ (iii). Since $\Pi\geq 0$ , there exists an orthogonal matrix $U$ such that

[TABLE]

where $\Pi_{2}>0$ . From (29),

[TABLE]

where $\bar{\mathbb{A}}\stackrel{{\scriptstyle\Delta}}{{=}}A+G-BR^{-1}B^{T}\Pi,\bar{Q}=\hat{Q}+\Pi BR^{-1}B^{T}\Pi$ . Denote

[TABLE]

By pre- and post-multiplying by $\xi^{T}$ and $\xi$ where $\xi=[\xi_{1}^{T},0]^{T}$ , it follows that

[TABLE]

From the arbitrariness of $\xi_{1}$ , we obtain $\bar{Q}_{11}=0$ . Since $\bar{Q}$ is positive semidefinite, it follows that $\bar{Q}_{12}=\bar{Q}_{21}=0$ and $\bar{Q}_{22}\geq 0$ . By comparing the corresponding blocks on both sides of (C.6), we obtain $\bar{\mathbb{A}}_{21}=0$ . It follows from (C.6) that

[TABLE]

Let $\zeta=[\zeta_{1}^{T},\zeta_{2}^{T}]^{T}=U^{T}\bar{y}^{*}$ , where $\bar{y}^{*}$ satisfies $\dot{\bar{y}}^{*}=\bar{\mathbb{A}}\bar{y}^{*}$ . Then we have

[TABLE]

By Lemma 4.1 of [38], the detectability of $(A+G,\hat{Q}^{1/2})$ implies the detectability of $(\bar{\mathbb{A}},\bar{Q}^{1/2})$ . Take $\zeta(0)=\xi=[\xi_{1}^{T},0]^{T}$ . Then $\bar{Q}^{1/2}\bar{y}^{*}=\bar{Q}^{1/2}U\zeta=0$ , which together with the detectability of $(\bar{\mathbb{A}},\bar{Q}^{1/2})$ implies that $\zeta_{1}\to 0$ and $\bar{\mathbb{A}}_{11}$ is Hurwitz. Denote $S(t)=e^{-\rho t}\zeta_{2}^{T}\Pi_{2}\zeta_{2}$ . By (C.7),

[TABLE]

which implies that $\lim_{t\to\infty}S(t)$ exists. By an argument similar to the proof of Theorem II.5, we obtain $\lim_{t_{0}\to\infty}e^{-\rho t_{0}}\big{\|}\zeta_{2}(t_{0})\big{\|}^{2}_{\Pi_{2,T}(0)}=0$ and $\Pi_{2,T}(0)>0$ , which gives that $\zeta_{2}\to 0$ and $\bar{\mathbb{A}}_{22}$ is Hurwitz. This, together with the fact that $\bar{\mathbb{A}}_{11}$ is Hurwitz, shows that $\zeta$ is stable, which leads to (iii). $\hfill\Box$

Appendix D Proofs of Theorems III.2 and III.3

Proof of Theorem III.2. From (52) and (53), we have

[TABLE]

where $\bar{A}=A-BR^{-1}B^{T}P$ . This implies that

[TABLE]

By Schwarz’s inequality,

[TABLE]

To prove (55), it suffices to only consider $u_{i}\in L^{2}_{{\cal F}_{t}}(0,T;\mathbb{R}^{r})$ such that $J_{i}^{\rm F}(u_{i},\hat{u}_{-i})\leq J_{i}^{\rm F}(\hat{u}_{i},\hat{u}_{-i})<\infty$ . By (3),

[TABLE]

After the set of strategies $(u_{i},\hat{u}_{-i})$ is applied, the corresponding dynamics of $N$ agents can be written as

[TABLE]

This with (47) implies

[TABLE]

By (D.1), (D.5) and elementary SDE estimates, one can obtain

[TABLE]

We have

[TABLE]

which together with (D.6) gives that

[TABLE]

Note that

[TABLE]

and $\bar{J}_{i}^{\rm F}(u_{i})<\infty$ . By Schwarz’s inequality, (D.6) and (D.7), we obtain

[TABLE]

From this and (D.2), the theorem follows. $\hfill\Box$

Proof of Theorem III.3. Note that $\{\hat{x}_{i}(t),i=1,\cdots,N\}$ are mutually independent processes with the expectation $\bar{x}(t)$ . By Lemma III.2,

[TABLE]

We only need to show $\mathbb{E}\int_{0}^{\infty}e^{-\rho t}\|{x}_{i}\|^{2}_{Q}dt\leq C$ for all $u_{i}$ satisfying

[TABLE]

From (D.8), we obtain

[TABLE]

which with Lemma III.2 implies

[TABLE]

where $C_{1}$ is independent of $N$ . The rest of the proof follows that of Theorem III.2. $\hfill\Box$

Bibliography


[1] H. Abou-Kandil, G. Freiling, V. Ionescu, and G. Jank, Matrix Riccati Equations in Control and Systems Theory. Birkhäuser Verlag, 2003.
[2] B. D. O. Anderson and J. B. Moore, Optimal Control: Linear Quadratic Methods. Englewood Cliffs, NJ: Prentice Hall, 1990.
[3] J. Arabneydi and A. Mahajan, “Team-optimal solution of finite number of mean-field coupled LQG subsystems,” in Proc. 54th IEEE CDC, Osaka, Japan, 2015, pp. 5308-5313.
[4] T. Basar and G. J. Olsder, Dynamic Noncooperative Game Theory. Academic Press, London, 1982.
[5] D. Bauso, H. Tembine, and T. Basar, “Opinion dynamics in social networks through mean-field games,” SIAM J. Control Optim., vol. 54, no. 6, pp. 3225-3257, 2016.
[6] A. Bensoussan, K.C. Sung, S.C. Yam, and S. P. Yung, “Linear-quadratic mean field games,” J. Optimization Theory & Applications, vol. 169, no. 2, pp. 496-529, 2016.
[7] A. Bensoussan, J. Frehse, and P. Yam, Mean Field Games and Mean Field Type Control Theory. Springer, New York, 2013.
[8] P. E. Caines, M. Huang, and R. P. Malhame, Mean field games, in Handbook of Dynamic Game Theory, T. Basar and G. Zaccour Eds., Springer, Berlin, 2017.