Linear-Quadratic Mixed Stackelberg-Nash Stochastic Differential Game with Major-Minor Agents
Kehan Si, James Huang, Zhen Wu

TL;DR
This paper develops a framework for analyzing a large-population stochastic differential game involving major and minor agents, deriving equilibrium strategies through FBSDEs and Riccati equations.
Contribution
It introduces a novel combined Stackelberg-Nash equilibrium approach for mixed agent populations using mean-field game theory and provides explicit feedback strategies.
Findings
Derivation of equilibrium strategies via FBSDEs.
Explicit feedback form of strategies using Riccati equations.
Application to large-scale multi-agent systems.
Abstract
We consider a controlled linear-quadratic (LQ) large-population system with a mixture of three types of agents: a major leader, minor leaders and minor followers. The Stackelberg-Nash-Cournot (SNC) approximate equilibrium is studied through a major-minor mean-field game (MFG) coupled with a leader-follower Stackelberg game. By a variational method, the SNC approximate equilibrium strategy is represented in the open-loop sense by forward-backward stochastic differential equations (FBSDEs). The feedback form of the open-loop strategy is then derived through Riccati equations.
Acknowledgment: J. Huang acknowledges the financial support by RGC Grant PolyU 153005/14P, 153275/16P.
Kehan Si: School of Mathematics, Shandong University, Jinan, Shandong Province, 250100, China.
James Huang: Department of Applied Mathematics, The Hong Kong Polytechnic University, Hong Kong, China.
Zhen Wu: School of Mathematics, Shandong University, Jinan, Shandong Province, 250100, China.
**Key words:** Stackelberg-Nash-Cournot approximate equilibrium, Mean-field game, FBSDE, Leader-follower game, Major-minor agent, Open-loop strategy, Closed-loop strategy.
1 Introduction
On a given finite time horizon $[0,T]$, let $(\Omega,\mathcal{F},\mathbb{P})$ be a complete probability space on which a multi-dimensional standard Brownian motion is defined. In this paper, we consider a large-population system involving $1+N_l+N_f$ individual agents (where $N_l$ and $N_f$ are very large) of three types: the major leader, the $N_l$ minor leaders and the $N_f$ followers. Their dynamics are given respectively by the following controlled linear stochastic differential equations:
[TABLE]
[TABLE]
and
[TABLE]
where the empirical averages of the minor leaders' states and of the followers' states are called the state-averages or mean-field terms. The coefficients are deterministic constant matrices of proper dimensions. In the above, $X_0$, $X_i$, $x_j$ are the state processes of the major leader, the minor leaders and the followers, with initial values that are random variables, and $u_0$, $u_i$, $v_j$ are the admissible controls taken by the respective players in the game. Under some mild conditions on the coefficients, for any initial values, (1), (2) and (3) admit a unique strong solution. The performance is measured by the following cost functionals: for the major leader,
[TABLE]
for the minor leaders, $1\le i\le N_l$;
[TABLE]
and for the followers, $1\le j\le N_f$,
[TABLE]
where $\|x\|_{Q}^{2}=\langle Qx,x\rangle$ for a given vector $x$ and any matrix or matrix-valued function $Q$ of suitable dimensions, and where the weights are deterministic symmetric matrices of suitable dimensions. As we can see, all agents are coupled not only through their state processes but also through their cost functionals, via convex combinations of the state-averages.
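To make the notion of a state-average concrete, the following sketch simulates a scalar, uncontrolled population in which each state is driven by its own noise and by the empirical average of all states. The dynamics $dx_i=(a x_i+f x^{(N)})\,dt+\sigma\,dW_i$, the function name `simulate_population` and all parameter values are assumptions chosen for illustration; they are not the paper's equations (1)-(3), whose coefficients are not reproduced in this text.

```python
import math
import random


def simulate_population(n_agents, a, f, sigma, x0, horizon, n_steps, seed=0):
    """Euler-Maruyama scheme for the hypothetical scalar population
    dx_i = (a * x_i + f * average(x)) dt + sigma dW_i, i = 1..n_agents.

    Returns the terminal states and the path of the empirical state-average.
    """
    rng = random.Random(seed)
    dt = horizon / n_steps
    sqrt_dt = math.sqrt(dt)
    states = [x0] * n_agents
    averages = [x0]
    for _ in range(n_steps):
        # The state-average is the only channel through which agents interact.
        mean_field = sum(states) / n_agents
        states = [
            x + (a * x + f * mean_field) * dt + sigma * sqrt_dt * rng.gauss(0.0, 1.0)
            for x in states
        ]
        averages.append(sum(states) / n_agents)
    return states, averages
```

With `sigma = 0` and identical initial states, the average follows the deterministic rate `a + f`, which gives a quick sanity check. Setting `f = 0` decouples the states entirely, so that any interaction would have to come from the cost functionals.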
Roughly speaking, each follower gives his/her best response to the strategies of the major leader and the minor leaders, so as to minimize his/her own cost functional. The major leader in turn gives his/her best response to the strategies of the minor leaders and the best responses of the followers, so as to minimize his/her own cost functional. Knowing the best responses of the major leader and the followers, each minor leader wants to minimize his/her own cost functional by choosing an optimal control. However, due to the state-average coupling, our problem is essentially a high-dimensional Stackelberg-Nash differential game. Moreover, the major leader is the dominant agent because it affects the cost functionals of all minor leaders.
We call the problem formulated above the mixed Stackelberg-Nash major-minor (SN-MM) differential game. The following comments on our formulation further justify this terminology.
(Single leader-follower game) In the case where there are no minor leaders and only a single follower, together with one major leader, our problem reduces to the classical single-leader and single-follower game. The Stackelberg game was proposed in 1934 by H. von Stackelberg [30], who defined a concept of hierarchical solution for markets in which some firms have the power to dominate others. This solution concept is now known as the Stackelberg equilibrium. An early study of stochastic Stackelberg differential games (SSDG) can be found in Basar (1979) [2]. Pioneering work was done by Yong (2002) [32], where an LQ leader-follower stochastic differential game (SDG) was introduced and studied: the coefficients of the system and the cost functionals are random, the controls enter the diffusion term of the state equation, and the weight matrices for the controls in the cost functionals are not necessarily positive definite. To give a state feedback representation of the open-loop (OL) Stackelberg equilibrium, the related Riccati equations are derived and sufficient conditions for the existence of their solutions with deterministic coefficients are discussed. Thereafter, Bensoussan, Chen, and Sethi (2015) [4] obtained the global maximum principles for both OL and closed-loop (CL) SSDG in the case where the diffusion term does not contain the controls. The solvability of the related Riccati equations is discussed in order to obtain the state feedback Stackelberg equilibrium.
(Multiple leaders-followers game) In the case where the numbers of minor leaders and followers are of medium or small size, our problem reduces to the Stackelberg game with multiple leaders and multiple followers. It is a natural extension of the single leader-follower game; relevant works include [5, 6, 28].
(Mean-field-game with symmetric agents) In the case where only a single population of minor agents is involved, with no major leader and no leader-follower hierarchy, our problem becomes the standard dynamic game with a very large number of minor (symmetric) agents, in which each single agent interacts with the mass-effect of the other agents only through coupling in states/dynamics. For large-population stochastic dynamics, one effective method is to search for decentralized strategies by mean-field game (MFG) theory. Since the independent works by Huang, Caines, and Malhamé [19, 20] and Lasry and Lions [22, 23, 24], MFG theory and its applications have enjoyed rapid growth. MFG provides a simpler alternative framework for tackling the interactive game for a large number of homogeneous agents. By allowing agents to interact through a common medium, known as the mean-field term, the formulation of the dynamic game under the MFG framework consists of only a few equations. Further developments on the theory of MFG can be found in the works of Andersson and Djehiche [1], Bardi [3], Bensoussan, Frehse, and Yam [8], Buckdahn et al. [7], Cardaliaguet [10], Carmona and Delarue [11], Garnier, Papanicolaou, and Yang [15], Guéant, Lasry, and Lions [16], Meyer-Brandis, Øksendal, and Zhou [25], and the references therein.
(Major-minor game) In the case where there are no followers and only the major and minor leaders are involved, our problem becomes the major-minor (MM) mean-field game (MFG). The MM-MFG was introduced in [21] and has been well investigated, for instance in the major-minor mean-field LQG game of [27]. Our model generalizes [5, 6, 27, 28] because it includes not only the leader-follower structure but also the major-minor structure.
(Convex combination) Following Nourian, Caines, Malhamé and Huang (2012) [28], we consider a general cost functional with a likelihood ratio (i.e., a convex combination). In other words, for example, the cost functional of the major leader is based on a trade-off between keeping cohesion with the flock of minor leaders and keeping cohesion with the flock of followers (see (4)). We may also be interested in special cases, in which the cost functionals of the followers either are or are not directly influenced by the major leader. We will discuss the difference between the special cases and the general case in the following sections.
**Remark 1.1.**
Applications of our problem formulation may be found in power markets involving a large number of consumers and large utilities together with the following producers, and in inventory management without stocking capacities (see [13]). The state processes are grouped into three kinds. The major leader agent can be regarded as the government or a supervisory body in economic issues. The minor leader agents can be regarded as the corresponding companies or firms. The remaining minor follower agents can be regarded as the related suppliers of raw materials or manufacturers of primary commodities, etc. We can see that the state processes of the three types of group have no influence on each other, but the cost functionals do have a direct influence on each other.
Our present work considers the combination of leader-follower and major-minor systems in a large-population setting. In the entire system, the major agent and a part of the minor agents are together regarded as the leaders, called the major leader and the minor leaders respectively, and the rest are called minor followers (followers). Naturally, the more complex structure brings some technical difficulties. Besides, many interesting questions remain to be solved. For example, one may let the state processes include the mean-field term, which may fit the real world better, or focus on more realistic cost functionals.
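Let us now explain the structure of our argument. In principle, the above problem can be studied as an MM-MFG coupled with a leader-follower game. Accordingly, the original problem can be analyzed through the following steps:
Step 1: Fix the mass-effect limits $\overline{m}_X$ and $\overline{m}_x$ of the minor leaders and of the followers, together with the control $u_0$ of the major leader. With these quantities frozen, introduce and solve the auxiliary problem to get the best decentralized response functional of the minor followers, denoted by $\overline{v}_j[u_0,\overline{m}_X,\overline{m}_x]$.
Step 2: Given the response functional $\overline{v}_j$ and the frozen $(\overline{m}_X,\overline{m}_x)$, solve the decentralized stochastic optimal control (SOC) problem of the major leader and denote the optimal solution pair by $(\overline{X}_0,\overline{u}_0[\overline{m}_X,\overline{m}_x])$.
Step 3: Given $(\overline{X}_0,\overline{u}_0)$, solve the optimal control problem for the minor leaders. Influenced by the optimal control of the major leader (supposing it exists; in general it depends on the choices of the minor leaders and on the initial states), the minor leaders choose controls $\overline{u}_i[\overline{X}_0,\overline{m}_X]$, $1\le i\le N_l$, to minimize their own cost functionals.
Step 4: Impose the consistency condition (CC) to specify the frozen $(\overline{m}_X,\overline{m}_x)$, so that all decentralized strategies can be designed. The approximate Stackelberg-Cournot-Nash equilibrium can then be verified.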
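The freeze, respond, anticipate and close logic of these steps can be illustrated on a deliberately simple static toy game. Everything in the sketch below is an assumption made for illustration and is not the paper's model: followers have the best response $v=\alpha m+\beta u_0$ to a frozen average $m$ and a leader action $u_0$, the consistency condition is $m=\alpha m+\beta u_0$, and the leader minimizes $\tfrac12 u_0^2+\tfrac12 c\,(m(u_0)-d)^2$ while anticipating the consistent average $m(u_0)$.

```python
def follower_mean_field(u0, alpha, beta, n_iter=200):
    """Consistent follower average for a given leader action u0.

    Iterates m <- alpha * m + beta * u0: best response to the frozen average,
    followed by averaging. Converges to beta * u0 / (1 - alpha) if |alpha| < 1.
    """
    m = 0.0
    for _ in range(n_iter):
        m = alpha * m + beta * u0
    return m


def leader_cost(u0, alpha, beta, c, d):
    """Leader's cost once the followers' consistent response is anticipated."""
    m = follower_mean_field(u0, alpha, beta)
    return 0.5 * u0 ** 2 + 0.5 * c * (m - d) ** 2


def leader_optimum(alpha, beta, c, d):
    """Closed-form minimiser of leader_cost (first-order condition in u0)."""
    kappa = beta / (1.0 - alpha)  # slope of the consistent average in u0
    return c * kappa * d / (1.0 + c * kappa ** 2)
```

In the dynamic problem studied here, the scalar fixed point is replaced by a consistency condition system for processes, and the leader's first-order condition by an FBSDE, but the order in which the sub-problems are solved is the same.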
The main contribution of this paper can be summarized as follows:
- The decentralized strategy profile is investigated in both the (semi-)closed-loop and the open-loop sense.
- Existence and uniqueness for the CC system are investigated in the global solvability case.
- The CC system is represented via a fully coupled mean-field type FBSDE in the open-loop case, and via an FBSDE and a non-standard Riccati equation in the closed-loop case.
- The approximate Nash equilibrium of the Stackelberg game is verified under conditions more general than the standard assumption of positive-definiteness of the coefficient matrices.
The rest of this paper is organized as follows. In section 2, we give the formal problem formulation and some preliminaries. In section 3, we discuss the open-loop strategy of the Stackelberg mixed major-minor game. In section 4, we derive the consistency condition system based on the open-loop strategy, which is a fully coupled FBSDE, and give criteria for the well-posedness of this FBSDE. Finally, in section 5, we verify that the OL strategy obtained is an $\varepsilon$-Nash equilibrium OL strategy of the original problem.
2 Preliminary and formulation
The following notation will be used throughout this paper. Let $\mathbb{R}^{n}$ denote the $n$-dimensional Euclidean space, $\mathbb{R}^{n\times m}$ the set of all $n\times m$ matrices, and $\mathbb{S}^{n}$ the set of all $n\times n$ symmetric matrices. We denote the transpose by the superscript $\top$, the inner product by $\langle\cdot,\cdot\rangle$ and the norm by $|\cdot|$. For a given time interval and a Euclidean space, we introduce the following function spaces:
[TABLE]
and the spaces of processes or random variables on the given filtered probability space:
[TABLE]
We set the following information structures, which are important to introduce our admissible strategies: is the natural filtration generated by all BM components augmented by all the -null sets in , , it can be viewed as the full information of all states and noises; is the natural filtration generated by augmented by all the -null sets in . It is the space on which the limiting state-average should be adapted; is the natural filtration generated by augmented by all the -null sets in , ; is the natural filtration generated by augmented by all the -null sets in , .
Given information structures, we can set the following Hilbert spaces for centralized and decentralized strategies for individual agents in open-loop sense:
[TABLE]
and decentralized open-loop strategies:
[TABLE]
We also introduce notation for the strategy set of all agents, the set of control strategies of all minor-leader agents, the set of strategy profiles of all follower agents, the control strategy set of the minor-leader agents except a given one, and the control strategy set of the follower agents except a given follower.
Sometimes, when defining the Stackelberg-Nash-Cournot strategy, it is helpful to set the following product spaces for the strategy sets. We denote
[TABLE]
[TABLE]
Any element of the first product space is called an admissible centralized strategy, and any element of the second is called an admissible decentralized strategy.
Let us introduce the following hypotheses on the coefficients of the state dynamics and cost functionals:
(H1)
The coefficients of the state equations and cost functionals satisfy the following:
[TABLE]
(H2)
The initial states are independent; the initial states of all minor leaders and of all followers have zero mean; and their second moments are bounded by a constant independent of $N_l$ and $N_f$.
We point out that no positive-definiteness/non-negativeness conditions on the weighting matrices/matrix-valued functions are imposed in (H1). Moreover, the coefficients of the convex combinations take values in $[0,1]$. Under (H1), for any admissible centralized (resp., decentralized) controls, (1), (2), (3) admit a unique (strong) solution, and the cost functionals (4), (5), (6) are well-defined.
For simplicity, in (H2) it is assumed that all minor leaders and followers have zero initial mean. It is possible to generalize our analysis to deal with different initial means as long as the initial states have a limiting empirical distribution. Now, we can introduce the Stackelberg-Cournot-Nash equilibrium as follows.
**Definition 2.1.**
A $(1+N_l+N_f)$-tuple of admissible controls is called an open-loop Stackelberg-Cournot-Nash equilibrium for the given initial states if:
[TABLE]
where , and .
If there is no confusion, we use the same notation to denote the optimal response and the optimal strategy of the major leader and the minor followers. The above OL strategy is defined in the centralized sense. In particular, game theory has been formulated to capture such individual interest-seeking behavior of the agents in many social, economic and man-made systems.
For fixed population sizes, if each agent can access the full information (states) of the other agents, we may view the problem as a standard dynamic LQG leader-follower game and use the full-information DPP to derive the Stackelberg-Cournot-Nash equilibrium. However, for large-scale dynamic models, this approach results in an analytic complexity which is in general prohibitively high, and correspondingly leads to few substantive dynamic optimization results. The optimization of large-scale linear control systems wherein i) many agents are coupled with each other via their individual dynamics, and ii) the costs are in an individual-to-the-mass form has been studied in the literature, where the theory of mean field (MF) control (previously termed Nash Certainty Equivalence) was introduced. It is to be noted that this dynamic large-scale cost-coupled optimization structure is motivated by a variety of scenarios, for instance, those analyzed in MFG analysis.
2.1 Mixed Stackelberg-Cournot-Nash equilibrium analysis
To deal with a mixed leader-follower MM dynamic game using MFG theory, one should start with the followers, and to deal with a major-minor MFG, one should start with the major player. Although the relationships become complicated in our situation, we can still proceed step by step. First, we solve the optimization problems of the followers. What remains is a classical major-minor problem, which can be solved as in [21]. An interesting feature arises when the major leader has a direct impact on the followers: the state process of the major leader is then governed by a forward-backward stochastic differential equation (FBSDE). Generally speaking, it is hard to obtain the centralized strategy of such a mixed Stackelberg MM-MFG. We therefore briefly describe the procedure for finding a decentralized open-loop $\varepsilon$-Nash equilibrium strategy of the original problem. The procedure for finding a decentralized closed-loop $\varepsilon$-Nash equilibrium strategy is very similar and will be formulated in the next subsection.
Step 1: MFG analysis of followers: Let us introduce the auxiliary limiting LQG differential game problems. Firstly, by the Stackelberg game, for given strategy of major leader and minor leaders, followers have to minimize the following cost functionals:
[TABLE]
Furthermore, as the population sizes tend to infinity, we suppose that the state-averages can be approximated by frozen measurable functions. Then the state process of the follower becomes
[TABLE]
with the following auxiliary cost functionals
[TABLE]
for . To distinguish from the original problem, we use the new state variables and we will denote and the new state variables later. But we still use the same set of variables in this auxiliary limiting problem, and such a reuse of notation should cause no confusion. Then, introduce the following auxiliary Nash game for followers as follows.
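Problem (OL1). For a given initial state, given frozen functions $\overline{m}_X$, $\overline{m}_x$ (measurable with respect to the filtration specified above), and a given control $u_0$ of the major leader, find an open-loop strategy $\overline{v}_j$, $1\le j\le N_f$. In other words, find the Nash equilibrium response functional of the following Nash differential game among the followers: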
[TABLE]
The analysis of Problem (OL1) can be further decomposed into substeps using MFG theory.
Step 1.1 (SOC-F): Fix $(\overline{m}_X,\overline{m}_x)$ and consider the Nash equilibrium response functional of Problem (OL1) for a representative minor-follower agent. That is, for a given initial state, the frozen functions $\overline{m}_X$, $\overline{m}_x$ and the control $u_0$ of the major leader, find an open-loop strategy solving the following stochastic optimal control problem of the representative follower:
[TABLE]
Step 1.2 (CC-F): Applying the state-aggregation method, it is possible to determine the state-average limit by the following condition:
[TABLE]
By this step, the Nash equilibrium response functional of the followers can be specified, given any admissible profile announced by the leaders.
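Steps 1.1 and 1.2 together form a fixed-point problem: freeze the mean field, solve the representative follower's control problem, then require the resulting mean state to reproduce the frozen term. The sketch below runs this loop for a hypothetical scalar model, which is an assumption for illustration and not the paper's Problem (OL1): the state is $dx=(ax+bv)\,dt+\sigma\,dW$ and the cost is $\tfrac12\mathbb{E}\int_0^T[q(x-m)^2+rv^2]\,dt$. For this model the optimal control is $v=-(b/r)(Px+\varphi)$, with $P$ solving a Riccati equation and $\varphi$ an offset equation driven by the frozen $m$; $\sigma$ does not enter the mean dynamics and is therefore not a parameter.

```python
import math


def solve_follower_mfg(a, b, q, r, m0, horizon, n_steps, n_iter=30):
    """Picard iteration on the frozen mean field m(t) for the toy model
    dx = (a x + b v) dt + sigma dW, cost 0.5 * E int [q (x - m)^2 + r v^2] dt.
    """
    dt = horizon / n_steps
    # Riccati equation P' = -2 a P + (b^2 / r) P^2 - q, P(T) = 0 (backward Euler steps).
    P = [0.0] * (n_steps + 1)
    for k in range(n_steps, 0, -1):
        dP = -2.0 * a * P[k] + (b * b / r) * P[k] ** 2 - q
        P[k - 1] = P[k] - dt * dP
    m = [m0] * (n_steps + 1)  # initial guess for the frozen mean field
    phi = [0.0] * (n_steps + 1)
    residual = float("inf")
    for _ in range(n_iter):
        # Step 1.1: offset phi' = -(a - b^2 P / r) phi + q m, phi(T) = 0.
        phi = [0.0] * (n_steps + 1)
        for k in range(n_steps, 0, -1):
            dphi = -(a - b * b * P[k] / r) * phi[k] + q * m[k]
            phi[k - 1] = phi[k] - dt * dphi
        # Step 1.2: mean state under v = -(b / r) (P x + phi).
        new_m = [m0] * (n_steps + 1)
        for k in range(n_steps):
            drift = a * new_m[k] - (b * b / r) * (P[k] * new_m[k] + phi[k])
            new_m[k + 1] = new_m[k] + dt * drift
        residual = max(abs(x - y) for x, y in zip(new_m, m))
        m = new_m
    return P, phi, m, residual
```

In this toy model the consistent mean field is $m(t)=m_0e^{at}$ with $\varphi=-Pm$, because tracking one's own population average requires no control effort on average. The plain Picard iteration used here is only guaranteed to contract for short horizons or weak coupling; the paper instead addresses global solvability of the consistency condition system.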
Given the approximate Nash response of all followers, we can turn to the Nash analysis of all leaders. To this end, some MM-MFG analysis is necessary, since both major and minor agents are present.
Step 2: MFG analysis of major-leader: Anticipating the Nash equilibrium response functional of the followers, the leaders should solve a Nash game among the $1+N_l$ leaders. Similarly, as the population sizes tend to infinity, we suppose that the state-averages can be approximated by frozen measurable functions. Then the state processes of the major leader and the minor leaders become
[TABLE]
and
[TABLE]
with the following auxiliary cost functionals
[TABLE]
for , and
[TABLE]
for . This can be formulated into MM-MFG. We can analyze the optimal control of major leader first. We can set the following auxiliary problem for the major-leader.
Problem (OL2). For given and -measurable functions , find an open-loop strategy such that
[TABLE]
Step 3: MFG analysis of minor-leader: The minor leaders anticipate the Nash equilibrium response functional of the followers and the optimal control of the major leader. Under the minor leaders' state process introduced above, with the cost functional (15), we consider the following problem for the minor leaders.
Problem (OL3). For given , and -measurable functions , find an open-loop strategy , , such that
[TABLE]
Step 4: Consistency condition of the (open-loop) Stackelberg-Cournot-Nash equilibrium: the CC determines the frozen $(\overline{m}_X,\overline{m}_x)$ by
[TABLE]
We then turn to its global solvability.
To show the steps more clearly, we illustrate them in the following figure.
[Figure: flow chart of Steps 1.1, 1.2, 2, 3 and 4, linking the pairs $(X_{0},u_{0})$, $(x_{j},\overline{v}_{j}[u_{0},\overline{m}_{X},\overline{m}_{x}])$, $(\overline{X}_{0},\overline{u}_{0}[\overline{m}_{X},\overline{m}_{x}])$ and $(\overline{X}_{i},\overline{u}_{i}[\overline{X}_{0},\overline{m}_{X}])$ through the consistency conditions $\mathbb{E}[x_{j}(\overline{v}_{j}[u_{0},\overline{m}_{X},\overline{m}_{x}])]=\overline{m}_{x}$ and $\mathbb{E}[\overline{X}_{i}(\overline{u}_{i}[\overline{X}_{0},\overline{m}_{X}])]=\overline{m}_{X}$.]
3 Open-loop strategies
From now on, we suppress the time variable in the equations unless it is necessary. In this section, we study the mixed S-MM game strategy in the OL sense.
3.1 Open-loop strategies for the followers
In this subsection, we first solve Problem (OL1). The main result of this subsection can be stated as follows.
**Theorem 3.1.**
Let (H1) and (H2) hold, and let the initial value, the frozen mean-field terms and the leaders' strategies be given. Then a control is an open-loop decentralized optimal control of Problem (OL1) for the given initial value if and only if the following two conditions hold:
(i) For $1\le j\le N_f$, the adapted solution to the FBSDE on $[0,T]$
[TABLE]
satisfies the following stationarity condition:
[TABLE]
(ii) For $1\le j\le N_f$, the following convexity condition holds:
[TABLE]
where is the solution to the FSDE
[TABLE]
Or, equivalently, the map , is convex.
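Proof. For the given data and the candidate control, let the processes in (i) be the adapted solution to FBSDE (16). For any admissible perturbation direction and any $\varepsilon\in\mathbb{R}$, consider the solution to the following perturbed state equation on $[0,T]$: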
[TABLE]
Then, in terms of the solution to the FSDE in condition (ii), the perturbed state equals the candidate state plus $\varepsilon$ times that solution, and
[TABLE]
On the other hand, applying Itô’s formula to $\langle\overline{y}_{j},x_{j}\rangle$ and taking expectation, we obtain
[TABLE]
Hence,
[TABLE]
It follows that
[TABLE]
if and only if (17) and (18) hold.
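The equivalence rests on a standard structural fact for LQ problems, which the following generic identity may help make explicit. The symbols $\delta J$ and $J^{0}$ are introduced here only for illustration and are not the paper's notation; the exact expressions and constant factors depend on the cost functional, which is not reproduced in this text.

```latex
% Generic sketch: \bar v is the candidate control, v any admissible direction,
% \varepsilon \in \mathbb{R}. Because the state equation is linear and the cost
% quadratic, the cost difference is a quadratic polynomial in \varepsilon:
J(\bar v + \varepsilon v) - J(\bar v)
  = \varepsilon\, \delta J(\bar v; v) + \varepsilon^{2} J^{0}(v).
% (a) If \delta J(\bar v; v) = 0 and J^{0}(v) \ge 0 for all v, the right-hand
%     side equals \varepsilon^{2} J^{0}(v) \ge 0, so \bar v is optimal.
% (b) Conversely, if \bar v is optimal, dividing by \varepsilon and letting
%     \varepsilon \to 0^{+} and \varepsilon \to 0^{-} forces
%     \delta J(\bar v; v) = 0; then \varepsilon^{2} J^{0}(v) \ge 0 forces
%     J^{0}(v) \ge 0.
```

The first-order term plays the role of the stationarity condition and the second-order term that of the convexity condition. Since no definiteness of the weights is assumed in (H1), the convexity condition cannot be dropped.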
Furthermore, if we assume that is invertible, then we have
[TABLE]
so the related Hamiltonian system can be represented by
[TABLE]
Based on above analysis, it follows that
[TABLE]
Here, the first equality of (21) is due to the consistency condition: the frozen term $\overline{m}_x$ should equal the limit of the average of all realized states; the second equality is due to the law of large numbers. Thus, replacing the frozen term by this limit, we get the following system
[TABLE]
Since all followers are statistically identical, we can suppress the subscript “$j$”, and the following consistency condition system arises for a generic agent:
[TABLE]
where stands for a generic Brownian motion on and it is independent of . is a representative element of , and , are to be determined.
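The law-of-large-numbers step can be checked numerically on a simple example. The sketch below assumes, purely for illustration, that each agent follows an independent scalar equation $dx=ax\,dt+\sigma\,dW$ with $a\neq 0$ and deterministic $x(0)=x_0$, so that $x(T)$ is Gaussian with known mean and variance and can be sampled exactly. It estimates the mean-square gap between the empirical state-average and its limit; the function name and parameters are hypothetical.

```python
import math
import random


def average_deviation_mse(n_agents, a, sigma, x0, horizon, n_trials, seed=0):
    """Monte Carlo estimate of E|average of x_i(T) - E[x(T)]|^2 for independent
    agents with dx = a x dt + sigma dW and x(0) = x0 (requires a != 0)."""
    rng = random.Random(seed)
    mean = x0 * math.exp(a * horizon)
    variance = sigma ** 2 * (math.exp(2.0 * a * horizon) - 1.0) / (2.0 * a)
    std = math.sqrt(variance)
    total = 0.0
    for _ in range(n_trials):
        # Empirical state-average of n_agents independent terminal states.
        average = sum(rng.gauss(mean, std) for _ in range(n_agents)) / n_agents
        total += (average - mean) ** 2
    return total / n_trials
```

For independent agents the exact value is the single-agent variance divided by `n_agents`, so multiplying the population by 100 should shrink the estimate by roughly the same factor. In the game itself the agents become independent only in the limiting problem with frozen mean-field terms, which is why this step appears together with the consistency condition.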
3.2 Open-loop strategies for the major leader
Once Problem (OL1) is solved, we turn to Problem (OL2) for the major leader. Note that when the followers take their optimal responses given by (20), the major leader ends up with the following state equation system:
[TABLE]
Its cost functional is given by (14). Note that equation (23) is a two-point boundary value problem for SDEs, that is, a forward-backward stochastic differential equation (FBSDE; see [26, 33, 32, 34]), and the cost functional is still of linear-quadratic form. Hence, we are going to solve an LQ problem for an FBSDE. Since this FBSDE is coupled, it is not easy to deal with. Let us keep in mind that the “state” for (23) is a triple of processes. The main result of this subsection can be stated as follows.
**Theorem 3.2.**
Let (H1) and (H2) hold, and let the initial value and the frozen mean-field terms be given. Then a control is an open-loop decentralized optimal control of Problem (OL2) for the given initial value if and only if the following two conditions hold:
(i) The adapted solution to the FBSDE on $[0,T]$
[TABLE]
satisfies the following stationarity condition:
[TABLE]
(ii) The following convexity condition holds:
[TABLE]
where is the solution to the FBSDE
[TABLE]
Or, equivalently, the map is convex.
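Proof. For the given data and the candidate control, let the processes in (i) be the adapted solution to FBSDE (24). For any admissible perturbation direction and any $\varepsilon\in\mathbb{R}$, consider the solution to the following perturbed state equation on $[0,T]$: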
[TABLE]
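Then, in terms of the solution to the FBSDE (27), each component of the perturbed state equals the corresponding candidate component plus $\varepsilon$ times that solution, and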
[TABLE]
On the other hand, applying Itô’s formula to , and taking expectation, we obtain
[TABLE]
Hence,
[TABLE]
It follows that
[TABLE]
if and only if (25) and (26) hold.
Similarly, if we assume is invertible, then we can represent the optimal control by
[TABLE]
Then the following coupled system follows
[TABLE]
where is to be determined.
3.3 Open-loop strategies for the minor leaders
Once Problem (OL2) is solved, we turn to Problem (OL3) for the minor leaders. Note that when the followers take their optimal responses given by (20) and the major leader takes his/her optimal control given by (28), the minor leaders end up with the following state equation system:
[TABLE]
Their cost functional is given by (15), with the major leader's quantities taken from (29). The problem is thus solved similarly to Problem (OL1), and the main result of this subsection can be stated as follows.
**Theorem 3.3.**
Let (H1) and (H2) hold, and let the initial value, the frozen mean-field terms and the major leader's quantities be given. Then a control is a decentralized optimal control of Problem (OL3) for the given initial value if and only if the following two conditions hold:
(i) For $1\le i\le N_l$, the adapted solution to the FBSDE on $[0,T]$
[TABLE]
satisfies the following stationarity condition:
[TABLE]
(ii) For $1\le i\le N_l$, the following convexity condition holds:
[TABLE]
where is the solution to the FSDE
[TABLE]
Or, equivalently, the map is convex (for ).
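Proof. For the given data and the candidate control, let the processes in (i) be the adapted solution to FBSDE (30). For any admissible perturbation direction and any $\varepsilon\in\mathbb{R}$, consider the solution to the following perturbed state equation on $[0,T]$: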
[TABLE]
Then, in terms of the solution to the FSDE in condition (ii), the perturbed state equals the candidate state plus $\varepsilon$ times that solution, and
[TABLE]
On the other hand, applying Itô’s formula to $\langle\overline{Y}_{i},X_{i}\rangle$ and taking expectation, we obtain
[TABLE]
Hence,
[TABLE]
It follows that
[TABLE]
if and only if (31) and (32) hold.
Furthermore, if we assume that is invertible, then we have
[TABLE]
so the related Hamiltonian system can be represented by
[TABLE]
Based on the above analysis, it follows that
[TABLE]
Here, the first equality of (35) is due to the consistency condition: the frozen term $\overline{m}_X$ should equal the limit of the average of all realized states; the second equality is due to the law of large numbers. Thus, replacing the frozen term by this limit, we get the following system
[TABLE]
Since all minor leaders are statistically identical, we can suppress the subscript “$i$”, and the following consistency condition system arises for a generic agent:
[TABLE]
where stands for a generic Brownian motion on , and it is independent of . is a representative element of .
To conclude the section, combining (29) with the generic minor-leader system above and substituting the consistent mean-field terms, we get the consistency condition system for the open-loop strategy as follows.
[TABLE]
4 The Consistency Condition System
Under assumptions (H1), (H2), when , and are always invertible, we get the consistency condition (CC) for OL strategy in section 3. In this section, we turn to verify the well-posedness of the CC equation.
For simplicity of notation, we introduce some abbreviations, and then the consistency condition system obtained at the end of section 3 can be rewritten as
[TABLE]
where
[TABLE]
4.1 Decoupling for open-loop strategy
We now decouple the FBSDE (38) by means of Riccati equations. Note that
[TABLE]
Hence,
[TABLE]
Now, we assume that
[TABLE]
for some deterministic and differentiable functions and , taking values in , such that
[TABLE]
Then
[TABLE]
and
[TABLE]
Therefore,
[TABLE]
Comparing the diffusion terms, we should have
[TABLE]
Then
[TABLE]
and
[TABLE]
Comparing the drift terms, we should have
[TABLE]
Therefore, we should let and be the solutions to the following Riccati equations, respectively:
[TABLE]
and
[TABLE]
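Riccati equations of this kind are terminal-value problems and are integrated backward in time. As a minimal sketch, consider the standard scalar LQ Riccati equation $P'+2aP-(b^2/r)P^2+q=0$, $P(T)=g$. This equation is an assumption used for illustration; the equations above are matrix-valued and of non-standard form, and their solvability is not guaranteed in general without definiteness conditions.

```python
import math


def solve_scalar_riccati(a, b, q, r, g, horizon, n_steps):
    """Integrate P' = -2 a P + (b^2 / r) P^2 - q backward from P(T) = g
    with the classical fourth-order Runge-Kutta scheme.

    Returns the list [P(t_0), ..., P(t_n)] on a uniform grid of [0, horizon].
    """
    def rhs(p):
        return -2.0 * a * p + (b * b / r) * p * p - q

    dt = horizon / n_steps
    values = [0.0] * (n_steps + 1)
    values[n_steps] = g
    for k in range(n_steps, 0, -1):
        p = values[k]
        # Runge-Kutta stages with the negative step -dt (backward in time).
        k1 = rhs(p)
        k2 = rhs(p - 0.5 * dt * k1)
        k3 = rhs(p - 0.5 * dt * k2)
        k4 = rhs(p - dt * k3)
        values[k - 1] = p - dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return values
```

For `a = 0`, `b = q = r = 1` and `g = 0` the exact solution is $P(t)=\tanh(T-t)$, which provides a simple accuracy check. With indefinite weights (for instance `q < 0`) the scalar solution can blow up in finite time, which is one reason why solvability has to be assumed or verified separately.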
4.2 Decoupling for the feedback strategy
Besides the pure open-loop method, we can also introduce the following Riccati equations to decouple the Hamiltonian systems first.
The Hamiltonian system of the minor follower is
[TABLE]
with the stationarity condition
[TABLE]
Assume that , and we can get the Riccati equations
[TABLE]
and
[TABLE]
where
[TABLE]
Note that
[TABLE]
so the feedback is
[TABLE]
The major leader ends up with the following Hamiltonian system
[TABLE]
with the stationarity condition
[TABLE]
where
[TABLE]
Assume that , where
[TABLE]
and for simplicity, we rewrite the Hamiltonian system by
[TABLE]
where
[TABLE]
then we can get the following Riccati equations
[TABLE]
and
[TABLE]
So the feedback is
[TABLE]
Finally, the Hamiltonian system of the minor leader is
[TABLE]
with the stationarity condition
[TABLE]
Assume that , and we can get the Riccati equations
[TABLE]
and
[TABLE]
where
[TABLE]
Note that
[TABLE]
so the feedback is
[TABLE]
5 $\varepsilon$-Nash Equilibrium Analysis
In the above sections, we obtained the decentralized open-loop strategy of the mixed S-MM-MFG through the consistency condition system. We now verify that it is an SNC approximate equilibrium (i.e., an $\varepsilon$-Stackelberg-Nash-Cournot equilibrium). To ensure the solvability of the open-loop strategy, we assume that the Riccati equations of section 4.1 admit unique solutions. We begin with the definition of an $\varepsilon$-SNC equilibrium.
**Definition 5.1.**
A set of controls for the $1+N_l+N_f$ agents is said to be an $\varepsilon$-SNC equilibrium with respect to the costs if there exists $\varepsilon\ge 0$ such that, for any fixed agent, we have
[TABLE]
when any alternative control is applied by .
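The meaning of such a definition is that no agent can lower its own cost by more than $\varepsilon$ through a unilateral deviation. The following static toy game, which is an assumption for illustration and not the paper's model, makes this measurable: each of $N$ agents pays $\tfrac12 v^2-\eta v+\tfrac12 c\,(\bar v^{(N)}-d)^2$, where $\bar v^{(N)}$ is the population average of the actions. The decentralized strategy ignores an agent's own influence on the average and plays $v=\eta$; the code computes the exact gain from the best unilateral deviation.

```python
def cost(v_own, v_others, n_agents, eta, c, d):
    """Cost of one agent playing v_own while the other n_agents - 1 play v_others."""
    average = ((n_agents - 1) * v_others + v_own) / n_agents
    return 0.5 * v_own ** 2 - eta * v_own + 0.5 * c * (average - d) ** 2


def best_deviation(v_others, n_agents, eta, c, d):
    """Exact minimiser of cost(., v_others), from the first-order condition
    of a strictly convex quadratic in v_own (assuming c >= 0)."""
    pull = (c / n_agents) * ((n_agents - 1) * v_others / n_agents - d)
    return (eta - pull) / (1.0 + c / n_agents ** 2)


def deviation_gain(n_agents, eta, c, d):
    """Largest cost reduction available to one agent when all others play eta."""
    v_star = best_deviation(eta, n_agents, eta, c, d)
    return cost(eta, eta, n_agents, eta, c, d) - cost(v_star, eta, n_agents, eta, c, d)
```

In this deterministic toy game the gain decays like $1/N^2$, so the decentralized profile is an $\varepsilon$-Nash equilibrium with $\varepsilon$ vanishing as $N$ grows. In the stochastic setting of this section the approximation error also comes from fluctuations of the state-averages, so the rate need not coincide with the one in this example.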
We first present the main result of this section; its proof is given later.
**Theorem 5.1.**
Let (H1)-(H2) hold, and suppose that the conditions of Theorems 3.1, 3.2 and 3.3 hold. Then the decentralized strategy profile is an $\varepsilon$-Nash equilibrium of the mixed S-MM-MFG for the major leader, each minor leader $i$, $1\le i\le N_l$, and each follower $j$, $1\le j\le N_f$. The strategy profile is given by
[TABLE]
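for $1\le i\le N_l$, $1\le j\le N_f$, where the processes involved solve the consistency condition system obtained at the end of section 3.
For the major leader, the minor leaders and the followers, the decentralized states $\overline{X}_0$, $\overline{X}_i$ and $\overline{x}_j$ are given respectively by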
[TABLE]
where the limiting processes solve the consistency condition system obtained at the end of section 3. We first present several lemmas; for simplicity, a shorthand for the inner product is used in what follows.
**Lemma 5.1.**
Let (H1)-(H2) hold, and suppose that the conditions of Theorems 3.1, 3.2 and 3.3 hold. Then there exists a constant $M$, independent of $N_l$ and $N_f$, such that
[TABLE]
[TABLE]
Proof. By Theorems 3.1, 3.2 and 3.3, the FBSDEs (16), (24) and (30) have unique solutions. Thus, the SDE system (47) also has a unique solution
[TABLE]
From (47), by the Burkholder-Davis-Gundy (BDG) inequality, there exists a constant , independent of and , such that for any ,
[TABLE]
and by Gronwall’s inequality, we obtain
[TABLE]
Similarly, we have
[TABLE]
and
[TABLE]
Thus
[TABLE]
and
[TABLE]
By Gronwall’s inequality, it follows that \mathbb{E}\Big[\sup_{0\leq s\leq t}\sum_{i=1}^{N_{l}}\big|\overline{X}_{i}(s)\big|^{2}\Big]=O(N_{l}) and \mathbb{E}\Big[\sup_{0\leq s\leq t}\sum_{j=1}^{N_{f}}\big|\overline{x}_{j}(s)\big|^{2}\Big]=O(N_{f}). Substituting these estimates into (49) and (50) and applying Gronwall’s inequality again yields \mathbb{E}\Big[\sup_{0\leq s\leq t}\big|\overline{X}_{i}(s)\big|^{2}\Big]\leq M and \mathbb{E}\Big[\sup_{0\leq s\leq t}\big|\overline{x}_{j}(s)\big|^{2}\Big]\leq M. Applying these estimates to (48), we get \mathbb{E}\Big[\sup_{0\leq s\leq t}\big|\overline{X}_{0}(s)\big|^{2}\Big]\leq M.
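For reference, the form of Gronwall’s inequality used repeatedly in these estimates can be stated as below. This is the standard integral form; the symbols \varphi, a and b are generic placeholders introduced here and are not notation from the paper.

```latex
% Gronwall's inequality (integral form), with generic placeholders \varphi, a, b.
% Suppose \varphi is nonnegative and bounded on [0,T], and a, b \ge 0 are constants with
\varphi(t) \le a + b\int_{0}^{t}\varphi(s)\,\mathrm{d}s, \qquad 0\le t\le T.
% Then
\varphi(t) \le a\,e^{bt}, \qquad 0\le t\le T.
```

In estimates of this kind, \varphi(t) would typically be the expected supremum of a squared state up to time t, so the resulting bound a e^{bT} is independent of the population sizes whenever a and b are.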
Now, we recall that
[TABLE]
then we have
Lemma 5.2.
Under assumptions (H1)-(H2), and if the conditions in the Theorem 3.1, Theorem 3.2, Theorem 3.3 hold, then there exists a constant independent of and , such that
[TABLE]
[TABLE]
Proof. For the first one, we have
[TABLE]
From (LABEL:N2), by the Burkholder-Davis-Gundy (BDG) inequality and Lemma 5.1, there exists a constant , independent of and , such that for any ,
[TABLE]
and by Gronwall’s inequality, we obtain
[TABLE]
The second estimate can be proved in the same way.
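Estimates of this mean-field type are often of the form "squared distance between the empirical state average and its limit is of order 1/N". The following is a minimal Monte Carlo sketch of that scaling on a toy model. It assumes a scalar linear SDE dx = -a x dt + sigma dW with independent noises, discretized by Euler-Maruyama; the dynamics, the parameter names and the function `sup_sq_error` are illustrative assumptions and are not the paper's system.

```python
import math
import random


def sup_sq_error(n_agents, n_steps=20, horizon=1.0, a=1.0, sigma=1.0,
                 x0=1.0, n_trials=50, seed=0):
    """Estimate E[sup_k |average state - limiting mean|^2] on a time grid.

    Toy model (an assumption, not the paper's system): each agent follows
    dx = -a*x dt + sigma dW with independent Brownian motions.
    """
    rng = random.Random(seed)
    dt = horizon / n_steps
    sqrt_dt = math.sqrt(dt)
    total = 0.0
    for _ in range(n_trials):
        states = [x0] * n_agents
        limit_mean = x0  # Euler scheme for the noise-free mean dynamics
        worst = 0.0
        for _ in range(n_steps):
            # One Euler-Maruyama step per agent with independent noise.
            states = [x - a * x * dt + sigma * sqrt_dt * rng.gauss(0.0, 1.0)
                      for x in states]
            limit_mean -= a * limit_mean * dt
            average = sum(states) / n_agents
            worst = max(worst, (average - limit_mean) ** 2)
        total += worst
    return total / n_trials


if __name__ == "__main__":
    for n in (25, 100, 400):
        err = sup_sq_error(n)
        # If the error is O(1/N), n * err should stay roughly constant.
        print(n, err, n * err)
```

Comparing against the Euler scheme of the mean dynamics, rather than the exact exponential, isolates the statistical error from the time-discretization error.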
Lemma 5.3.
Under assumptions (H1)-(H2), and if the conditions in the Theorem 3.1, Theorem 3.2, Theorem 3.3 hold, then there exists a constant independent of and , such that
[TABLE]
where .
Proof. Let us first consider the major leader agent. Recalling (4) and (14), we have
[TABLE]
By Hölder inequality and Lemma 5.1, there exists a constant independent of and such that
[TABLE]
Noting (52), (53) and Lemma 5.2, there exists a constant independent of and such that
[TABLE]
The remaining two claims can be proved in the same way.
Remark 5.1.
We denote by M the common constant in the different bounds. In the above lemmas, the constant M may vary from line to line, but it is always independent of the number of minor leader agents N_{l} and the number of follower agents N_{f}.
5.1 Major leader agent’s perturbation
In this subsection, we prove that the set of control strategies given by Theorem 5.1 is an ε-Nash equilibrium of the mixed S-MM-MFG for the major leader agent, i.e., there exists an ε such that
[TABLE]
Suppose that the major leader agent uses an alternative strategy , each minor leader agent uses the control and each follower agent uses the control . To prove that is an ε-Nash equilibrium for the major leader agent, we need to show that for any possible alternative control , . It therefore suffices to consider perturbations such that . Following the representation of the cost functional in [35, 31], the cost functional can be represented as follows.
Proposition 5.1.
Let (H1)-(H2) hold. There exist a bounded self-adjoint linear operator , a bounded linear operator , and a bounded real-valued function such that
[TABLE]
Proof. Refer to Proposition 3.1 in [31].
Hence, if we assume that , then by Lemma 5.3 there exists a constant , such that
[TABLE]
which implies that \mathbb{E}\Big[\int_{0}^{T}\big|u_{0}(t)\big|^{2}\mathrm{d}t\Big]\leq M, where is a constant independent of . In fact, by the bounded inverse theorem, is bounded, so there exists a constant , such that
[TABLE]
Then we have \mathbb{E}\Big[\int_{0}^{T}\big|u_{0}(t)\big|^{2}\mathrm{d}t\Big]\leq M. Similarly to Lemma 5.1, we can show that
[TABLE]
Lemma 5.4.
Under assumptions (H1)-(H2), and if the conditions in the Theorem 3.1, Theorem 3.2, Theorem 3.3 hold, for the major leader agent’s perturbation control , we have
[TABLE]
Proof. Recalling (4) and (14), we have
[TABLE]
By Hölder inequality and (55), there exists a constant independent of and such that
[TABLE]
Finally, as in Lemma 5.3, noting (56), (57), and Lemma 5.2, there exists a constant independent of and such that
[TABLE]
Using Lemma 5.3 and Lemma 5.4, we can give the first part of the proof of Theorem 5.1, i.e., the set of control strategies given by Theorem 5.1 is an ε-Nash equilibrium of the mixed S-MM-MFG for the major leader agent.
Part A of the proof of Theorem 5.1
Combining Lemma 5.3 and Lemma 5.4, we have
[TABLE]
where the second inequality comes from the fact that . Consequently, Theorem 5.1 holds for the major leader agent with \varepsilon=O\big(\frac{1}{\sqrt{N}}\big).
5.2 Minor leader agent’s perturbation
Now consider the following case: a given minor leader agent uses an alternative strategy , the major leader agent uses , each follower agent uses , and the other minor leader agents use the control . By the representation of the cost functional (as in the argument for the major leader agent), to prove that is an ε-Nash equilibrium for the minor leader agent, we only need to consider perturbations satisfying
[TABLE]
where is a constant independent of . Then similar to Lemma 5.1, we can show that
[TABLE]
Lemma 5.5.
Under assumptions (H1)-(H2), and if the conditions in the Theorem 3.1, Theorem 3.2, Theorem 3.3 hold, then there exists a constant independent of and , such that
[TABLE]
where X^{(i,N_{l})}(t)=\frac{1}{N_{l}}\big(X_{i}(t)+\sum_{k\neq i}\overline{X}_{k}(t)\big).
Proof. In fact, we have
[TABLE]
and by (59) it follows that
[TABLE]
Combined with Lemma 5.2, we can directly get
[TABLE]
Lemma 5.6.
Under assumptions (H1)-(H2), and if the conditions in the Theorem 3.1, Theorem 3.2, Theorem 3.3 hold, for the minor leader agent’s perturbation control , we have
[TABLE]
Proof. Recalling (5) and (15), we have
[TABLE]
By the same technique, applying the Hölder inequality, Lemma 5.5, and (59), there exists a constant independent of and such that
[TABLE]
Using Lemma 5.3 and Lemma 5.6, we can give the second part of the proof of Theorem 5.1, i.e., the set of control strategies given by Theorem 5.1 is an ε-Nash equilibrium of the mixed S-MM-MFG for the minor leader agent.
Part B of the proof of Theorem 5.1
Combining Lemma 5.3 and Lemma 5.6, we have
[TABLE]
where the second inequality comes from the fact that . Consequently, Theorem 5.1 holds for the minor leader agent with \varepsilon=O\big(\frac{1}{\sqrt{N}}\big).
5.3 Follower agent’s perturbation
Finally, consider the following case: a given follower agent uses an alternative strategy , the major leader agent uses , each minor leader agent uses , and the other follower agents use the control . By the representation of the cost functional (as in the argument for the major leader agent), to prove that is an ε-Nash equilibrium for the follower agent, we only need to consider perturbations satisfying
[TABLE]
where is a constant independent of . Then similar to Lemma 5.1, we can show that
[TABLE]
Lemma 5.7.
Under assumptions (H1)-(H2), and if the conditions in the Theorem 3.1, Theorem 3.2, Theorem 3.3 hold, then there exists a constant independent of and , such that
[TABLE]
where x^{(j,N_{f})}(t)=\frac{1}{N_{f}}\big(x_{j}(t)+\sum_{k\neq j}\overline{x}_{k}(t)\big).
Proof. In fact, we have
[TABLE]
and by (62) it follows that
[TABLE]
Combined with Lemma 5.2, we can directly get
[TABLE]
Lemma 5.8.
Under assumptions (H1)-(H2), and if the conditions in the Theorem 3.1, Theorem 3.2, Theorem 3.3 hold, for the follower agent’s perturbation control , we have
[TABLE]
Proof. Recalling (6) and (11), we have
[TABLE]
By Hölder inequality and (62) there exists a constant independent of and such that
[TABLE]
Finally, as in Lemma 5.3, noting (63), (64), and Lemma 5.7, there exists a constant independent of and such that
[TABLE]
Using Lemma 5.3 and Lemma 5.8, we can give the last part of the proof of Theorem 5.1, i.e., the set of control strategies given by Theorem 5.1 is an ε-Nash equilibrium of the mixed S-MM-MFG for the follower agent.
Part C of the proof of Theorem 5.1
Combining Lemma 5.3 and Lemma 5.8, we have
[TABLE]
where the second inequality comes from the fact that . Consequently, Theorem 5.1 holds for the follower agent with \varepsilon=O\big(\frac{1}{\sqrt{N}}\big). Finally, combining Parts A, B and C completes the proof of Theorem 5.1.
6 Special Case
In this section, we give an example to show how the major leader influences the whole system. We look at a special case in which the major leader does not appear, so that the problem reduces to a leader-follower mean-field LQG game. We still treat the major leader as present but as having no effect on the game at all, i.e., we assume that
[TABLE]
Moreover, let , , , and . We observe that the coupled mean-field term between the leaders and the followers appears in the cost functional of the followers, so this is indeed a leader-follower mean-field LQG game. By the analysis above, we obtain the consistency condition (CC) equation of the special case as follows.
[TABLE]
Furthermore, the equations for and are coupled with each other but decoupled from the other equations; if the consistency condition equation admits a unique adapted solution, then , and is the trivial solution to the equation. The CC equation of the special case then simplifies to
[TABLE]
Next, decoupling the open-loop strategy yields the Riccati equations (LABEL:C6e18) and (LABEL:C6e19). However, an explicit solution to these Riccati equations is still hard to obtain, so we consider the -dimensional Example 5.1 as follows.
Example 5.1 Let , (66) holds, and let
[TABLE]
we have the state equation
[TABLE]
The cost functional reads
[TABLE]
and
[TABLE]
and the Riccati equation
[TABLE]
where
[TABLE]
which can easily be solved to give
[TABLE]
And the optimal control , , subject to the optimal cost functional , and .
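When a closed-form solution is not available, a Riccati equation can be integrated numerically backward from its terminal condition. The sketch below does this for a generic scalar LQ problem, assuming dynamics dx = (a x + b u) dt and running cost q x^2 + r u^2, for which the Riccati equation is dP/dt = -2aP + (b^2/r)P^2 - q with P(T) = g. These coefficients, the equation itself and the function names are illustrative assumptions and are not the Riccati equations of the paper.

```python
import math


def riccati_rhs(p, a, b, q, r):
    """Right-hand side in reversed time tau = T - t: dP/dtau = 2aP - (b^2/r)P^2 + q."""
    return 2.0 * a * p - (b * b / r) * p * p + q


def solve_riccati(a, b, q, r, g, horizon, n_steps=1000):
    """Integrate the scalar Riccati ODE backward from P(T) = g with classical RK4.

    Returns a list `values` with values[k] approximating P(k * horizon / n_steps).
    """
    h = horizon / n_steps
    p = g
    values = [p]  # built in reversed time, starting from the terminal value
    for _ in range(n_steps):
        k1 = riccati_rhs(p, a, b, q, r)
        k2 = riccati_rhs(p + 0.5 * h * k1, a, b, q, r)
        k3 = riccati_rhs(p + 0.5 * h * k2, a, b, q, r)
        k4 = riccati_rhs(p + h * k3, a, b, q, r)
        p += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        values.append(p)
    values.reverse()  # now indexed forward in time
    return values


def feedback_gain(p, b, r):
    """Gain K in the feedback u = K * x for this generic scalar LQ problem."""
    return -(b / r) * p


if __name__ == "__main__":
    # Sanity check on a case with a known solution: a=0, b=q=r=1, g=0
    # gives dP/dt = P^2 - 1, P(T) = 0, whose solution is P(t) = tanh(T - t).
    sol = solve_riccati(a=0.0, b=1.0, q=1.0, r=1.0, g=0.0, horizon=1.0)
    print(sol[0], math.tanh(1.0), feedback_gain(sol[0], b=1.0, r=1.0))
```

Reversing time turns the terminal-value problem into an initial-value problem, so any standard ODE stepper can be used; the tanh case provides a quick check of the integrator before applying it to other coefficients.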
References
- [1] Andersson, D., Djehiche, B. A maximum principle for SDEs of mean-field type. Applied Mathematics & Optimization, 64(2), 197-216 (2011).
- [2] Basar, T. Stochastic stagewise Stackelberg strategies for linear quadratic systems. Stochastic Control Theory and Stochastic Differential Systems. Springer Berlin Heidelberg (1979).
- [3] Bardi, M. Explicit solutions of some linear-quadratic mean field games. Networks & Heterogeneous Media, 7(2), 243-261 (2013).
- [4] Bensoussan, A., Chen, S., Sethi, S. The maximum principle for global solutions of stochastic Stackelberg differential games. Social Science Electronic Publishing, 53(4), (2015).
- [5] Bensoussan, A., Chau, M., Yam, S. Mean field Stackelberg games: aggregation of delayed instructions. SIAM J. Control Optim., 53(4), 2237-2266 (2015).
- [6] Bensoussan, A., Chau, M., Yam, S. Mean field games with a dominating player. Applied Mathematics & Optimization, 74(1), 1-38 (2016).
- [7] Buckdahn, R., Djehiche, B., Li, J., Peng, S. Mean-field backward stochastic differential equations: a limit approach. Annals of Probability, 37(4), 1524-1565 (2009).
- [8] Bensoussan, A., Frehse, J., Yam, P. Mean field games and mean field type control theory. SpringerBriefs in Mathematics (2013).
