A Game-Theoretic Framework for Autonomous Vehicles Velocity Control: Bridging Microscopic Differential Games and Macroscopic Mean Field Games

Kuang Huang; Xuan Di; Qiang Du; Xi Chen

arXiv:1903.06053·math.OC·December 14, 2020


TL;DR

This paper introduces a mean field game framework for autonomous vehicle velocity control, enabling scalable, rational decision-making that bridges microscopic behaviors and macroscopic traffic flow, and improves traffic jam mitigation.

Contribution

It develops a systematic mean field game approach for AV velocity control, linking microscopic agent behaviors with macroscopic traffic models, and demonstrates improved traffic flow management.

Findings

1. MFG-based controller reduces traffic jams faster than LWR-based controller.

2. The framework provides a new traffic flow theory for autonomous vehicles.

3. Systematic micro-macro modeling of AV behaviors and traffic dynamics.


Tables

Table 1: Classification of macroscopic traffic flow models

Model class	Control variable: speed	Control variable: acceleration rate
Traditional	First-order (e.g., LWR)	Higher-order (e.g., PW/ARZ)
Game-theoretic	First-order MFGs	Higher-order MFGs

Equations

$$\dot{x}_{i}(t)=v_{i}(t),\quad x_{i}(0)=x_{i,0},\quad i=1,2,\dots,N,$$

$$\bm{v}_{-i}(t)=[v_{1}(t),\dots,v_{i-1}(t),v_{i+1}(t),\dots,v_{N}(t)]^{T},$$

$$\bm{x}_{-i}(t)=[x_{1}(t),\dots,x_{i-1}(t),x_{i+1}(t),\dots,x_{N}(t)]^{T},$$

$$J_{i}^{N}(v_{i},\bm{v}_{-i})=\underbrace{\int_{0}^{T}\underbrace{f_{i}^{N}\left(v_{i}(t),x_{i}(t),\bm{x}_{-i}(t)\right)}_{\text{cost function}}\,dt}_{\text{running cost}}+\underbrace{V_{T}(x_{i}(T))}_{\text{terminal cost}},$$

$$\mathcal{A}=\{v(\cdot):0\leq v(t)\leq u_{\text{max}},\ \forall t\in[0,T]\},$$

$$J_{i}^{N}(v_{i}^{*},\bm{v}_{-i})\leq J_{i}^{N}(v_{i},\bm{v}_{-i}),\quad\forall v_{i}\in\mathcal{A}.$$

$$J_{i}^{N}(v_{i}^{*},\bm{v}_{-i}^{*})\leq J_{i}^{N}(v_{i},\bm{v}_{-i}^{*}),\quad\forall v_{i}\in\mathcal{A},\ i=1,\dots,N.$$

$$\rho^{N}(x,t)=\frac{1}{N}\sum_{j=1}^{N}\delta(x-x_{j}(t)),$$

$$\rho_{\sigma}^{N}(x,t)=\frac{1}{N}\sum_{j=1}^{N}\xi_{\sigma}(x-x_{j}(t)).$$

$$f_{i}^{N}\left(v_{i}(t),x_{i}(t),\bm{x}_{-i}(t)\right)\triangleq f_{i}\left(v_{i}(t),\rho_{\sigma}^{N}(\cdot,t)\right),$$

$$f_{i}\left(v_{i}(t),\rho_{\sigma}^{N}(\cdot,t)\right)\triangleq f_{i}\left(v_{i}(t),\rho_{\sigma}^{N}(x_{i}(t),t)\right),$$

$$J_{i}^{N}(v_{i},\bm{v}_{-i})=\int_{0}^{T}f\left(v_{i}(t),\rho_{\sigma}^{N}(x_{i}(t),t)\right)dt+V_{T}(x_{i}(T)).$$

$$J_{i}^{N}(v_{i}^{*},\bm{v}_{-i}^{*})\leq J_{i}^{N}(v_{i},\bm{v}_{-i}^{*}),\quad\forall v_{i}\in\mathcal{A},\ i=1,\dots,N.$$

$$J(v)=\int_{0}^{T}f\left(v(t),\rho(x(t),t)\right)dt+V_{T}(x(T)),$$

$$\dot{x}(t)=v(t),\quad x(0)=x_{0},$$

$$0\leq v(t)\leq u_{\text{max}},\quad\forall t\in[0,T].$$

$$V(x,t)=\min_{v:[t,T]\to[0,u_{\text{max}}]}\left\{\int_{t}^{T}f\left(v(s),\rho(x(s),s)\right)ds+V_{T}(x(T))\right\}\quad\text{s.t. }\dot{x}(s)=v(s),\ x(t)=x,$$

$v^{*}(t)$, $\dot{x}^{*}(t)$ (the right-hand sides of these two equations are missing from the extracted list)

$$\int_{t}^{T}f\left(v(s),\rho(x(s),s)\right)ds+V_{T}(x(T))=\int_{t}^{t+\Delta t}f\left(v(s),\rho(x(s),s)\right)ds+\int_{t+\Delta t}^{T}f\left(v(s),\rho(x(s),s)\right)ds+V_{T}(x(T)),$$

$$\int_{t}^{t+\Delta t}f\left(v(s),\rho(x(s),s)\right)ds=f(\alpha,\rho(x,t))\Delta t+O(\Delta t^{2}).$$

$$V(x,t)=\min_{0\leq\alpha\leq u_{\text{max}}}\left\{f(\alpha,\rho(x,t))\Delta t+V(x+\alpha\Delta t,t+\Delta t)+O(\Delta t^{2})\right\}.$$

$$V(x,t)=\min_{0\leq\alpha\leq u_{\text{max}}}\left\{f(\alpha,\rho(x,t))\Delta t+V(x,t)+\alpha V_{x}(x,t)\Delta t+V_{t}(x,t)\Delta t+O(\Delta t^{2})\right\}.$$

$$V_{t}+\min_{0\leq\alpha\leq u_{\text{max}}}\left\{f(\alpha,\rho)+\alpha V_{x}\right\}=0.$$

$$f^{*}(p,\rho)=\min_{0\leq\alpha\leq u_{\text{max}}}\left\{f(\alpha,\rho)+\alpha p\right\},\quad\forall p\in\mathbb{R},$$

$$V_{t}+f^{*}(V_{x},\rho)=0.$$

$$u=\operatorname*{argmin}_{0\leq\alpha\leq u_{\text{max}}}\left\{f(\alpha,\rho)+\alpha V_{x}\right\}=f_{p}^{*}(V_{x},\rho).$$

$$V_{t}+f^{*}(V_{x},\rho)=0,\qquad u=f_{p}^{*}(V_{x},\rho).$$

$$\rho_{t}+(\rho u)_{x}=0,\tag{CE}$$


Full text

Please cite this paper as: Huang, Kuang, et al. “A game-theoretic framework for autonomous vehicles velocity control: Bridging microscopic differential games and macroscopic mean field games.” Discrete & Continuous Dynamical Systems-B 22.11 (2017): 0.

A Game-Theoretic Framework for Autonomous Vehicles Velocity Control: Bridging Microscopic Differential Games and Macroscopic Mean Field Games

Kuang Huang

Xuan Di


Qiang Du

Xi Chen

Department of Applied Physics and Applied Mathematics, Columbia University

Department of Civil Engineering and Engineering Mechanics, Columbia University

Data Science Institute, Columbia University

Department of Computer Science, Columbia University

Abstract

This paper proposes an efficient computational framework for longitudinal velocity control of a large number of autonomous vehicles (AVs) and develops a traffic flow theory for AVs. Instead of hypothesizing explicitly how AVs drive, our goal is to design future AVs as rational, utility-optimizing agents that continuously select optimal velocity over a planning horizon. With a large number of interacting AVs, this design problem can become computationally intractable. This paper aims to tackle such a challenge by employing mean field approximation and deriving a mean field game (MFG) as the limiting differential game with an infinite number of agents. The proposed micro-macro model allows one to define individuals on a microscopic level as utility-optimizing agents while translating rich microscopic behaviors to macroscopic models. Unlike existing studies on the application of MFG to traffic flow models, the present study offers a systematic framework to apply MFG to autonomous vehicle velocity control. The MFG-based AV controller is shown to mitigate traffic jams faster than the LWR-based controller. MFG also embodies classical traffic flow models with behavioral interpretation, thereby providing a new traffic flow theory for AVs.

Keywords: Autonomous vehicles control, Mean field game, Differential game, Micro-macro limit, $\epsilon$-Nash equilibrium

Journal: DCDS-B

1 Introduction

1.1 Problem statement

When all human-driven vehicles (HVs) on public roads are replaced by autonomous vehicles (AVs), AVs’ control strategy will be different from human driving behavior and their traffic flow will be different from what we observe nowadays. In this paper we address two questions:

1. What is the new car-following control strategy of AVs at micro-scale?

2. What is the new traffic flow theory for AVs at macro-scale?

Human driving behavior has been extensively modeled at both micro- and macro-scales. At micro-scale, car following models (CFMs) treat each vehicle as a discrete entity with a constant length, whose dynamic location and velocity are computed from an underlying ordinary differential equation (ODE) system [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18]. CFMs assume local interactions among vehicles and local information from neighboring vehicles. The modeling of each agent requires tracking and keeping records of surrounding agents. Due to the dynamic and volatile characteristics of traffic flow, the interacting agents and their topology may change quickly. Real-time design strategies may become extremely difficult to implement for heavy traffic scenarios, as the associated microscopic control mechanism may not scale to many vehicles. Moreover, it is not easy to account for global traffic information obtained from vehicle connectivity. In contrast, macroscopic traffic flow models treat each vehicle as a particle that occupies no space. Traffic flow is then described by the continuum density distribution and velocity field solved from partial differential equations (PDEs) [1, 19, 20, 21, 22, 23, 24, 25].

AVs, on the other hand, exhibit distinct driving behavior from HVs and thus call for new models and theories for both microscopic vehicle control and macroscopic traffic flow.

1.1.1 Microscopic Longitudinal Controller of AVs

To prepare AVs to drive on public roads, safe and efficient controller design of autonomous driving is a top priority. AV controls can be categorized into longitudinal control (i.e., the car-following scenario) and lateral control (i.e., the lane-change scenario). Longitudinal control has been studied in various scenarios, including: platooning [17, 26, 27, 28], speed harmonization [29, 30, 31], longitudinal trajectory optimization [27, 32], and eco-approach and departure at signalized intersections [33, 34, 35]. Connected adaptive cruise control (CACC) is the most extensively studied longitudinal controller for AV platooning [36, 37, 38, 39, 40, 41, 42, 43, 44, 41, 45, 46, 47, 48, 49, 27, 32].

This paper is primarily focused on AVs’ longitudinal velocity control in the car-following scenario. We formally formulate the problem as follows.

Definition 1.1.

(Problem Statement): There are $N$ autonomous vehicles indexed by $i\in\{1,2,\dots,N\}$ driving in one direction on a closed highway without any entrance or exit, with initial positions $x_{1,0},\dots,x_{N,0}$. Each car aims to select its optimal velocity control by minimizing its driving cost functional pre-programmed by its manufacturer over the predefined planning horizon $\left[0,T\right]$. We would like to investigate a scalable velocity control strategy for a large number of AVs.

The modeling details will be discussed in the following sections. To develop microscopic AV controls, one needs to design autonomous driving behavior through some underlying dynamical model for AVs. A majority of studies simply model AVs’ behavior on that of HVs by tweaking behavioral parameters (e.g., shorter reaction time or headway [50] or k-vehicle ahead information [49]), in which AVs are essentially human drivers but react faster, “see” farther, and “know” the road environment better. The models proposed in those studies may not capture AVs’ dynamic learning capabilities. Such learning capabilities are modeled by model predictive control (MPC) or Stackelberg games in some studies [51, 52]. However, those studies suffer from scalability issues when the number of AVs becomes large.

1.1.2 Macroscopic Traffic Flow Theory for AVs

The connections between CFMs and macroscopic traffic flow models have been established either using continuum limit or via change of coordinates, states or formulations. In the former case, a macroscopic model is the limit of a CFM as the number of cars tends to infinity, which may be shown rigorously using theory from conservation laws [53, 54] or measure theory [55]. However, such continuum limit results are known only for very few CFMs, while general mathematical theories are still not available. As an alternative, a macroscopic model can be transformed into different coordinates, different state variables or its variational formulation, so that consistency can be established between the transformed systems and specific CFMs. For instance, [56, 57] showed that Newell’s CFM is a discrete form of the LWR model with the Greenshields fundamental diagram in Lagrangian coordinates. Along this line, [58] studied a nonlocal second order model and [59] studied three representations of the LWR model using different coordinates and state variables. Later, [60] established a more general framework bridging a family of CFMs and macroscopic traffic flow models.

Similarly, deriving a new traffic flow theory for pure AVs requires the establishment of a micro-macro connection from AV control models. Only a limited number of studies have characterized traffic flow theories from their respective microscopic controls of connected and automated vehicles using gas-kinetic theory [61, 62]. In contrast, a majority of researchers simply derived new fundamental diagrams using the existing traffic flow theory framework: assuming that AVs possess a shorter reaction time in car-following, AVs’ fundamental diagram has the same free-flow speed but a steeper congestion curve [50] compared to HVs.

In this paper, we aim to derive a macroscopic game-theoretic model from AV microscopic longitudinal control and fill two gaps in the existing literature by (i) proposing an efficient computational framework for the longitudinal control of a platoon of AVs in the car-following scenario and (ii) developing a traffic flow theory for AVs.

Traditional macroscopic traffic flow models are often classified into two categories: first-order models such as the Lighthill-Whitham-Richards (LWR) model [63, 20] and higher-order models such as the Payne-Whitham (PW) model [64, 65] and the Aw-Rascle-Zhang (ARZ) model [66, 22]. The classification is based on different control variables. First-order models assume drivers control their speeds according to the traffic density while higher-order models prescribe a relationship between drivers’ acceleration rates and the traffic density as well as drivers’ speeds. The mean field game presented in this paper, which only models AVs’ velocity controls, may be seen as the game-theoretic analogue to traditional first-order models. Similar to the extensions taken from traditional first-order models to higher-order models, one can incorporate more factors such as the acceleration rate and develop higher-order MFGs within our game-theoretic modeling framework. Table 1 shows the classification of both traditional and game-theoretic macroscopic traffic flow models. This paper primarily focuses on first-order MFGs and leaves the discussion of higher-order MFGs to future research.

1.2 Literature review

Assuming connectivity between predecessors and followers as well as between platoon leaders and followers, CACC contains two control policies: constant spacing (CS) [36, 37, 67] and constant time headway (CTH) [38, 39, 40, 41, 42, 26, 31, 68]. These two policies can be formulated as a linear time invariant system (LTI) [69] with disturbances to dynamic and measurement dynamics [26] or a model predictive control (MPC) system with distributed control [70, 17, 18].

AVs’ longitudinal acceleration control can also be modeled using nonlinear car following models (CFMs). The most widely used CFMs for AVs are the Intelligent Driver Model (IDM) [71, 72, 43, 44, 41, 45, 46, 47, 48, 10, 73, 74, 75, 11, 12, 13, 14] and the Optimal Velocity Model (OVM) and its variants with heterogeneous communication delay or dynamic uncertainty [76, 49, 77, 78, 79]. Unlike OVM, IDM takes safety into consideration and is thus collision-free. All the aforementioned studies aim to develop a string stable car-following controller in order to smooth traffic flow and prevent stop-and-go waves. But none of them considers control and physical safety constraints [18]. In other words, interactions among vehicles are not explicitly modeled [28].

To address the above challenges, some researchers model a full penetration of AVs as a multi-agent system (MAS), wherein AVs interact with one another through physical interactions in traffic. A majority of studies that capture the interactions of vehicles assume that each vehicle carries out a sequence of accelerations over a finite time horizon by optimizing a common or an individual objective function. Vehicles interact among themselves through the common or individual cost function as well as safety constraints. Depending on the objective functional form, these models can be further divided into two classes: cooperative control and non-cooperative game.

Cooperative control has been widely studied in multi-robotic systems. In multi-robot interaction, robots interact with one another and choose optimal policies by predicting others’ behavior. Neighboring robots’ trajectories are treated as hard safety constraints or boundaries for robots’ motion planning. Such modeling has been critical in multi-robot collision avoidance and human-robot interaction [15, 16]. A cooperative AV system is a multi-vehicle system that can be controlled to stabilize traffic flow and smooth traffic jams [70, 17, 18], to optimize driving comfort [80, 26], and to improve fuel efficiency [81, 35]. To reduce computational burdens, a distributed algorithm is usually designed and implemented on each vehicle [70, 17, 18].

Compared to cooperative AV control, the non-cooperative interactions among AVs are relatively understudied. Game theory is a natural approach to model the non-cooperative strategic interactions among AVs, in which each AV solves an MPC [51] or MDP [82, 83, 84, 85]. In the game theoretic framework, cars are referred to as “agents” or “players”. [51] formulated the discrete lane change and continuous acceleration selection of AVs as a differential game, where the agents’ optimal strategies are obtained from solving optimal control problems. [86] modeled lane-changing behavior as a two-person non-zero-sum non-cooperative game under (in)complete information. [87, 88, 89, 90] developed a Stackelberg game among multiple AVs in driving or merging and a mixed-motive game in lane-changing. [82] modeled multiple AVs’ acceleration and steering angle velocity selection at intersections with the goal of avoiding collisions. The human-cyber-physical systems (h-CPS) community extends multi-agent systems to hybrid AVs interacting with human drivers. For example, [84] designed “local interactions” between an AV and a human driver to drive efficiently and maximize road capacity, while [85] generalized the model to several AVs and HVs. [83] assumed that human drivers choose driving policies using hierarchical reasoning while AVs optimize car-following and lane-changing strategies based on a Stackelberg game. The outcomes of all the aforementioned game-theoretic models are equilibrium driving strategies. The computation of an equilibrium may become extremely challenging when the number of coupled agents becomes large. To get around this, [51] applied Model Predictive Control (MPC) instead of computing an equilibrium. [82] solved a generalized Nash equilibrium by summing up all vehicles’ objective functions, which is essentially a cooperative control. [85] assumed that AVs can directly perform optimization based upon their predictions of human driver actions rather than humans’ actual strategies. None of these studies investigated quantitatively how close the approximate solutions are to the original differential games. Moreover, because the available game-based control algorithms suffer from scalability issues, all the aforementioned studies had to constrain their applications to a limited number of AVs. A scalable traffic simulation framework was developed where AVs learned their optimal driving policies in a multi-agent RL environment [73, 91, 92], but the trained policies suffered from a lack of interpretability.

All the aforementioned studies focus on AVs’ longitudinal or lateral controls in discrete games, which suffer from scalability issues. Thus a scalable theory and algorithm applicable to a large number of coupled AV controllers is urgently needed.

1.3 Contributions of this paper

Instead of hypothesizing explicitly how AVs drive, our goal is to design future AVs as rational, utility-optimizing agents that play best driving strategies. In other words, AVs are intelligent agents programmed by manufacturers to minimize driving costs as a trade-off between traffic safety and efficiency. Any deviation from driving with the best strategies will increase AVs’ individual costs.

Game theory is a natural tool to model the equilibrium of interacting utility-optimizing agents. Given that a large number of interacting AVs are designed to select velocity controls continuously, we seek an innovative game-theoretic tool, i.e., the mean field game, for complex multi-agent dynamic modeling [93, 94]. Mean field approximation allows for the translation of microscopic behaviors and interactions of agents to a macroscopic level. Most importantly, we will show later in this paper that MFG embodies classical traffic flow models with behavioral interpretation, thereby providing a new traffic flow theory for AVs.

2 The modeling framework

This paper contributes to the state-of-the-art of AV controller design by characterizing the interplay between discrete differential games and continuous mean field games. Mean field game (MFG) is a game-theoretic model used to describe complex multi-agent dynamic systems [93, 94]. It has become increasingly popular in designing new decision-making processes for finance [95, 96], engineering [97, 98], social science [99], pedestrian crowds modeling [100, 101] and traffic [102, 103, 104, 105]. MFG is a micro-macro model which allows one to define individuals on a microscopic level as rational utility-optimizing agents while translating rich microscopic behaviors to macroscopic models. The basic idea is to exploit the “smoothing” effect of large numbers of interacting individuals. Instead of solving a long list of highly coupled equations that depict the interactions among different players, MFG assumes that each player only reacts to a “mass” that results from an aggregate effect of all the players. Such an approach is called mean field approximation and helps to simplify the complex multi-agent dynamic systems on a macroscopic level.
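As a rough illustration of the “mass” that each player reacts to, the sketch below evaluates a kernel-smoothed empirical density of the form $\rho_{\sigma}^{N}(x)=\frac{1}{N}\sum_{j}\xi_{\sigma}(x-x_{j})$ for a handful of cars. The Gaussian choice of $\xi_{\sigma}$, the function name `smoothed_density`, and the car positions are assumptions made only for this example; the paper's own kernel may differ.

```python
import math

def smoothed_density(x, positions, sigma):
    """Return (1/N) * sum_j xi_sigma(x - x_j) at a single point x.

    xi_sigma is taken to be a Gaussian of width sigma (an assumption
    made for illustration only).
    """
    norm = sigma * math.sqrt(2.0 * math.pi)
    total = sum(math.exp(-0.5 * ((x - xj) / sigma) ** 2) / norm for xj in positions)
    return total / len(positions)

# Hypothetical data: four cars bunched near x = 0.5 and one car alone at x = 2.0.
positions = [0.45, 0.50, 0.55, 0.60, 2.0]
sigma = 0.1

rho_cluster = smoothed_density(0.5, positions, sigma)  # density inside the bunch
rho_alone = smoothed_density(2.0, positions, sigma)    # density at the isolated car

# Riemann-sum check that the smoothed density still integrates to about one.
dx = 0.001
grid = [-1.0 + k * dx for k in range(4001)]  # covers [-1, 3]
total_mass = sum(smoothed_density(x, positions, sigma) for x in grid) * dx
```

In this toy setting the density is higher inside the bunch than at the isolated car, which is the kind of aggregate signal a single agent would respond to instead of tracking every neighbor individually.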

This paper models AVs’ rational and intelligent driving behavior under the mean field game framework. We make the following behavioral assumptions on AVs:

• Each AV observes spatially global traffic state information on the road.

• Each AV plans its velocity control over a time horizon by anticipating others’ behaviors.

• AVs act to minimize their predefined driving costs over the time horizon in a non-cooperative way.

Four major components of this paper are elaborated below (as shown in Figure 1):

1. A mean field game is derived from the limiting differential game as the number of AVs tends to infinity. The mean field game is a coupled forward-backward PDE system that models AVs’ non-cooperative velocity selections at a macroscopic scale. The existing research on the application of mean field games to the transportation domain worked solely on specific objective functions [103, 102]. In contrast, we systematically derive the forward continuity equation and the backward Hamilton-Jacobi-Bellman (HJB) equation for a family of more general objective functions using mean field approximation.

2. An equilibrium solution, called the mean field equilibrium (MFE), is solved from the mean field game. AVs’ optimal velocity control strategies are represented by the MFE at a macroscopic level. Existing algorithms for computing the MFE are mainly designed for a short planning horizon [103] or a special family of cost functions [100, 106]. In this paper we develop a new algorithm that works with a longer planning horizon and more general cost functions. Our algorithm is based on finite difference and multigrid preconditioned Newton’s method.

3. A tuple of AVs’ discrete controls is constructed from the discretization of a continuous MFE. We test different numbers of AVs and different objective functions to illustrate the accuracy of MFE-constructed controls as an $\epsilon$-Nash equilibrium of the original differential game. The results show a consistent trend that the continuous equilibrium solution provides a good approximation to AVs’ non-cooperative individual controls when the number of AVs is large. This construction method addresses the scalability issue faced by many existing studies [51, 82, 85].

4. The proposed mean field game can also be treated as a macroscopic traffic flow model. It models AVs’ aggregated behavior assuming AVs are predictive and rational agents. Along this line, we first establish connections between the mean field game and the traditional LWR model rigorously. Then we present some possible AV driving objective functions whose respective mean field games show interesting traffic patterns.

The remainder of this paper is organized as follows. Section 3 introduces AVs’ differential game as an extension to one AV’s optimal longitudinal control problem. In Section 4, the macroscopic MFG is derived from AVs’ differential game with some assumptions. In Section 5, we illustrate connections between MFG and the traditional LWR model in a general framework and present two MFG examples modeling AVs’ kinetic energy, driving efficiency and safety. Then, Section 6 is devoted to a new algorithm to solve MFG numerically based on Newton’s method. In Section 7, we construct a tuple of AVs’ discrete controls from the continuous MFG solution and characterize their accuracy as an approximate equilibrium of the original differential game. Conclusions and future research directions follow in Section 8.

3 From optimal control to differential game

We have seen a growing interest in applying optimal control theory to model AVs’ predictive driving strategies in car-following and lane-change scenarios [81, 80, 17, 26]. In this section, we briefly introduce how to formulate a single AV’s longitudinal control as an optimal control problem and then extend it to a differential game among multiple AVs.

3.1 Optimal longitudinal control of one car

Assume that there are $N$ AVs indexed by $i\in\{1,2,\dots,N\}$ driving in one direction on a closed highway of length $L$ without any entrance or exit. Denote the $i^{\text{th}}$ car’s position at time $t$ by $x_{i}(t)$ and speed by $v_{i}(t)$. Fix a finite period of time $[0,T]$ where $T>0$. The cars’ motions on $[0,T]$ are dictated by the following dynamical system:

$$\dot{x}_{i}(t)=v_{i}(t),\quad x_{i}(0)=x_{i,0},\quad i=1,2,\dots,N,$$

where,

$\dot{x}_{i}(t)$ : the shorthand notation of $\frac{dx_{i}(t)}{dt}$ ;

$x_{i,0}$ : the $i^{\text{th}}$ car’s initial position at the beginning time $t=0$ .

We use the notation $x_{i}(t)=x_{i}(t,v_{i}(\cdot),x_{i,0})$ , $i=1,2,\dots,N$ for simplicity but keep in mind that $x_{i}(\cdot)$ depends on both $v_{i}(\cdot)$ and $x_{i,0}$ .

For any $i=1,2,\dots,N$ , suppose the $i^{\text{th}}$ car knows other cars’ speeds:

$$\bm{v}_{-i}(t)=[v_{1}(t),\dots,v_{i-1}(t),v_{i+1}(t),\dots,v_{N}(t)]^{T},$$

and positions:

$$\bm{x}_{-i}(t)=[x_{1}(t),\dots,x_{i-1}(t),x_{i+1}(t),\dots,x_{N}(t)]^{T},$$

for $t\in[0,T]$ . To select an optimal driving speed profile, the $i^{\text{th}}$ car solves an optimal control problem over the planning horizon $[0,T]$ .

Define the $i^{\text{th}}$ car’s driving cost functional as:

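$$J_{i}^{N}(v_{i},\bm{v}_{-i})=\underbrace{\int_{0}^{T}\underbrace{f_{i}^{N}\left(v_{i}(t),x_{i}(t),\bm{x}_{-i}(t)\right)}_{\text{cost function}}\,dt}_{\text{running cost}}+\underbrace{V_{T}(x_{i}(T))}_{\text{terminal cost}},\tag{3.4}$$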

where,

$\int_{0}^{T}f_{i}^{N}\left(v_{i}(t),x_{i}(t),\bm{x}_{-i}(t)\right)\,dt$ : the running cost over the entire planning horizon;

$f_{i}^{N}(\cdot)$ : the cost function that quantifies driving objectives such as efficiency and safety;

$V_{T}(x_{i}(T))$ : the terminal cost representing the $i^{\text{th}}$ car’s preference on its final position at time $T$ .

We assume that all cars have the same free flow speed denoted by $u_{\text{max}}$ . It is natural to require that the $i^{\text{th}}$ car’s speed remains nonnegative and does not exceed $u_{\text{max}}$ . Mathematically, this means that

$$\mathcal{A}=\{v(\cdot):0\leq v(t)\leq u_{\text{max}},\ \forall t\in[0,T]\}$$

is the admissible set of the $i^{\text{th}}$ car’s speed selections. The $i^{\text{th}}$ car tries to obtain an optimal velocity control $v_{i}^{*}(\bm{v}_{-i}(\cdot),t)$ on the planning horizon $[0,T]$ such that:

$$J_{i}^{N}(v_{i}^{*},\bm{v}_{-i})\leq J_{i}^{N}(v_{i},\bm{v}_{-i}),\quad\forall v_{i}\in\mathcal{A}.$$

$v_{i}^{*}(\bm{v}_{-i}(\cdot),t)$ depends on other cars’ speeds $\bm{v}_{-i}(\cdot)$ through their trajectories $\bm{x}_{-i}(\cdot)$ . We will use the notation $v_{i}^{*}(t)$ for simplicity. When one car selects its own driving speed over the predefined planning horizon while everybody else does so simultaneously, a non-cooperative differential game forms.
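The ingredients of the one-car problem (the dynamics $\dot{x}_{i}=v_{i}$, the speed bounds $0\leq v\leq u_{\text{max}}$, and a cost made of a running part plus a terminal part) can be sketched on a discrete time grid as follows. This is a minimal illustration, not the paper's implementation: the forward-Euler time stepping, the function names, and in particular `toy_running_cost` and `toy_terminal_cost` are hypothetical choices made for this example.

```python
U_MAX = 1.0  # free flow speed, normalized (assumption for this example)

def is_admissible(speeds, u_max):
    """Check the speed bounds 0 <= v(t) <= u_max at every time step."""
    return all(0.0 <= v <= u_max for v in speeds)

def simulate_position(x0, speeds, dt):
    """Forward-Euler integration of dx/dt = v(t); returns the position at each step."""
    xs = [x0]
    for v in speeds:
        xs.append(xs[-1] + v * dt)
    return xs

def driving_cost(x0, speeds, others, dt, running_cost, terminal_cost):
    """Left Riemann sum of the running cost plus the terminal cost.

    others[j][k] is the position of the j-th other car at time step k.
    """
    xs = simulate_position(x0, speeds, dt)
    running = 0.0
    for k, v in enumerate(speeds):
        x_others = [traj[k] for traj in others]
        running += running_cost(v, xs[k], x_others) * dt
    return running + terminal_cost(xs[-1])

def toy_running_cost(v, x, x_others):
    """Hypothetical cost: penalize slow driving and cars within 0.4 ahead."""
    slow_penalty = 0.5 * (U_MAX - v) ** 2
    crowding = sum(1.0 for y in x_others if 0.0 < y - x <= 0.4)
    return slow_penalty + crowding

def toy_terminal_cost(x):
    """Hypothetical terminal cost: reward distance travelled."""
    return -x

speeds = [1.0, 0.5]    # the car's speed on each of two time steps
others = [[0.3, 0.3]]  # one other car, standing still at x = 0.3
cost = driving_cost(0.0, speeds, others, 0.5, toy_running_cost, toy_terminal_cost)
```

Changing `others` changes `cost` even when `speeds` is held fixed, which is the coupling between cars that the next subsection formalizes.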

3.2 N-Car differential game

Differential games can be regarded as extensions of non-cooperative Nash games in dynamic systems. In a differential game, a finite number of players solve their individual optimal control problems while those optimal control problems are coupled through the dependency of one’s cost functional on the others’ actions [107]. Along this line, we formulate the $N$ -car differential game for AVs extending the one-car optimal control problem in Section 3.1:

$N$ AVs indexed by $i\in\{1,2,\dots,N\}$ are driving in one direction on a closed highway without any entrance or exit, with initial positions $x_{1,0},\dots,x_{N,0}$. Each car aims to select its optimal velocity control by minimizing its driving cost functional defined in Eq. (3.4) over the predefined planning horizon $\left[0,T\right]$. A Nash equilibrium of the game is a tuple of controls $v^{*}_{1}(t),v^{*}_{2}(t),\dots,v^{*}_{N}(t)$ satisfying:

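$$J_{i}^{N}(v_{i}^{*},\bm{v}_{-i}^{*})\leq J_{i}^{N}(v_{i},\bm{v}_{-i}^{*}),\quad\forall v_{i}\in\mathcal{A},\ i=1,\dots,N.$$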

It is generally difficult to solve for an equilibrium when $N$ is large, because it involves solving $N$ coupled optimal control problems [108]. The goal of this paper is to develop a scalable framework to solve for approximate equilibria of a family of $N$-car differential games by resorting to mean field approximation.

The underlying rationale of the developed methodology is articulated as follows:

1. Rather than solving the $N$-car differential game directly, we turn to its limit as the number of cars $N\to\infty$, i.e., a mean field game (Section 4).

2. A numerical algorithm is developed to solve the mean field game (Section 6).

3. The equilibrium solution of the mean field game is used to construct a tuple of discrete controls and those controls are verified to be an $\epsilon$-Nash equilibrium of the original $N$-car differential game by numerical examples (Section 7).
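To make the notion of an $\epsilon$-Nash equilibrium concrete, the sketch below measures how much any single player could lower its own cost by deviating unilaterally from a given profile of controls. It is a toy under stated assumptions: the two-player static game in `toy_cost`, the scalar (constant-speed) controls, and the finite candidate set are all hypothetical stand-ins for the $N$-car differential game, in which each control would be a speed profile over $[0,T]$.

```python
def nash_gap(cost, profile, candidates):
    """Largest cost reduction any one player can get by a unilateral deviation.

    cost(i, profile) returns player i's cost under the full control profile.
    If the returned gap is at most eps, `profile` is an eps-Nash equilibrium
    with respect to the deviations listed in `candidates`.
    """
    gap = 0.0
    for i in range(len(profile)):
        current = cost(i, profile)
        for alternative in candidates:
            deviated = list(profile)
            deviated[i] = alternative
            gap = max(gap, current - cost(i, deviated))
    return gap

def toy_cost(i, profile):
    """Hypothetical two-player cost: player i prefers v_i = 0.5 * v_j + 0.25."""
    j = 1 - i
    return (profile[i] - 0.5 * profile[j] - 0.25) ** 2

# Candidate deviations: constant speeds on a grid in [0, 1].
candidates = [k / 100 for k in range(101)]

gap_at_equilibrium = nash_gap(toy_cost, [0.5, 0.5], candidates)
gap_off_equilibrium = nash_gap(toy_cost, [0.2, 0.2], candidates)
```

In this toy game the profile `[0.5, 0.5]` solves both best-response conditions, so its gap is zero, while `[0.2, 0.2]` leaves each player a gain of about 0.0225 from deviating.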

4 From differential game to mean field game

When the number of cars $N\to\infty$, one goes from the $N$-car differential game to a mean field game (MFG). The MFG is essentially a differential game with an infinite number of agents, so that the interaction between any two individuals is negligible. Instead, each individual reacts only to the “mass” of all agents, and the “mass” in turn evolves with the aggregated motion of all agents. Two partial differential equations describe the MFG:

1. A backward Hamilton-Jacobi-Bellman (HJB) equation: a generic car’s speed selection is formulated as an optimal control problem in which the generic car computes its driving cost from a cost function, based on its prediction of the evolution of the total “mass”. The HJB equation is then derived from the optimal control problem. Its solution provides the optimal costs and optimal velocity control strategies for all cars. The HJB equation is solved backward from $t=T$ to $t=0$.

2. A forward continuity equation: it is derived from the conservation of cars. The solution of the continuity equation describes the “mass” evolution arising from all cars’ motions. The continuity equation is solved forward from $t=0$ to $t=T$.

The MFG is a coupled system of the forward continuity equation and the backward HJB equation. At the mean field equilibrium, the evolution of the total “mass” coincides with every car’s prediction. Figure 2 shows a simple example of four cars to provide an intuitive explanation of these two equations. Each car is a rational agent aiming to minimize a driving cost defined in Eq. (3.4), leading to a system of four coupled optimal control problems, one for each car. As $N$ grows large, the HJB equation can be derived from these coupled problems and the continuity equation can be derived from the trajectories of all cars.

In this section, we will formally derive the HJB and continuity equations from the $N$ -car differential game using mean field approximation.

4.1 Mean field limit

The general idea of moving from the microscopic $N$ -car differential game to the macroscopic MFG is to take a mean field limit by letting the number of cars in the system go to infinity. To allow us to take the limit, we need to first make two homogeneity assumptions:

(A1) All cars are indistinguishable.

(A2) All cars have the same form of cost function.

It should be mentioned that the above assumptions may be relaxed. For example, (A2) can be relaxed if multi-class traffic is the subject of study. In this paper we mainly focus on single-class AVs and leave multi-class models to future work.

Provided that the $N$-car differential game satisfies the above assumptions, we will derive an MFG in four steps:

1. We reformulate the driving cost functional defined in Eq. (3.4) by introducing a smooth density (Section 4.1.1);

2. We derive a generic car’s optimal control problem from the differential game by taking the mean field limit when $N\to\infty$ (Section 4.1.2);

3. We derive a set of HJB equations from the generic car’s optimal control problem (Section 4.1.3);

4. We obtain an evolution equation and show that it is exactly the continuity equation widely used in macroscopic traffic flow models (Section 4.1.4).

4.1.1 Step 1: Traffic Density Smoothing

Traffic density is a crucial quantity to manifest the macroscopic aspect of traffic flow. Assumption (A1) enables us to replace states of individual cars in the driving cost functional defined in Eq. (3.4) by an aggregated traffic density.

More precisely, for any $i=1,\dots,N$, assumption (A1) implies that the function $f_{i}^{N}\left(v_{i}(t),x_{i}(t),\bm{x}_{-i}(t)\right)$ does not depend on the permutation of $x_{1}(t),x_{2}(t),\dots,x_{N}(t)$. According to [108], we can replace the arguments $x_{i}(t),\bm{x}_{-i}(t)$ in $f_{i}^{N}\left(v_{i}(t),x_{i}(t),\bm{x}_{-i}(t)\right)$ by an empirical density distribution of $x_{1}(t),x_{2}(t),\dots,x_{N}(t)$, which is defined as:

[TABLE]

where $\delta(\cdot)$ is the Dirac mass.

However, $\rho^{N}$ is not a smooth function, leading to non-smoothness of the new driving cost functional. To resolve this issue, we first approximate $\rho^{N}$ using a smoothing kernel.

Suppose that $\xi(x)$ is a smoothing kernel which is smooth and nonnegative, and satisfies $\int_{\mathbb{R}}\xi(x)\,dx=1$ . We take a smoothing parameter $\sigma>0$ and define the scaled kernel $\xi_{\sigma}(x)=\frac{1}{\sigma}\xi(\frac{x}{\sigma})$ . The physical meaning of using the scaled kernel $\xi_{\sigma}(x)$ is that the $i^{\text{th}}$ car contributes to the density in a “window” $[x_{i}(t)-\sigma,x_{i}(t)+\sigma]$ rather than only at the point $x_{i}(t)$ $(i=1,\dots,N)$ so that the density changes smoothly with location $x$ . The smooth density distribution is defined as:

[TABLE]
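The smoothing step can be illustrated numerically. The sketch below is only an illustration under stated assumptions: it assumes the $1/N$ normalization $\rho^{N}_{\sigma}(x,t)=\frac{1}{N}\sum_{i}\xi_{\sigma}(x-x_{i}(t))$, uses a Gaussian as the kernel $\xi$ (any smooth, nonnegative kernel with unit integral would do), and wraps distances periodically for a ring road. The function names are hypothetical.

```python
import math


def gaussian_kernel(x):
    """Standard Gaussian: smooth, nonnegative, integrates to 1 over the real line."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def smooth_density(x, positions, sigma, road_length):
    """Kernel-smoothed density at location x on a ring road.

    Assumes the 1/N normalization and the scaled kernel xi_sigma(x) = xi(x/sigma)/sigma.
    Distances are wrapped so that each car acts through its shortest signed
    distance on the ring (accurate when sigma is much smaller than road_length).
    """
    n = len(positions)
    total = 0.0
    for xi in positions:
        d = (x - xi) % road_length       # periodic displacement in [0, L)
        if d > 0.5 * road_length:
            d -= road_length             # shortest signed distance on the ring
        total += gaussian_kernel(d / sigma) / sigma
    return total / n
```

With four cars at `[0.1, 0.5, 0.52, 0.9]` on a road of length 1 and `sigma = 0.05`, the smoothed density is large near the two clustered cars and close to zero in the gaps, while its integral over the road stays close to one.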

With the smooth density, the $i^{\text{th}}$ car’s cost function is rewritten as:

[TABLE]

where $v_{i}(t)$ is the car’s speed and $\rho^{N}_{\sigma}(\cdot,t)$ is the traffic density over the whole road at time $t$ . Generally $f_{i}$ may have arbitrary dependence on $\rho^{N}_{\sigma}(\cdot,t)$ as well as its spatial derivatives. To simplify, we make the assumption:

(A3) The cost function only depends on the traffic density at the car’s position.

By assumption (A3) we can write:

[TABLE]

where $f_{i}(\cdot,\cdot)$ is a bivariate function of speed and density, $i=1,2,\dots,N$ .

By assumption (A2), we have $f_{1}=f_{2}=\cdots=f_{N}=f$ , where $f(\cdot,\cdot)$ is a bivariate cost function shared by all cars. In summary, the $i^{\text{th}}$ car’s driving cost becomes:

[TABLE]

It should be noted that the density information ahead of and behind the $i^{\text{th}}$ car is used asymmetrically. At any time $t_{1}\in[0,T)$, the car anticipates the model-predicted density $\rho_{\sigma}^{N}(x_{i}(t_{2}),t_{2})$ at a later time $t_{2}\in(t_{1},T]$ to select its driving speed at time $t_{1}$. Since the $i^{\text{th}}$ car drives at nonnegative speeds, it always holds that $x_{i}(t_{2})\geq x_{i}(t_{1})$. It follows that the cars ahead of the $i^{\text{th}}$ car may contribute to the density $\rho_{\sigma}^{N}(x_{i}(t_{2}),t_{2})$, whereas the cars behind it never do. Consequently, the $i^{\text{th}}$ car’s velocity control is not influenced by the cars behind it.

Definition 4.1.

1. $N$-car mean field type differential game [DG]: $N$ AVs indexed by $i\in\{1,2,\dots,N\}$ are driving in one direction on a closed highway of length $L$ without any entrance or exit, with initial positions $x_{1,0},\dots,x_{N,0}$. Each car aims to select its optimal velocity control by minimizing its driving cost functional defined in Eq. (4.5) over the predefined planning horizon $\left[0,T\right]$.

2. $N$-car mean field type differential game equilibrium [DGE]: A Nash equilibrium of the $N$-car mean field type differential game is a tuple of controls $v^{*}_{1}(t),v^{*}_{2}(t),\dots,v^{*}_{N}(t)$ satisfying:

[TABLE]

At equilibrium, no car can improve its driving cost by unilaterally switching its velocity control.

We see from Eq. (4.5) that each car only responds to and contributes to the density distribution $\rho^{N}_{\sigma}$ of all cars through driving costs. Such a property allows us to take the mean field limit of the game as $N$ tends to infinity.

4.1.2 Step 2: Optimal Control of a Generic Car

We take the mean field limit in the following way: fix the ratio $L/N$, and let $N\to\infty$ and $\sigma/L\to 0$. Intuitively, this means that we fix the space headway and shrink the “window”, so that in the limit a car only sees a local density. Under this limit, the mean field approximation replaces $\rho_{\sigma}^{N}(x,t)$, which is computed from the $N$ cars’ positions, by a continuum density distribution $\rho(x,t)$. Since all cars are anonymous, we can drop the index $i$ and consider a generic car starting from $x_{0}$ at $t=0$. Denoting the car’s velocity control by $v(t)$ and its trajectory by $x(t)$ for $t\in[0,T]$, we rewrite Eq. (4.5) as:

[TABLE]

where its dynamic motion is described by

$$\dot{x}(t)=v(t),\quad x(0)=x_{0},\qquad t\in[0,T], \tag{4.8}$$

and its velocity control $v(\cdot)$ is constrained by

$$0\leq v(t)\leq u_{\text{max}},\qquad t\in[0,T]. \tag{4.9}$$

4.1.3 Step 3: HJB Equation

The generic car solves the optimal control problem Eqs. (4.7)(4.8)(4.9) to obtain its optimal velocity control $v^{*}(t)$ , which depends on the generic car’s initial position $x_{0}$ . The initial position $x_{0}$ can be any position on the road. Rather than solving an infinite number of optimal control problems for every initial position, we use dynamic programming and derive a set of HJB equations that characterize the optimality condition of the velocity control. Such an approach is widely used in optimal control theory [110].

We first introduce the Bellman value function $V(x,t)$ and the optimal velocity field $u(x,t)$ . $V(x,t)$ is defined as the optimal cost for the generic car starting from location $x$ at time $t$ :

[TABLE]

and $u(x,t)$ is defined as the car’s speed at location $x$ and time $t$ when choosing the optimal control of Eqs. (4.10a)(4.10b).

From another point of view, $v^{*}(t)$ is the Lagrangian optimal velocity control while $u(x,t)$ is the Eulerian optimal velocity field. Once $u(x,t)$ is solved for all $x$ and $t$ , the optimal cost of the original problem Eqs. (4.7)(4.8)(4.9) is given by $V(x_{0},0)$ and the optimal control $v^{*}(t)$ is given by the feedback law:

[TABLE]
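The relation between the Eulerian field and the Lagrangian control can be sketched in a few lines. The example below assumes the feedback law takes the usual form $\dot{x}(t)=u(x(t),t)$, $v^{*}(t)=u(x(t),t)$, integrates it with a forward Euler step, and wraps the position on a ring road of length `road_length`. The function name and the Euler discretization are illustrative choices, not taken from the paper.

```python
def follow_velocity_field(u, x0, horizon, n_steps, road_length):
    """Recover a car's trajectory and speed profile from a velocity field u(x, t).

    Forward Euler sketch of the feedback law x'(t) = u(x(t), t), v(t) = u(x(t), t),
    with periodic wrap-around of the position.
    Returns (positions at t_0..t_n, speeds at t_0..t_{n-1}).
    """
    dt = horizon / n_steps
    x = x0
    positions, speeds = [x0], []
    for k in range(n_steps):
        v = u(x, k * dt)                  # Eulerian field evaluated along the path
        speeds.append(v)
        x = (x + v * dt) % road_length    # move the car and wrap around the ring
        positions.append(x)
    return positions, speeds
```

For a constant field `u = 0.5` and a car starting at `x0 = 0.2`, the car ends near `0.7` after one time unit on a unit-length ring.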

Then we derive the HJB equations for $V(x,t)$ and $u(x,t)$ from Eqs. (4.10a)(4.10b). Suppose the generic car starts from position $x$ at time $t$. For a small time step $\Delta t$, we can divide the driving cost in Eq. (4.10a) into two parts on $[t,t+\Delta t]$ and $[t+\Delta t,T]$:

[TABLE]

Correspondingly, the generic car’s decision process is also divided into two stages. First it selects the speed $v(t)=\alpha\in[0,u_{\text{max}}]$ on the horizon $[t,t+\Delta t]$ . Then it moves to $x+\alpha\Delta t$ at time $t+\Delta t$ and selects its speed profile over the rest of the planning horizon $[t+\Delta t,T]$ .

The running cost on $[t,t+\Delta t]$ is approximated by

[TABLE]

Note that from the new position $x+\alpha\Delta t$, the optimal cost on $[t+\Delta t,T]$ the car can obtain is $V(x+\alpha\Delta t,t+\Delta t)$. By the dynamic programming principle we have:

[TABLE]

Taking the first-order Taylor expansion of $V(x+\alpha\Delta t,t+\Delta t)$ near $(x,t)$ and denoting by $V_{t}$ and $V_{x}$ the partial derivatives $\frac{\partial V}{\partial t}$ and $\frac{\partial V}{\partial x}$, Eq. (4.15) yields:

[TABLE]

Eliminating $V(x,t)$ from both sides, dividing both sides by $\Delta t$ and letting $\Delta t\to 0$ , we get:

$$V_{t}+\min_{0\leq\alpha\leq u_{\text{max}}}\left\{f(\alpha,\rho)+\alpha V_{x}\right\}=0. \tag{4.17}$$

We assume that $f$ is strictly convex with respect to its first argument, the driving speed. Then we can introduce:

$$f^{*}(p,\rho)=\min_{0\leq\alpha\leq u_{\text{max}}}\left\{f(\alpha,\rho)+\alpha p\right\},\quad p\in\mathbb{R}, \tag{4.18}$$

so that $-f^{*}(-\cdot,\rho)$ is the Legendre transformation of $f(\cdot,\rho)$ for any $\rho$. Using $f^{*}$, Eq. (4.17) can be rewritten as:

$$V_{t}+f^{*}(V_{x},\rho)=0. \tag{4.19}$$

The strict convexity of $f$ with respect to speed yields the uniqueness of the minimizer in Eq. (4.18), which is given by $f^{*}_{p}(p,\rho)$ for any $p\in\mathbb{R}$, where $f^{*}_{p}$ denotes the derivative of $f^{*}$ with respect to $p$. As a result, the optimal velocity field $u(x,t)$ is given by:

$$u=f^{*}_{p}(V_{x},\rho). \tag{4.20}$$

We highlight the convexity assumption on $f$ because the car’s optimal speed selection is not unique even in a single time step without the assumption. In real applications, it is reasonable to assume AVs’ utility satisfies the law of diminishing marginal returns [111, 112], i.e., increasing speed results in smaller increase in utility. As a corollary to the law, the utility should be concave with respect to speed. Then the convexity assumption follows from the fact that AVs’ driving cost is just the negative of the utility.

When $t=T$ , Eq. (4.10a) becomes $V(x,T)=V_{T}(x)$ , which gives the terminal condition of the HJB equations.

Summarizing all above, given the density distribution $\rho(x,t)$ , Eqs. (4.10a)(4.10b) lead to the following HJB equations:

$$V_{t}+f^{*}(V_{x},\rho)=0, \tag{4.21a}$$

$$u=f^{*}_{p}(V_{x},\rho), \tag{4.21b}$$

with $V(x,T)=V_{T}(x)$ . $V(x,t)$ and $u(x,t)$ are solved backward from the HJB equations.
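To make the roles of $f^{*}$ and $f^{*}_{p}$ concrete, the following sketch evaluates $f^{*}(p,\rho)=\min_{0\leq\alpha\leq u_{\text{max}}}\{f(\alpha,\rho)+\alpha p\}$ by a plain grid search and compares the minimizer with a closed form. It assumes an illustrative strictly convex cost $f(\alpha,\rho)=\frac{1}{2}\alpha^{2}-\alpha+\alpha\rho$ with $u_{\text{max}}=1$; this cost and all function names are assumptions made for the example only.

```python
def cost(alpha, rho):
    """Illustrative strictly convex cost (assumed): 0.5*alpha^2 - alpha + alpha*rho."""
    return 0.5 * alpha * alpha - alpha + alpha * rho


def f_star(p, rho, u_max=1.0, n_grid=2001):
    """Grid-search approximation of f*(p, rho) = min over alpha in [0, u_max] of f + alpha*p.

    Returns (minimum value, minimizing alpha). The minimizing alpha plays the
    role of f*_p(p, rho), i.e. the optimal speed when V_x = p.
    """
    best_val, best_alpha = float("inf"), 0.0
    for k in range(n_grid):
        alpha = u_max * k / (n_grid - 1)
        val = cost(alpha, rho) + alpha * p
        if val < best_val:
            best_val, best_alpha = val, alpha
    return best_val, best_alpha


def closed_form_speed(p, rho, u_max=1.0):
    """For the assumed cost: unconstrained minimizer 1 - rho - p, projected onto [0, u_max]."""
    return min(max(1.0 - rho - p, 0.0), u_max)
```

For the assumed cost the grid minimizer agrees with the projected closed form, and a finite difference of `f_star` in `p` reproduces the minimizer, which is the envelope relation behind $u=f^{*}_{p}(V_{x},\rho)$.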

4.1.4 Step 4: Continuity Equation

When all cars follow the optimal velocity control, the aggregated density distribution $\rho(x,t)$ evolves according to the optimal velocity field $u(x,t)$ obtained from the HJB equations. An evolution equation can be derived from the conservation of cars:

$$\rho_{t}+(\rho u)_{x}=0, \tag{4.22}$$

to describe the evolution of density $\rho(x,t)$ from some initial density distribution $\rho(x,0)=\rho_{0}(x)$. Eq. (4.22) is exactly the continuity equation (CE) widely used in traffic flow models [9]. Given the velocity field $u(x,t)$, Eq. (4.22) is solved forward in time.

4.2 Mean field game system

Summarizing Eqs. (4.22)(4.21a)(4.21b), when the HJB and continuity equations are coupled, we have the following MFG system with the cost function $f(u,\rho)$ :

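$$\text{[MFG]}\qquad\rho_{t}+(\rho u)_{x}=0, \tag{4.23a}$$

$$V_{t}+f^{*}(V_{x},\rho)=0, \tag{4.23b}$$

$$u=f^{*}_{p}(V_{x},\rho). \tag{4.23c}$$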

The associated initial and terminal conditions are provided by the initial density $\rho(x,0)=\rho_{0}(x)$ and the terminal cost $V(x,T)=V_{T}(x)$, respectively. The choice of boundary conditions depends on the traffic scenario. When cars drive on a ring road without any entrance or exit, periodic boundary conditions are specified as $\rho(0,t)=\rho(L,t)$, $V(0,t)=V(L,t)$; when the road has an entrance at $x=0$ and an exit at $x=L$, we should impose the boundary conditions $\rho(0,t)=\rho_{\text{entr}}(t)$, representing the inflow at the entrance, and $V(L,t)=V_{\text{exit}}(t)$, representing the boundary cost when cars leave the road at the exit. This paper will focus on the periodic boundary conditions.

Denote the system’s solution by $\rho^{*}(x,t)$ and $u^{*}(x,t)$. The optimal velocity field $u^{*}(x,t)$ is our primary focus and will thus be referred to as the mean field equilibrium (MFE) in the subsequent analysis.

Remark 1.

[MFG] is the general MFG system with any cost function $f(u,\rho)$ that is strictly convex with respect to $u$ . The existence and uniqueness of MFE for the general system remains to be investigated. MFGs with some special cost functions are shown to have a unique MFE, see discussions in Section 5.2.1.

Remark 2.

The MFG system derived here is usually called a non-viscous MFG system, because we assume no stochasticity on cars’ dynamics. Accordingly, [MFG] has no viscous terms such as $\rho_{xx}$ and $V_{xx}$ . For theory on non-viscous MFG, we refer to [108, 113].

5 Mean field games in traffic flow

MFG shares the same continuity equation with traditional traffic flow models but characterizes cars’ reactions to traffic congestion in a different way. Traditional traffic flow models prescribe a relationship between traffic density and the car’s speed or acceleration, while MFG models the car’s speed selection as an optimal control problem with a prescribed cost function. On this understanding, MFG can be seen as a macroscopic traffic flow model that captures AVs’ predictive and rational driving behavior. In this section we first establish connections between MFG and the traditional LWR model, and then present MFG examples by choosing appropriate cost functions to quantify AVs’ driving objectives.

5.1 Connections between MFG and LWR

The Lighthill-Whitham-Richards (LWR) model [20, 114] is a representative of traditional traffic flow models, so it is helpful to establish connections between MFG and LWR. Refs. [102, 103] have revealed such connections for a specific class of LWR models with the Greenshields fundamental diagram. [102] presented a cost function whose corresponding MFG takes the Greenshields LWR as a solution. [103] claimed that the Greenshields LWR is essentially an MFG with a specific cost function when drivers minimize their driving costs by selecting their driving speeds myopically.

In the subsequent analysis we will establish connections between MFG and LWR from two perspectives, as shown in Figure 3. (i) We construct a cost function for an arbitrary fundamental diagram and show that the LWR solution is a solution of the corresponding MFG; (ii) LWR can also be seen as the myopic limit of MFG, obtained by letting the length of the planning horizon tend to zero, for a general family of cost functions.

5.1.1 LWR as a solution to MFG

Let us choose an arbitrary desired speed function $U(\rho)$ . The corresponding LWR model is:

$$\text{[LWR]}\qquad\rho_{t}+(\rho u)_{x}=0, \tag{4.23aa}$$

$$u=U(\rho). \tag{4.23ab}$$

Now we directly set the driving objective to be maintaining the LWR speed. There are infinitely many cost functions that encode this objective. Here we choose the following one:

[TABLE]

Eq. (4.23b) can be interpreted as quantifying the difference between the car’s actual speed and its desired speed. In other words, the car’s objective is to stay close to human driving. Another reason for choosing the cost function $f_{\text{LWR}}$ is that it relates to another cost function $f_{\text{NonSep}}$ modeling driving efficiency and safety, which will be shown later.

The cost function $f_{\text{LWR}}$ corresponds to the following MFG system:

[TABLE]

Theorem 5.1.

The solution of [LWR] is a solution of [MFG-LWR] under the conditions that: (i) [MFG-LWR] and [LWR] have the same initial density $\rho_{0}(x)$ and periodic boundary conditions; (ii) $V_{T}(x)=C$ where $C$ is an arbitrary constant for [MFG-LWR].

Proof.

Denote by $\rho^{*}(x,t)$ and $u^{*}(x,t)$ the solution of [LWR]. Since Eq. (4.23aa) is the same as Eq. (4.23ca), it suffices to show that $\rho^{*}$ and $u^{*}$ satisfy the HJB equations (4.23cb)(4.23cc) for some $V^{*}$. Take $V^{*}\equiv C$. Then the terminal condition $V^{*}(x,T)=V_{T}(x)=C$ is satisfied and Eqs. (4.23cb)(4.23cc) reduce to the single equation $u^{*}=U(\rho^{*})$, which holds by Eq. (4.23ab). So $\rho^{*}$, $u^{*}$ and $V^{*}\equiv C$ form a solution of [MFG-LWR]. ∎
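The key step of the proof can be made explicit with a short computation. The block below is a sketch under assumptions: it takes the quadratic form $f_{\text{LWR}}(u,\rho)=\frac{1}{2}\left(U(\rho)-u\right)^{2}$, which matches the description of the cost as a penalty on the gap between actual and desired speed but is not confirmed by the text available here, and it assumes $0\leq U(\rho)\leq u_{\text{max}}$ together with the convention $f^{*}(p,\rho)=\min_{0\leq\alpha\leq u_{\text{max}}}\{f(\alpha,\rho)+\alpha p\}$.

```latex
\begin{aligned}
V^{*}\equiv C \;&\Longrightarrow\; V^{*}_{t}=0,\quad V^{*}_{x}=0,\\
f^{*}_{\text{LWR}}(0,\rho)
  &=\min_{0\leq\alpha\leq u_{\text{max}}}\tfrac{1}{2}\bigl(U(\rho)-\alpha\bigr)^{2}=0,
  \quad\text{attained at }\alpha=U(\rho),\\
V^{*}_{t}+f^{*}_{\text{LWR}}(V^{*}_{x},\rho^{*}) &= 0+0=0,\\
u^{*}&=\operatorname*{argmin}_{0\leq\alpha\leq u_{\text{max}}}
  \tfrac{1}{2}\bigl(U(\rho^{*})-\alpha\bigr)^{2}=U(\rho^{*}).
\end{aligned}
```

Under these assumptions both HJB relations hold identically for a constant value function, so only the continuity equation with $u^{*}=U(\rho^{*})$ remains, which is the LWR model.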

Remark 3.

The solution uniqueness of [MFG-LWR] is not studied in this paper. Proving existence and uniqueness of solutions for MFG systems with general nonseparable cost functions is mathematically challenging; such results have only been obtained for short time horizons or small initial densities [115]. Here Theorem 5.1 shows the existence of a solution of [MFG-LWR]. When uniqueness holds for [MFG-LWR], its unique solution is the solution of [LWR] and the two systems are equivalent. This equivalence is also supported by the numerical experiment. A rigorous proof is left for future research.

Remark 4.

$V_{T}(x)=C$ means that cars have no preference on their final positions. One can specify the preference by imposing a non-constant terminal cost [100]. In this paper we will always assume the terminal cost $V_{T}(x)=C$ .

Theorem 5.1 will be verified with the Greenshields desired speed function:

$$U(\rho)=u_{\text{max}}\left(1-\frac{\rho}{\rho_{\text{jam}}}\right) \tag{4.23d}$$

later in the numerical experiment, where $\rho_{\text{jam}}$ is the jam density.

5.1.2 LWR as the myopic limit of MFG

To demonstrate the other connection between MFG and LWR, we consider a general cost function $f(u,\rho)$ and its corresponding MFG system [MFG].

Given the planning horizon $[0,T]$, a generic car selects its optimal velocity control to minimize the driving cost functional defined in Eq. (4.7). If the generic car is myopic and is not concerned with the future, intuitively it will select the speed $u$ that minimizes the instantaneous cost, i.e., $u=\operatorname*{argmin}\nolimits_{0\leq\alpha\leq u_{\text{max}}}f(\alpha,\rho)$ at any time $t$, which leads to an LWR model with the desired speed:

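$$U(\rho)=\operatorname*{argmin}_{0\leq\alpha\leq u_{\text{max}}}f(\alpha,\rho)=f^{*}_{p}(0,\rho) \tag{4.23e}$$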

according to Eq. (4.20).

To give a rigorous description of the myopic behavior, we define the myopic limit to be the limiting process when the length of the planning horizon $T\to 0$ . In the myopic limit, the anticipation effect of future traffic tends to zero. It is expected that the solution of the MFG will converge to the solution of the LWR with the desired speed function $U(\rho)$ defined in Eq. (4.23e).

Theorem 5.2.

Under the conditions that: (i) $f(u,\rho)$ is continuously differentiable and strictly convex with respect to $u$; (ii) the terminal cost $V_{T}(x)=C$ where $C$ is an arbitrary constant for [MFG]; (iii) there exists $T_{0}>0$ such that whenever $0<T\leq T_{0}$, with initial density $\rho_{0}(x)$ and periodic boundary conditions, [MFG] has a unique solution $\rho^{(T)}(x,t)$, $u^{(T)}(x,t)$ and $V^{(T)}(x,t)$ which are uniformly bounded up to second order derivatives on $0\leq x\leq L$, $0\leq t\leq T\leq T_{0}$. When $T\to 0$ we have:

[TABLE]

Proof.

There exists a constant $M>0$ such that $|V^{(T)}_{xt}(x,t)|\leq M$ for all $0\leq x\leq L$ and $0\leq t\leq T\leq T_{0}$. Integrating this inequality from $t=T$ to $t=0$ and noting that $V_{x}^{(T)}(x,T)=\frac{dV_{T}(x)}{dx}=0$ for all $0\leq x\leq L$, we get:

[TABLE]

for all $0\leq x\leq L$ and $0\leq T\leq T_{0}$ . Hence $V^{(T)}_{x}(x,0)\to 0$ when $T\to 0$ , $\forall x\in[0,L]$ .

Since $f(u,\rho)$ is continuously differentiable and strictly convex with respect to $u$ , $f_{p}^{*}(p,\rho)$ is continuous with respect to $p$ . From Eq. (4.23c) we deduce that when $T\to 0$ :

[TABLE]

∎

Remark 5.

Typically a desired speed function $U(\rho)$ is supposed to satisfy certain conditions. For example: (i) $U^{\prime}(\rho)\leq 0$; (ii) $U(0)=u_{\text{max}}$; (iii) $U(\rho_{\text{jam}})=0$. In Theorem 5.2, $U(\rho)$ is computed from Eq. (4.23e). The conditions on $U(\rho)$ are rewritten as: (i) $f^{*}_{p\rho}(0,\rho)\leq 0$; (ii) $f^{*}_{p}(0,0)=u_{\text{max}}$; (iii) $f^{*}_{p}(0,\rho_{\text{jam}})=0$. Here the subscripts represent respective partial derivatives. Using the identity $f_{u}(f^{*}_{p}(p,\rho),\rho)=-p$ between $f$ and $f^{*}$ [116], which is the first-order optimality condition for an interior minimizer of $f(\alpha,\rho)+\alpha p$, we can translate the conditions on $f^{*}$ to those on $f$. As a result, we require the cost function $f(u,\rho)$ to satisfy: (i) $f_{u\rho}(U(\rho),\rho)\geq 0$; (ii) $f_{u}(u_{\text{max}},0)=0$; (iii) $f_{u}(0,\rho_{\text{jam}})=0$. These conditions provide a way to calibrate the cost function from its myopic behavior.
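The three calibration conditions on $f$ are easy to test numerically for a candidate cost. The sketch below checks them with central finite differences. The candidate cost $\frac{1}{2}(u/u_{\text{max}})^{2}-u/u_{\text{max}}+u\rho/(u_{\text{max}}\rho_{\text{jam}})$ is an assumed example whose myopic speed is the Greenshields function; the helper names are hypothetical.

```python
def cost(u, rho, u_max=1.0, rho_jam=1.0):
    """Assumed example cost: 0.5*(u/u_max)^2 - u/u_max + u*rho/(u_max*rho_jam)."""
    return 0.5 * (u / u_max) ** 2 - u / u_max + u * rho / (u_max * rho_jam)


def d_u(f, u, rho, h=1e-5):
    """Central finite difference of f with respect to the speed u."""
    return (f(u + h, rho) - f(u - h, rho)) / (2.0 * h)


def d_u_rho(f, u, rho, h=1e-4):
    """Central finite difference of the mixed derivative f_{u rho}."""
    return (d_u(f, u, rho + h) - d_u(f, u, rho - h)) / (2.0 * h)


def check_calibration(f, desired_speed, u_max=1.0, rho_jam=1.0, tol=1e-6):
    """Return the three myopic calibration checks (i), (ii), (iii) as booleans."""
    densities = [rho_jam * k / 10.0 for k in range(1, 10)]
    # (i) f_{u rho}(U(rho), rho) >= 0 on a sample of densities
    monotone = all(d_u_rho(f, desired_speed(r), r) >= -tol for r in densities)
    # (ii) f_u(u_max, 0) = 0: free-flow speed is optimal on an empty road
    free_flow = abs(d_u(f, u_max, 0.0)) < tol
    # (iii) f_u(0, rho_jam) = 0: standing still is optimal at jam density
    jam = abs(d_u(f, 0.0, rho_jam)) < tol
    return monotone, free_flow, jam
```

Calling `check_calibration(cost, lambda rho: 1.0 - rho)` with the default $u_{\text{max}}=\rho_{\text{jam}}=1$ reports that all three conditions hold for the assumed cost.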

We can now interpret LWR from the perspective of MFG, which provides a richer behavioral foundation and a more general and flexible framework.

5.2 MFG examples

As a micro-macro game-theoretic model, MFG can capture richer driving behaviors than LWR by choosing various cost functions. Different cost functions relate to different driving objectives and consequently lead to different MFGs. In this subsection, we will present two concrete cost functions quantifying AVs’ kinetic energy, driving efficiency and safety.

5.2.1 MFG-Separable

We will propose a special cost function whose corresponding MFG has nice mathematical properties. This family of cost functions is called separable [115], i.e., $f(u,\rho)$ can be written as the sum of two univariate functions of $u$ and $\rho$, respectively. Denoting by $\rho_{\text{jam}}$ the jam density, we propose a cost function that is separable and models AVs’ kinetic energy, driving efficiency and safety:

[TABLE]

In Eq. (4.23i), the first term of $f_{\text{Sep}}(u,\rho)$ represents the kinetic energy; the second term quantifies driving efficiency by speed magnitudes; the last term quantifies driving safety using a traffic congestion penalty term on density $\rho$ , meaning that AVs tend to avoid staying in high density areas. We denote the corresponding MFG by [MFG-Separable].

Since the cost function $f_{\text{Sep}}(u,\rho)$ is separable, [MFG-Separable] is a potential game when there are no speed constraints [106]. [113] proved the existence and uniqueness results for a family of potential MFGs including the one presented here.

When there are speed constraints $0\leq u\leq u_{\text{max}}$ , the minimum of $f(u,\rho)+uV_{x}$ is attained at:

[TABLE]

So the MFG system is:

[TABLE]

5.2.2 MFG-NonSeparable

We propose another cost function that quantifies driving safety in a more explicit way. The cost function is:

[TABLE]

It quantifies kinetic energy and driving efficiency in the same way as in $f_{\text{Sep}}$ but uses a different traffic congestion penalty term on the product of density and speed to quantify driving safety. The new penalty term means that AVs tend to decelerate in high density areas and accelerate in low density areas. We denote the corresponding MFG by [MFG-NonSeparable].

Let us rewrite the cost function in a different way:

[TABLE]

where $U(\rho)$ is the Greenshields desired speed function defined in Eq. (4.23d). It provides a different way to interpret the cost function $f_{\text{NonSep}}$: AVs tend to stay close to human driving, and they prefer to stay in low density areas.

Comparing Eq. (4.23m) with Eq. (4.23b), we see that $f_{\text{NonSep}}$ can be seen as a variant of $f_{\text{LWR}}$ . That is one reason why we pick the cost function $f_{\text{LWR}}$ in Section 5.1.1.

With speed constraints $0\leq u\leq u_{\text{max}}$ , the minimum of $f(u,\rho)+uV_{x}$ is attained at:

[TABLE]

The corresponding MFG system is:

[TABLE]

Remark 6.

Letting $V_{x}\to 0$ , Eq. (4.23n) becomes $u=u_{\text{max}}(1-\rho/\rho_{\text{jam}})$ , which is the same as the Greenshields desired speed defined in Eq. (4.23d). From Theorem 5.2 we know that the Greenshields LWR is the myopic limit of [MFG-NonSeparable].

[MFG-NonSeparable] can also be interpreted from the perspective of traditional higher-order traffic flow models. To see this, let $u_{\text{max}}=\rho_{\text{jam}}=1$, remove the speed constraints $0\leq u\leq u_{\text{max}}$ and substitute Eq. (4.23oc) into Eq. (4.23ob). Then we obtain the following system:

[TABLE]

Differentiating Eq. (4.23pb) with respect to $x$ and Eq. (4.23pc) with respect to $t$ yields:

[TABLE]

Using the identity $V_{xt}=V_{tx}$ , we can eliminate the variable $V$ from the HJB equations and obtain:

[TABLE]

Eq. (4.23s) coupled with the continuity equation (4.23pa) forms the reduced MFG system:

[TABLE]

The reduced MFG system has the same initial condition $\rho(x,0)=\rho_{0}(x)$ as that of the original system. Moreover, the original system’s terminal condition $V_{T}(x)=C$ and Eq. (4.23pc) yield the terminal condition $\rho(x,T)+u(x,T)=1$ of [reduced MFG].

The reduced MFG system [reduced MFG] has a similar structure to traditional higher-order traffic flow models. Eq. (4.23ta) is the continuity equation and Eq. (4.23tb) has an interpretation that the car’s acceleration $u_{t}+uu_{x}$ is exactly the negative temporal derivative of the density.
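The elimination of $V$ can be traced step by step. The following worked derivation is a sketch that assumes the nonseparable cost takes the form $f(u,\rho)=\frac{1}{2}u^{2}-u+u\rho$ when $u_{\text{max}}=\rho_{\text{jam}}=1$. This form is an assumption consistent with Remark 6 and with the terminal condition $\rho(x,T)+u(x,T)=1$ stated above, and it uses the convention $f^{*}(p,\rho)=\min_{\alpha}\{f(\alpha,\rho)+\alpha p\}$.

```latex
\begin{aligned}
&\text{Minimizer of } f(u,\rho)+uV_{x}:\quad u=1-\rho-V_{x},\\
&\text{Minimum value: } f^{*}(V_{x},\rho)=\tfrac{1}{2}u^{2}-u\,(1-\rho-V_{x})=-\tfrac{1}{2}u^{2}
 \;\Longrightarrow\; V_{t}-\tfrac{1}{2}u^{2}=0,\\
&\partial_{x}\text{ of the HJB equation}:\quad V_{tx}-uu_{x}=0,\\
&\partial_{t}\text{ of } V_{x}=1-\rho-u:\quad V_{xt}=-\rho_{t}-u_{t},\\
&V_{xt}=V_{tx}\;\Longrightarrow\; u_{t}+uu_{x}=-\rho_{t},\\
&V(x,T)=C\;\Longrightarrow\; V_{x}(x,T)=0\;\Longrightarrow\;\rho(x,T)+u(x,T)=1.
\end{aligned}
```

The last two lines reproduce the acceleration interpretation and the terminal condition of the reduced system quoted in the text.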

The proposed MFG systems can be simulated more efficiently than discrete differential games. We will discretize the systems in space and time and then present a solution algorithm to compute the MFE.

6 MFE solution algorithm

Because of its forward-backward structure, the MFG system cannot be solved by marching in a single time direction, either forward or backward. Given the density profile $\rho(x,t)$, the HJB equations (4.23b)(4.23c) can be solved backward from $t=T$ to $t=0$ with terminal cost $V_{T}(x)$ for $u(x,t)$ and $V(x,t)$; given the velocity field $u(x,t)$, the continuity equation (4.23a) can be solved forward from $t=0$ to $t=T$ with initial density $\rho_{0}(x)$ for $\rho(x,t)$. However, the two solves depend on each other and cannot be carried out independently, so it is challenging to compute the MFE numerically.

Based on the existing studies, there have been three types of numerical methods considered for MFG: fixed-point iteration, variational method and Newton’s method.

The fixed-point iteration solves the forward and backward equations alternately. It is easy to implement once appropriate forward and backward solvers are picked [98, 103]. However, the iterations converge only when $T$ is small, that is, for a short planning horizon. Moreover, there is no theory to estimate how small $T$ should be to guarantee convergence.

The variational method deals with separable cost functions and potential MFGs. In this case, it is shown that the MFG system is equivalent to an optimization problem constrained by the continuity equation [106]. Then a variety of optimization tools can be applied [100, 117, 118]. The variational method works for any planning horizon but relies on the separability of the cost function. [100] used the variational method to solve MFGs in pedestrian crowds modeling.

A more general approach is based on Newton’s method. Such an approach was first proposed by [119, 120, 121] to solve a family of MFGs. The key idea is to treat the forward and backward equations as a single nonlinear system and solve that system by Newton’s method. This method is suitable for our purpose because it places no requirements on the length of the planning horizon or the separability of the cost function. However, Newton’s method may fail to converge without a good initial guess of the solution, so techniques to improve convergence are needed when applying it.

This paper develops a multigrid preconditioned Newton’s finite difference algorithm for MFG. It works well with different cost functions and planning horizons. Numerical examples of MFGs proposed in Section 5 are shown using this algorithm.

6.1 Algorithm

Let us divide the road $[0,L]$ into cells $\{[x_{j-1},x_{j}]\}_{j=1}^{N_{x}}$ and the planning horizon $[0,T]$ into time steps $\{t^{n}\}_{n=0}^{N_{t}}$ with spatial and temporal step sizes $\Delta x=L/N_{x}$ and $\Delta t=T/N_{t}$ . To impose the periodic boundary conditions, $x_{0}$ and $x_{N_{x}}$ are assumed to be the same location. Denote $\rho_{j}^{n}$ the average density and $u_{j}^{n}$ the average velocity on the $j^{\text{th}}$ cell $[x_{j-1},x_{j}]$ at time $t^{n}$ ( $j=1,\dots,N_{x}$ ; $n=0,\dots,N_{t}$ ). Denote $V_{j}^{n}=V(x_{j},t^{n})$ .

We first discretize the continuity equation (4.23a) by a finite volume conservative Lax-Friedrichs scheme [122]:

[TABLE]

We then discretize the HJB equations (4.23b)(4.23c) by an upwind scheme:

[TABLE]

Remark 7.

To ensure the stability of scheme (4.23a), the CFL condition $\alpha\Delta t\leq\Delta x$ should be imposed [122], where $\alpha=\max\nolimits_{j,n}|u_{j}^{n}|$. When the MFG has speed constraints $0\leq u\leq u_{\text{max}}$, it suffices to ensure $u_{\text{max}}\Delta t\leq\Delta x$.
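A forward solve of the continuity equation with a CFL check can be sketched as follows. This is the textbook conservative Lax-Friedrichs update, $\rho_{j}^{n+1}=\frac{1}{2}(\rho_{j-1}^{n}+\rho_{j+1}^{n})-\frac{\Delta t}{2\Delta x}(\rho_{j+1}^{n}u_{j+1}^{n}-\rho_{j-1}^{n}u_{j-1}^{n})$, on a periodic grid; whether it matches the paper’s scheme in every detail is an assumption, and the function name is hypothetical.

```python
import math


def lax_friedrichs_step(rho, u, dt, dx):
    """One conservative Lax-Friedrichs update of rho_t + (rho*u)_x = 0 on a periodic grid.

    rho and u are lists of cell values at the current time level.
    Raises ValueError if the CFL condition max|u| * dt <= dx is violated.
    """
    n = len(rho)
    if max(abs(v) for v in u) * dt > dx:
        raise ValueError("CFL condition violated: max|u| * dt must not exceed dx")
    flux = [rho[j] * u[j] for j in range(n)]
    new_rho = []
    for j in range(n):
        left, right = (j - 1) % n, (j + 1) % n   # periodic neighbours
        new_rho.append(0.5 * (rho[left] + rho[right])
                       - dt / (2.0 * dx) * (flux[right] - flux[left]))
    return new_rho


def gaussian_bump(n_cells, low=0.2, high=0.8, width=0.1):
    """Illustrative initial density on [0, 1): a bump centered at x = 0.5 (assumed shape)."""
    dx = 1.0 / n_cells
    return [low + (high - low) * math.exp(-(((j + 0.5) * dx - 0.5) ** 2) / (2 * width ** 2))
            for j in range(n_cells)]
```

Because the update is in flux form, the total number of cars `sum(rho) * dx` is preserved up to rounding at every step, and under the CFL condition with nonnegative speeds the density stays nonnegative.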

The initial and terminal conditions are discretized by:

[TABLE]

Eqs. (4.23a)(4.23b)(4.23c)(4.23d) form a closed system for unknowns $\{\rho_{j}^{n}\}_{1\leq j\leq N_{x}}^{0\leq n\leq N_{t}}$ , $\{u_{j}^{n}\}_{1\leq j\leq N_{x}}^{0\leq n\leq N_{t}-1}$ and $\{V_{j}^{n}\}_{1\leq j\leq N_{x}}^{0\leq n\leq N_{t}}$ . The system can be written as:

$$F(w)=0, \tag{4.23e}$$

where $w\in\mathbb{R}^{3N_{x}N_{t}+2N_{x}}$ is a long vector containing all $\rho_{j}^{n}$ , $u_{j}^{n}$ and $V_{j}^{n}$ , and $F:\mathbb{R}^{3N_{x}N_{t}+2N_{x}}\to\mathbb{R}^{3N_{x}N_{t}+2N_{x}}$ encodes all equations.

Eq. (4.23e) may lead to a large nonlinear system. We denote by $J$ the Jacobian matrix of $F$ and apply Newton’s method to solve Eq. (4.23e):

$$w^{n+1}=w^{n}-J(w^{n})^{-1}F(w^{n}),\quad n=0,1,2,\dots,$$

with any initial guess $w^{0}$ .
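The Newton iteration itself is short to write down for a small dense system. The sketch below is a generic illustration, not the paper’s solver: it approximates the Jacobian by forward differences and solves each linear system by Gaussian elimination, whereas a discretized MFG system would call for a sparse Jacobian and an iterative solver. All function names are hypothetical.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):                     # back substitution
        s = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def jacobian(F, w, h=1e-7):
    """Forward-difference approximation of the Jacobian of F at w."""
    f0 = F(w)
    jac = [[0.0] * len(w) for _ in range(len(f0))]
    for k in range(len(w)):
        w_pert = w[:]
        w_pert[k] += h
        fk = F(w_pert)
        for i in range(len(f0)):
            jac[i][k] = (fk[i] - f0[i]) / h
    return jac


def newton(F, w0, tol=1e-10, max_iter=50):
    """Newton iteration w <- w - J(w)^{-1} F(w); returns (solution, iterations used)."""
    w = w0[:]
    for it in range(max_iter):
        residual = F(w)
        if max(abs(r) for r in residual) < tol:
            return w, it
        step = solve_linear(jacobian(F, w), [-r for r in residual])
        w = [wi + si for wi, si in zip(w, step)]
    raise RuntimeError("Newton iteration did not converge")
```

As a usage example, `newton(lambda w: [w[0]**2 + w[1]**2 - 4.0, w[0] - w[1]], [1.0, 2.0])` converges to the point where both components equal $\sqrt{2}$; starting far from a root can make the same iteration fail, which is the motivation for the initial-guess strategy discussed next.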

To improve the convergence of Newton’s iterations, we apply multigrid to get a good initial guess and preconditioning to accelerate the linear solver. Multigrid and preconditioning are widely used tricks in numerical algorithms, see [123, 124].

•

Multigrid: Start with a coarse grid $N_{x}^{(0)}\times N_{t}^{(0)}$ so that the MFG system is easy to solve. Then iteratively refine the grids and solve the MFG system on finer grids $N_{x}^{(k)}\times N_{t}^{(k)}$, $k=1,2,\dots$, until a solution of the desired resolution is obtained. At step $k$, interpolate the solution $w^{(k-1)}$ from the grid $N_{x}^{(k-1)}\times N_{t}^{(k-1)}$ onto the finer grid $N_{x}^{(k)}\times N_{t}^{(k)}$, which provides a good initial guess for Newton’s method on the finer grid.

•

Preconditioning: At each Newton’s iteration a linear system

[TABLE]

needs to be solved. We use the GMRES iterative linear solver [125] since $J(w^{n})$ is sparse. However, the ill-conditioning of the linear system leads to slow convergence. To address this, we approximate $J(w^{n})$ by a matrix $\tilde{J}(w^{n})$ obtained by ignoring the coupling between the forward and backward equations. Inverting $\tilde{J}(w^{n})$ is equivalent to solving a decoupled forward-backward system. We use $\tilde{J}(w^{n})$ as a preconditioner to accelerate the GMRES convergence.
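The coarse-to-fine interpolation described in the multigrid item can be sketched as below. The example assumes a refinement factor of two in both space and time, a solution stored as `grid[n][j]` (time index `n`, space index `j`), periodic wrap-around in space only, and plain linear interpolation; these storage and interpolation choices are illustrative assumptions, not the paper’s implementation.

```python
def refine_periodic(values):
    """Interpolate periodic spatial values onto a grid with twice as many points.

    Fine value 2j copies coarse value j; fine value 2j+1 averages coarse
    values j and j+1, with the index wrapped for periodicity.
    """
    n = len(values)
    fine = []
    for j in range(n):
        fine.append(values[j])
        fine.append(0.5 * (values[j] + values[(j + 1) % n]))
    return fine


def refine_space_time(grid):
    """Refine a solution grid[n][j] by a factor of two in space and in time.

    The result serves as the initial guess on the finer grid. Time is not
    periodic, so N_t + 1 time levels become 2 * N_t + 1 levels.
    """
    in_space = [refine_periodic(row) for row in grid]
    fine = []
    for n in range(len(in_space) - 1):
        fine.append(in_space[n])
        # insert the intermediate time level by averaging its two neighbours
        fine.append([0.5 * (a + b) for a, b in zip(in_space[n], in_space[n + 1])])
    fine.append(in_space[-1])
    return fine
```

Applied to a coarse solution with 3 time levels and 4 cells, `refine_space_time` returns 5 time levels with 8 cells each, and constant fields are reproduced exactly.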

Using this algorithm, we compute MFE solutions and show simulations for the MFGs proposed in Section 5.

6.2 Numerical examples

6.2.1 Settings

Set the road length $L=1$ and the planning horizon length $T=3$ . Set the free flow speed $u_{\text{max}}=1$ and the jam density $\rho_{\text{jam}}=1$ . Choose the following initial density:

[TABLE]

where $0\leq\rho_{a}\leq\rho_{b}\leq 1$ and $\gamma>0$ are constant parameters. The initial density represents a scenario in which cars initially cluster near $x=L/2$ and traffic is lighter elsewhere. We choose the terminal cost $V_{T}(x)=0$ and specify periodic boundary conditions $\rho(0,t)=\rho(L,t)$, $V(0,t)=V(L,t)$.

On a spatial-temporal grid of size $N_{x}=120$ and $N_{t}=480$ , we compute MFE solutions $\rho^{*}(x,t)$ , $u^{*}(x,t)$ and $V^{*}(x,t)$ for the three MFG systems in Section 5.

For [MFG-LWR], we choose $U(\rho)$ to be the Greenshields desired speed function defined in Eq. (4.23d).

6.2.2 Density Evolution

Fixing the initial density $\rho_{0}(x)$ defined in Eq. (4.23h) with $\rho_{a}=0.05$, $\rho_{b}=0.95$ and $\gamma=0.1$, we compute the MFE solutions for the three MFG systems and plot their density evolutions in 3D space-time-density diagrams; see Figure 4a and Figure 5.

Figure 4a shows the formation and propagation of a shock wave for [MFG-LWR]. The shock wave moves with smaller and smaller amplitudes but does not disappear in the given time horizon $[0,T]$ .

Figure 5 shows that for both [MFG-NonSeparable] and [MFG-Separable], the initial high density quickly dissipates. For [MFG-NonSeparable], the density profile remains smooth and no shock wave forms; from $t=1$ onward the traffic is a uniform flow. For [MFG-Separable], the behavior is similar, but the high density dissipates more slowly and the traffic becomes a nearly uniform flow from $t=2$. Such phenomena differ from those of traditional traffic flow models.

The results show that AVs’ anticipation behavior helps to avoid the formation of shock waves and to stabilize the traffic in this set-up. Figure 7 reveals the rationale by plotting snapshots of the MFE solutions’ density, speed and optimal cost profiles with respect to the spatial coordinate $x$ at time instants $t=0$ and $t=1.5$ for [MFG-NonSeparable] and [MFG-Separable]. In addition, we compute the LWR speeds from the density profiles with the Greenshields desired speed function defined in Eq. (4.23d), and plot them on the same axes as the MFE speeds for comparison.

We observe from Figure 7 that the density, speed and optimal cost profiles all converge to constant profiles as time goes on. Figure 7a and Figure 7b show an asymmetric optimal cost around a jam area with symmetric traffic density. As a result of the “pressure” from the optimal cost, AVs tend to slow down farther upstream before joining the jam and speed up immediately after leaving it, in contrast to HVs, whose speeds remain symmetric before and after the jam area. In other words, an HV’s speed is determined only by the traffic density at the current time, while that of an AV depends on the model-predicted traffic density over the entire horizon. This behavioral difference between AVs and HVs results in different traffic flows.

6.2.3 Fundamental Diagram

The fundamental diagram is a basic tool for understanding traditional traffic flow models [9]. In this subsection we collect density and flow data from the MFE solution and plot the data in a fundamental diagram for both [MFG-LWR] and [MFG-NonSeparable].

The density and flow data are collected from the MFE solution as follows: Take $n_{x}=24$ equidistantly distributed locations $x_{1},x_{2},\dots,x_{n_{x}}$ on $[0,L]$ and $n_{t}=96$ time snapshots $t^{k}=\frac{kT}{n_{t}}$ , $k=0,1,\dots,n_{t}$ . We first pick the density and speed values $\rho^{*}(x_{i},t^{k})$ and $u^{*}(x_{i},t^{k})$ that represent the average density and speed near the spatial-temporal coordinate $(x_{i},t^{k})$ for $i=1,\dots,n_{x}$ , $k=0,\dots,n_{t}$ , then compute the flow $q^{*}(x_{i},t^{k})=\rho^{*}(x_{i},t^{k})u^{*}(x_{i},t^{k})$ from the density and speed. The collected data $\{\rho^{*}(x_{i},t^{k}),q^{*}(x_{i},t^{k})\}_{1\leq i\leq n_{x},0\leq k\leq n_{t}}$ are plotted on the density-flow diagram. Such a way to plot the fundamental diagram from a macroscopic traffic flow model is also used in [126].

For [MFG-LWR], we collect the data from the MFE solution shown in Figure 4a and plot the fundamental diagram in Figure 4b. We see that all of the collected density-flow data points fall onto the Greenshields equilibrium curve $q=u_{\text{max}}\rho(1-\rho/\rho_{\text{jam}})$ . The results verify Theorem 5.1.
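The data collection procedure can be sketched as follows. The fields `rho_demo` and `u_demo` are hypothetical stand-ins for an MFE solution (they are not computed from any MFG), and the sampling locations $x_{i}=iL/n_{x}$ are one assumed reading of "equidistantly distributed". Because `u_demo` is set to the Greenshields speed of the density, the sampled points land on the Greenshields curve, mirroring the [MFG-LWR] case.

```python
import math

U_MAX, RHO_JAM, L, T = 1.0, 1.0, 1.0, 3.0  # settings of Section 6.2.1


def greenshields_speed(rho):
    """Greenshields desired speed U(rho) = u_max * (1 - rho / rho_jam)."""
    return U_MAX * (1.0 - rho / RHO_JAM)


def collect_points(rho, u, n_x=24, n_t=96):
    """Sample (density, flow) pairs at n_x locations and n_t + 1 snapshots."""
    points = []
    for i in range(1, n_x + 1):
        x = i * L / n_x                    # assumed equidistant locations
        for k in range(n_t + 1):
            t = k * T / n_t
            d = rho(x, t)
            points.append((d, d * u(x, t)))  # flow q = rho * u
    return points


# Hypothetical stand-in fields, NOT an MFE solution from the paper.
def rho_demo(x, t):
    return 0.5 + 0.4 * math.exp(-t) * math.sin(2.0 * math.pi * x / L)


def u_demo(x, t):
    return greenshields_speed(rho_demo(x, t))


points = collect_points(rho_demo, u_demo)
max_gap = max(abs(q - U_MAX * d * (1.0 - d / RHO_JAM)) for d, q in points)
print(len(points), max_gap)
```

For a solution whose speed deviates from $U(\rho)$, such as [MFG-NonSeparable], the same `collect_points` routine would produce points scattered off the curve, and `max_gap` would measure that scatter.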

For [MFG-NonSeparable], we plot the fundamental diagram by collecting the data from a set of different MFE solutions. We take different initial densities by varying the values of $\rho_{a}$ and $\rho_{b}$ from $0.05$ to $0.95$ but keep $\rho_{a}<\rho_{b}$ and $\gamma=0.1$ . For each initial density $\rho_{0}(x)$ we compute the MFE solution from [MFG-NonSeparable] and collect the data in the way mentioned. Then we plot all collected data in the same fundamental diagram, see Figure 6.

We observe from Figure 6 that: (i) All data points lie below the line $q=u_{\text{max}}\rho$; this is due to the speed constraint $u\leq u_{\text{max}}$ in [MFG-NonSeparable]. (ii) All data points cluster around the Greenshields equilibrium curve $q=u_{\text{max}}\rho(1-\rho/\rho_{\text{jam}})$; this is because [MFG-NonSeparable] is related to the Greenshields LWR. The traffic flow always converges to a uniform flow represented by a data point on the Greenshields equilibrium curve, and the position of that data point on the curve depends on the initial density. (iii) The fundamental diagram can be split into a free flow regime where $\rho\leq 0.5$ and a congested regime where $\rho>0.5$. In the free flow regime, cars can achieve the free flow speed $u_{\text{max}}$, which reflects the efficiency term in the cost function $f_{\text{NonSep}}$; in the congested regime cars cannot achieve the free flow speed, which reflects the safety term in the cost function. (iv) Unlike in the fundamental diagram of human driving [127], data points in the free flow regime of Figure 6 may not lie on the line $q=u_{\text{max}}\rho$; this results from AVs’ anticipation behavior. Even in a low density area, a car may drive at a lower speed than the desired one if there is a traffic jam ahead.
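As a brief check on the regime boundary in (iii): assuming the Greenshields flow $q(\rho)=u_{\text{max}}\rho(1-\rho/\rho_{\text{jam}})$ with the settings $u_{\text{max}}=\rho_{\text{jam}}=1$ of Section 6.2.1, the flow-maximizing density (written here as $\rho_{c}$, a symbol introduced only for this remark) satisfies

$$\frac{dq}{d\rho}=u_{\text{max}}\left(1-\frac{2\rho}{\rho_{\text{jam}}}\right)=0\;\Longrightarrow\;\rho_{c}=\frac{\rho_{\text{jam}}}{2}=0.5,\qquad q(\rho_{c})=\frac{u_{\text{max}}\rho_{\text{jam}}}{4}=0.25,$$

which coincides with the threshold $\rho=0.5$ separating the two regimes.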

6.2.4 Algorithm Convergence

To provide more evidence on the convergence of the solution algorithm, we compute and plot the solution errors on different grids for the above examples; see Figure 8. Since we do not know any explicit solutions to those MFG systems, the errors are estimated using numerical solutions on different grids. To check the solution $\rho^{*}(x,t),u^{*}(x,t)$ on the $N_{x}\times N_{t}$ grid, we first solve the MFG on the coarser grid of size $N_{x}/2\times N_{t}/2$ and then interpolate the coarse solution onto the $N_{x}\times N_{t}$ grid. Denoting the interpolated solution by $(\tilde{\rho}^{*},\tilde{u}^{*})$, we estimate the solution error on the $N_{x}\times N_{t}$ grid as:

[TABLE]

where the norm is chosen as the $L^{1}$ norm on $[0,L]\times[0,T]$ .

We fix the spatial-temporal ratio $N_{t}/N_{x}=4$ and increase $N_{x}$ from 30 to 120. We then plot the errors computed by Eq. (4.23i) against the spatial grid size $N_{x}$. From Figure 8 we see first-order convergence for all three numerical examples.
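One common way to read a convergence order off such error data is to compare errors on consecutive grids. The sketch below assumes the error behaves like $CN_{x}^{-p}$ and estimates $p$ from successive pairs. The error values are made-up placeholders chosen to look first order; they are not the values plotted in Figure 8.

```python
import math


def observed_orders(grid_sizes, errors):
    """Estimate p in error ~ C * N^(-p) from each consecutive pair of grids."""
    return [
        math.log(errors[k] / errors[k + 1]) / math.log(grid_sizes[k + 1] / grid_sizes[k])
        for k in range(len(errors) - 1)
    ]


grid_sizes = [30, 60, 120]          # N_x values, with N_t / N_x = 4 held fixed
errors = [4.0e-2, 2.1e-2, 1.0e-2]   # hypothetical placeholders, NOT from the paper
orders = observed_orders(grid_sizes, errors)
print(orders)
```

Estimated orders close to 1 across all pairs would be consistent with first-order convergence.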

7 From MFG back to N-Car differential game

Summarizing the previous sections, we have derived a continuous mean field game [MFG] from a discrete differential game (Definition 4.1) and developed a solution algorithm for the mean field game. In this section we build the connection between the discrete differential game equilibrium (DGE) and the continuous mean field equilibrium (MFE) in the sense of $\epsilon$-Nash equilibrium. First we provide a way to construct a tuple of discrete controls from an MFE solution. Then we introduce the concept of $\epsilon$-Nash equilibrium and show how to characterize the accuracy of the MFE-constructed controls. Numerical examples validate that the MFE-constructed controls are a good approximate equilibrium of the original $N$-car differential game (DG) when $N$ is large.

7.1 MFE-constructed controls and accuracy characterization

From a continuous MFE solution, we construct a tuple of discrete controls $\hat{v}_{1}(t),\dots,\hat{v}_{N}(t)$ for the DG by applying the feedback law (4.11)(4.12). The rationale underlying such construction is quite straightforward: for $i=1,\cdots,N$ , the $i^{\text{th}}$ car’s instantaneous speed selection at time $t$ is determined by MFE’s velocity field $u^{\ast}(x_{i}(t),t)$ at that time and the $i^{\text{th}}$ car’s location $x_{i}(t)$ . Mathematically, for $i=1,\cdots,N$ and $t\in\left[0,T\right]$ :

$$\hat{v}_{i}(t)=u^{\ast}(x_{i}(t),t),\qquad\dot{x}_{i}(t)=\hat{v}_{i}(t).$$

Integrating the above dynamical system gives the $i^{\text{th}}$ car’s velocity control $\hat{v}_{i}(t)$ and trajectory $x_{i}(t)$ over the planning horizon $[0,T]$ , $i=1,\dots,N$ .
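A minimal sketch of this construction, under loudly stated assumptions: the velocity field `u_demo` below is a hypothetical smooth periodic function rather than a computed MFE, the cars start uniformly spaced, and forward Euler is used as the time integrator (the paper does not state which integrator it uses). Positions are tracked as cumulative distance, so the periodic boundary only enters through `x % L` when the field is evaluated.

```python
import math

L = 1.0  # road length with periodic boundary


def u_demo(x, t):
    """Hypothetical velocity field, periodic in x; NOT an MFE from the paper."""
    return 0.6 + 0.3 * math.sin(2.0 * math.pi * x / L) * math.exp(-t)


def integrate_trajectories(u, x0, T=1.0, steps=1000):
    """Forward Euler for dx_i/dt = u(x_i, t); returns positions and speeds."""
    dt = T / steps
    xs = list(x0)                       # cumulative distance of each car
    history = [list(xs)]
    speeds = []
    for n in range(steps):
        t = n * dt
        v = [u(x % L, t) for x in xs]   # feedback law: speed read off the field
        xs = [x + dt * vi for x, vi in zip(xs, v)]
        speeds.append(v)
        history.append(list(xs))
    return history, speeds


n_cars = 21
x0 = [i * L / n_cars for i in range(n_cars)]
history, speeds = integrate_trajectories(u_demo, x0)

# First-in-first-out check: the ordering of cars never changes, and the last
# car never overtakes the first car's periodic image one lap ahead.
fifo = all(
    all(a < b for a, b in zip(row, row[1:])) and row[-1] < row[0] + L
    for row in history
)
print(fifo)
```

Each row of `speeds` plays the role of the discrete controls $\hat{v}_{1},\dots,\hat{v}_{N}$ at one time step, and each column of `history` is one car's trajectory.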

As an example, Figure 9 shows the trajectories integrated from the MFE solution of [MFG-NonSeparable] shown in Figure 5a, with $N=21$ cars. Each of the 21 lines represents the trajectory of one car. We take time as the $x$-axis and cumulative distance as the $y$-axis to avoid special treatment of the periodic boundary conditions. We see that even though cars cluster near $x=0.5$ at the starting time, they become uniformly distributed at the final time. This means that the flow converges to the uniform flow.

Remark 8.

We observe from Figure 9 that no two of the cars’ trajectories intersect. In other words, the first-in-first-out (FIFO) property is satisfied. In fact, Eqs. (4.23a)(4.23b) form a first-order ODE system for all $i=1,\dots,N$. Uniqueness of the solution then guarantees that no two of the $N$ trajectories starting from different initial locations intersect [128]. Hence the FIFO property is always guaranteed by MFE-constructed controls.

Now we would like to know whether the MFE-constructed controls $\hat{v}_{1}(t),\dots,\hat{v}_{N}(t)$ are a good approximate equilibrium of the DG. Since the DG’s true equilibrium $v^{\ast}_{1}(t),\dots,v^{\ast}_{N}(t)$ may not exist or be unique, and is typically hard to compute, we characterize the accuracy of the constructed controls in terms of the driving cost functional. Along this line, such an approximate equilibrium can be treated as an $\epsilon$-Nash equilibrium of the DG, which is formally defined below [108, 113].

Definition 7.1.

A tuple of controls $\tilde{v}_{1},\dots,\tilde{v}_{N}$ is an $\epsilon$ -Nash equilibrium of the DG, if

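$$J_{i}^{N}(\tilde{v}_{i},\tilde{\bm{v}}_{-i})\leq J_{i}^{N}(v_{i},\tilde{\bm{v}}_{-i})+\epsilon,\quad\forall v_{i}\in\mathcal{A},\ i=1,\dots,N.$$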

At an $\epsilon$-Nash equilibrium, no car can reduce its driving cost by more than $\epsilon$ by unilaterally switching its velocity control.

For a potential game where the cost function $f(u,\rho)$ is separable, e.g., [MFG-Separable], [113] proved the correspondence between the MFE-constructed controls and an $\epsilon$ -Nash equilibrium of DG, that is: for any $\epsilon>0$ , there exist $N,\sigma>0$ such that $\hat{v}_{1}(t),\dots,\hat{v}_{N}(t)$ constructed from the MFE solution is an $\epsilon$ -Nash equilibrium of the DG.

Unfortunately, there do not exist any theoretical results for a general cost function such as $f_{\text{NonSep}}$ . In this paper, instead of offering a formal proof, we will validate such correspondence using numerical examples. A rigorous proof of such correspondence for a general cost function will be left for future research.

Tailoring to our context, in the subsequent numerical examples, we aim to illustrate that the MFE-constructed controls are an $\epsilon$ -Nash equilibrium of DG by characterizing the accuracy $\epsilon$ across all feasible controls and all cars. There always exists an arbitrarily large $\epsilon$ that can make $\hat{v}_{1}(t),\dots,\hat{v}_{N}(t)$ satisfy the condition (4.23c). What we are more interested in is a lower bound, denoted as $\hat{\epsilon}\geq 0$ , such that:

[TABLE]

In other words,

[TABLE]

Let us move the second maximum symbol in front of the second term that depends only on $v_{i}$ , then we have:

[TABLE]

where $\bar{v}_{i}$ is the best response that solves $\min_{v_{i}\in\mathcal{A}}J_{i}^{N}(v_{i},\hat{\bm{v}}_{-i})$ . We attain $\bar{v}_{i}$ from the following optimal control problem, while keeping other cars’ strategies $\hat{\bm{v}}_{-i}$ unchanged:

[TABLE]

Definition 7.2.

The accuracy of a tuple of controls $\hat{v}_{1}(t),\dots,\hat{v}_{N}(t)$ is the $\hat{\epsilon}$ defined in Eq. (4.23e).
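The accuracy $\hat{\epsilon}$ is the largest cost reduction any single car can obtain by switching to its best response while the others keep their controls. The sketch below computes that quantity for a deliberately simple static toy game, which is an assumption made for illustration: the quadratic `toy_cost`, the grid of admissible speeds and the brute-force best-response search are all hypothetical and replace the optimal control problem used in the paper.

```python
def accuracy(cost, v_hat, admissible):
    """Return (eps_hat, gaps), where gaps[i] is car i's cost under v_hat
    minus its best-response cost with the other cars' controls held fixed."""
    gaps = []
    for i in range(len(v_hat)):
        current = cost(i, v_hat[i], v_hat)
        best = min(cost(i, v, v_hat) for v in admissible)
        gaps.append(current - best)
    return max(gaps), gaps


def toy_cost(i, v_i, profile):
    """Hypothetical static cost: kinetic term, reward for speed, and a
    congestion-like interaction with the mean speed of the other cars."""
    others = [v for j, v in enumerate(profile) if j != i]
    mean_others = sum(others) / len(others)
    return 0.5 * v_i ** 2 - v_i + v_i * mean_others


admissible = [k / 1000 for k in range(1001)]   # speeds in [0, 1]

# In this toy game every car driving at 0.5 is an exact equilibrium, while
# every car driving at 0.6 leaves each car some room to improve.
eps_eq, _ = accuracy(toy_cost, [0.5] * 5, admissible)
eps_off, gaps_off = accuracy(toy_cost, [0.6] * 5, admissible)
print(eps_eq, eps_off)
```

Whenever each candidate control is itself admissible, every gap is non-negative, so $\hat{\epsilon}\geq 0$ and $\hat{\epsilon}=0$ corresponds to an exact Nash equilibrium.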

7.2 Accuracy validation with numerical examples

Summarizing Section 4, Section 6 and Section 7.1, we shall reiterate the procedure of solving an approximate equilibrium of the DG from its respective MFG in a more systematic way. The procedure of solving MFE-constructed controls and validating the accuracy of those controls is listed in Algorithm 1. We will test some numerical examples following the procedure.

In the numerical examples, we aim to convey two main messages:

1. Given $N,\sigma>0$, we construct an $\epsilon$-Nash equilibrium of the DG from an MFE solution and compute its accuracy $\hat{\epsilon}$.

2. The accuracy $\hat{\epsilon}$ decreases as $N$ becomes large.

The general set-up of the following numerical examples is similar to that used previously. We fix the length of the planning horizon $T=1$ . Each time we solve a different DG by varying the number of cars $N=21,41,61,81,101$ . For different numbers of cars, the cars’ initial positions are sampled from the same initial distribution defined in Eq. (4.23h) with $\rho_{a}=0.2$ , $\rho_{b}=0.8$ and $\gamma=0.15L$ . For each $N$ we take the road length $L=N$ and choose the smoothing parameter $\sigma=0.05L$ . The cost functions for [MFG-NonSeparable] and [MFG-Separable] are tested.

To see the first message, Figure 10 compares the costs computed from the MFE-constructed controls with those from the respective best response strategies when $N=21$. In Figure 10, the $x$-axis is the car’s index $i$ and the $y$-axis is the cost $J_{i}^{N}$ for $i=1,2,\dots,N$. We see from the figure that for both cost functions the MFE solutions generate good approximate equilibria of the DGs.

To see the second message, we compute the maximal relative accuracy and mean relative accuracy of MFE-constructed controls for different numbers of cars, which are shown in Figure 11. We see from Figure 11 that for both cost functions better accuracy is obtained as $N$ increases.

7.3 Discussion

The aforementioned procedure aims to characterize the interplay between microscopic DG and macroscopic MFG. In particular, we can use the MFE solution to construct an $\epsilon$ -Nash equilibrium of DG and show that this $\epsilon$ -Nash equilibrium has desired accuracy. It provides an efficient and scalable method to solve AVs’ individual controls in a game-theoretic framework.

Note that it is challenging to find the equilibrium of the DG accurately. In fact, it is not known whether a DGE exists and is unique, and in this paper we have not discussed the solution properties of the original differential game. There are three cases in terms of its solutions:

1. DGE does not exist: Even without a DGE, in practice we still need to find a “good enough” control for each individual AV so that every AV achieves its predefined driving objective with reasonable performance. The MFE-constructed controls can be used as an approximate equilibrium of the DG.

2. DGE exists and is unique: The MFE-constructed controls provide a good initial guess for solving the accurate DGE, and the proposed method can help to characterize the upper bound of the deviation.

3. DGE exists but is non-unique: The MFE-constructed controls can be an approximation to one DGE. However, the characterization of the error bound and the proposed method of finding an $\epsilon$-Nash equilibrium may become debatable. Such a case will be left for future research.

8 Conclusions and future research

This paper applies the MFG to solve the continuous velocity control problem for a system of AVs in traffic. The MFG offers advantages over the existing individual control methods due to its scalability properties. The proposed game-theoretic framework links micro- and macro-scale behaviors, offering insights into systematic impacts of strategic interactions among AVs from a microscopic scale. To the best of our knowledge, this is the first study to characterize equilibrium solutions in both continuous MFGs and discrete differential games in traffic. Unlike most of the existing studies that approximate discrete AV controls directly, we develop a game-theoretic framework from micro- to macro-scale, and then construct solutions from macro- back to micro-scale. In particular, we first introduce the macroscopic mean field game, solve its equilibrium, construct discrete controls from the mean field equilibrium, and then validate the consistency between the constructed discrete controls and the equilibrium of the original differential game. Our findings will help transportation engineers and planners to better predict and forecast traffic conditions when AVs reach a critical mass, which in turn will prepare them for a smooth transition from the present to future AV-equipped transportation systems.

This work can be generalized in several ways: (i) As a first step to model AVs’ strategic interactions under the mean field game framework, this paper only presents AVs’ longitudinal velocity controls with many simplifying assumptions. The control variables, assumptions on cost functions and constraints need to be generalized. For example, we will incorporate the acceleration rate into AVs’ control variables, relax the homogeneity assumptions by considering multi-class MFGs, relax assumption (A3) by incorporating the derivatives of the density into AVs’ driving cost functionals and add constraints on the density and acceleration rate for the MFG. (ii) This paper mainly focuses on deterministic, non-viscous MFGs. In the future we will discuss viscous MFGs by adding randomness in AVs’ dynamics. (iii) We rigorously show the relations between MFGs and the LWR model. However, we are only able to discuss the relationship between MFGs and traditional higher-order traffic flow models by rewriting a specific MFG system to its reduced form and find the similarity. Exploring deeper connections between MFGs and traditional higher-order models will be left for future research. (iv) This paper presents three concrete cost functions and their respective MFGs to illustrate AVs’ traffic flow patterns and the consistency between discrete and continuous equilibria. In the future we will explore other families of cost functions and provide more MFG examples.

Acknowledgments

The authors would like to thank the Data Science Institute at Columbia University for providing a seed grant for this research.

Bibliography


[1] G. F. Newell, Nonlinear effects in the dynamics of car following, Operations Research 9 (2) (1961) 209–229.
[2] P. G. Gipps, A behavioural car-following model for computer simulation, Transportation Research Part B: Methodological 15 (2) (1981) 105–111.
[3] M. Bando, K. Hasebe, A. Nakayama, A. Shibata, Y. Sugiyama, Dynamical model of traffic congestion and numerical simulation, Physical Review E 51 (2) (1995) 1035.
[4] M. Brackstone, M. McDonald, Car-following: a historical review, Transportation Research Part F: Traffic Psychology and Behaviour 2 (4) (1999) 181–196.
[5] H. M. Zhang, A mathematical theory of traffic hysteresis, Transportation Research Part B: Methodological 33 (1) (1999) 1–23.
[6] H. M. Zhang, T. Kim, A car-following theory for multiphase vehicular traffic flow, Transportation Research Part B: Methodological 39 (5) (2005) 385–399.
[7] Z. Zheng, S. Ahn, D. Chen, J. Laval, Applications of wavelet transform for analysis of freeway traffic: Bottlenecks, transient traffic, and traffic oscillations, Transportation Research Part B: Methodological 45 (2) (2011) 372–384.
[8] D. Chen, J. Laval, Z. Zheng, S. Ahn, A behavioral car-following model that captures traffic oscillations, Transportation Research Part B: Methodological 46 (6) (2012) 744–761.


Please cite this paper as: Huang, Kuang, et al. “A game-theoretic framework for autonomous vehicles velocity control: Bridging microscopic differential games and macroscopic mean field games.” Discrete & Continuous Dynamical Systems-B 22.11 (2017): 0.
