Learning Model Predictive Control for Connected Autonomous Vehicles

Hassan Jafarzadeh; Cody Fleming

arXiv:1908.02879·math.OC·August 9, 2019·CDC

Learning Model Predictive Control for Connected Autonomous Vehicles

Hassan Jafarzadeh, Cody Fleming

PDF

TL;DR

This paper introduces a Learning Model Predictive Controller tailored for connected autonomous vehicles, capable of handling dynamic environments and data-driven decision variables, with proven effectiveness through simulations.

Contribution

It extends nonlinear LMPC architecture to accommodate data-driven variables and dynamic environments in CAV platooning applications.

Findings

01

Effective control logic demonstrated in simulations

02

Converges to optimal strategies over model and data-driven variables

03

Handles dynamic environments in connected autonomous vehicle scenarios

Abstract

A Learning Model Predictive Controller (LMPC) is presented and tailored to platooning and Connected Autonomous Vehicles (CAVs) applications. The proposed controller builds on previous work on nonlinear LMPC, adapting its architecture and extending its capability to (a) handle dynamic environments and (b) account for data-driven decision variables that derive from an unknown or unknowable function. The paper presents the control design approach, and shows how to recursively construct an outer loop candidate trajectory and an inner iterative LMPC controller that converges to an optimal strategy over both model-driven and data-driven variables. Simulation results show the effectiveness of the proposed control logic.

Tables1

Table 1. TABLE I: Scenario and Model Parameters

$\underline{u}$ , $\bar{u}$	control input limits	m/s²	[-6,6]
$N$	time horizon of outer loop MP	–	70
$ν$	time horizon of SR-LMPC	–	60
$d t$	sample time or discretization	s	0.2
$v_{ℓ}^{s s}$	lead vehicle speed (constant)	m/s	30
$v_{f, 0}$	follower vehicle initial speed	m/s	35
$Ω_{d}$	dropout zone of bridge	m	435-480

Equations98

x_{t + 1} = f (x_{t}, u_{t}),

x_{t + 1} = f (x_{t}, u_{t}),

x_{t} \in X, u_{t} \in U \forall t \geq 0.

x_{t} \in X, u_{t} \in U \forall t \geq 0.

J_{0 \to \infty}^{*}

J_{0 \to \infty}^{*}

x_{t + 1} = f (x_{t}, u_{t}) \forall k \geq 0

x_{0} = x_{S}

x_{k} \in X, u_{k} \in U \forall k \geq 0

h (x_{F}, 0) = 0 and h (x_{t}^{j}, u_{t}^{j}) ≻ 0 \forall x_{t}^{j} \in R^{n} ∖ {x_{F}}, u_{t}^{j} \in R^{m} ∖ {0}

h (x_{F}, 0) = 0 and h (x_{t}^{j}, u_{t}^{j}) ≻ 0 \forall x_{t}^{j} \in R^{n} ∖ {x_{F}}, u_{t}^{j} \in R^{m} ∖ {0}

f (x_{F}, 0) = x_{F} .

f (x_{F}, 0) = x_{F} .

u^{j}

u^{j}

x^{j}

x_{0}^{j} = x_{S}, \forall j \geq 0.

x_{0}^{j} = x_{S}, \forall j \geq 0.

SS^{j} = {i \in M^{j} ⋃ t = 0 ⋃ \infty x_{t}^{i}}

SS^{j} = {i \in M^{j} ⋃ t = 0 ⋃ \infty x_{t}^{i}}

M^{j} = {k \in [0, j] : t \to \infty lim x_{t}^{k} = x_{F}}

M^{j} = {k \in [0, j] : t \to \infty lim x_{t}^{k} = x_{F}}

Q^{j} (x) = ⎩ ⎨ ⎧ (i, t) \in F^{j} (x) min J_{t \to \infty}^{i} (x), + \infty if x \in SS^{j} if x \in / SS^{j}

Q^{j} (x) = ⎩ ⎨ ⎧ (i, t) \in F^{j} (x) min J_{t \to \infty}^{i} (x), + \infty if x \in SS^{j} if x \in / SS^{j}

F^{j}(x)=\Big{\{}(i,t):i\in[0,j],\ t\geq 0\ \text{with }x_{t}^{i}=x;\\ \forall x_{t}^{i}\in\mathcal{SS}^{j}

F^{j}(x)=\Big{\{}(i,t):i\in[0,j],\ t\geq 0\ \text{with }x_{t}^{i}=x;\\ \forall x_{t}^{i}\in\mathcal{SS}^{j}

x_{i} (t + 1) = f_{i} (x_{i} (t), u_{i} (t))

x_{i} (t + 1) = f_{i} (x_{i} (t), u_{i} (t))

U_{t \to t + N ∣ t} min

U_{t \to t + N ∣ t} min

+ ∥ x_{i} (k ∣ t) - x^{r e f} (k) ∥_{P_{i}^{2}}^{2}

+ ∥ u_{i} (k ∣ t) ∥_{P_{i}^{3}}^{2} + Q_{i} (x_{i} (t + N ∣ t))

x_{i} (k + 1∣ t) = f_{i} (x_{i} (k ∣ t), u_{i} (k ∣ t))

x_{i} (t ∣ t) = x_{i} (t)

γ_{i} (x_{i} (k ∣ t), x_{i - 1} (k ∣ t)) \geq Γ_{i} (k)

\underline{x}_{i} \leq x_{i} (k ∣ t) \leq \overline{x}_{i}

\underline{u}_{i} \leq u_{i} (k ∣ t) \leq \overline{u}_{i}

\forall k \in {t, \dots, t + N}

γ_{i} (x_{i} (k), x_{i - 1} (k)) = {\infty \frac{d _{i} ( k )}{v _{i} ( k ) - v _{i - 1} ( k )} v_{i - 1} (k) \geq v_{i} (k) v_{i - 1} (k) < v_{i} (k)

γ_{i} (x_{i} (k), x_{i - 1} (k)) = {\infty \frac{d _{i} ( k )}{v _{i} ( k ) - v _{i - 1} ( k )} v_{i - 1} (k) \geq v_{i} (k) v_{i - 1} (k) < v_{i} (k)

X_{i} (t) = [x_{i} (t + 1∣ t), x_{i} (t + 2∣ t), ..., x_{i} (t + N ∣ t)] U_{i} (t) = [u_{i} (t ∣ t), u_{i} (t + 1∣ t), ..., u_{i} (t + N - 1∣ t)]

X_{i} (t) = [x_{i} (t + 1∣ t), x_{i} (t + 2∣ t), ..., x_{i} (t + N ∣ t)] U_{i} (t) = [u_{i} (t ∣ t), u_{i} (t + 1∣ t), ..., u_{i} (t + N - 1∣ t)]

X_{i - 1}^{*} (t) = [x_{i - 1}^{*} (t ∣ t), x_{i - 1}^{*} (t + 1∣ t), \dots, x_{i - 1}^{*} (t + N ∣ t)]

X_{i - 1}^{*} (t) = [x_{i - 1}^{*} (t ∣ t), x_{i - 1}^{*} (t + 1∣ t), \dots, x_{i - 1}^{*} (t + N ∣ t)]

X^{0} (t) = [x^{0} (t + 1∣ t), x^{0} (t + 2∣ t), \dots, x^{0} (t + N ∣ t)]

X^{0} (t) = [x^{0} (t + 1∣ t), x^{0} (t + 2∣ t), \dots, x^{0} (t + N ∣ t)]

U^{0} (t) = [u^{0} (t ∣ t), u^{0} (t + 1∣ t), \dots, u^{0} (t + N - 1∣ t)]

\mathcal{DS}^{L}=\Bigg{\{}\bigcup_{k=1}^{N}\Big{\{}x^{L}(t+k|t)\ \Big{|}\ x_{k}^{L}\in\mathcal{X}^{\mathrm{d}_{t}},\\ u_{k}^{L}\in\mathcal{U}\ \forall k\in\{t,t+1,\dots,t+N\}\Big{\}}\Bigg{\}}\cup\mathcal{DS}^{L-1}

\mathcal{DS}^{L}=\Bigg{\{}\bigcup_{k=1}^{N}\Big{\{}x^{L}(t+k|t)\ \Big{|}\ x_{k}^{L}\in\mathcal{X}^{\mathrm{d}_{t}},\\ u_{k}^{L}\in\mathcal{U}\ \forall k\in\{t,t+1,\dots,t+N\}\Big{\}}\Bigg{\}}\cup\mathcal{DS}^{L-1}

q_{i}^{ℓ} (x_{i}^{ℓ} (k ∣ t) ∣ x_{i - 1}^{*} (k \to N ∣ t)) = z_{i} (x_{i}^{ℓ} (k ∣ t), u_{i}^{ℓ} (k ∣ t)) + q_{i}^{ℓ} (x_{i}^{ℓ} (k + 1∣ t) ∣ x_{i - 1}^{*} (k + 1 \to N ∣ t))

q_{i}^{ℓ} (x_{i}^{ℓ} (k ∣ t) ∣ x_{i - 1}^{*} (k \to N ∣ t)) = z_{i} (x_{i}^{ℓ} (k ∣ t), u_{i}^{ℓ} (k ∣ t)) + q_{i}^{ℓ} (x_{i}^{ℓ} (k + 1∣ t) ∣ x_{i - 1}^{*} (k + 1 \to N ∣ t))

J_{t + τ \to t + τ + ν}^{L} (x_{i}^{L} (t + τ)) = k = t + τ \sum t + τ + ν - 1 ∥ x_{i} (k + 1∣ t) - x_{i - 1}^{*} (k + 1∣ t) ∥_{P_{i}^{1}}^{2} + ∥ u_{i} (k ∣ t) ∥_{P_{i}^{2}}^{2} + ℓ = 0 \sum L - 1 η = 0 \sum N ζ_{i}^{ℓ} (η) q_{i}^{ℓ} (x_{i}^{ℓ} (t + η ∣ t) ∣ x_{i - 1}^{*} (η \to N ∣ t))

J_{t + τ \to t + τ + ν}^{L} (x_{i}^{L} (t + τ)) = k = t + τ \sum t + τ + ν - 1 ∥ x_{i} (k + 1∣ t) - x_{i - 1}^{*} (k + 1∣ t) ∥_{P_{i}^{1}}^{2} + ∥ u_{i} (k ∣ t) ∥_{P_{i}^{2}}^{2} + ℓ = 0 \sum L - 1 η = 0 \sum N ζ_{i}^{ℓ} (η) q_{i}^{ℓ} (x_{i}^{ℓ} (t + η ∣ t) ∣ x_{i - 1}^{*} (η \to N ∣ t))

x_{i} (t + τ + ν ∣ t) = ℓ = 0 \sum L - 1 η = 0 \sum N ζ_{i}^{ℓ} (η) x_{i}^{ℓ} (t + η ∣ t)

x_{i} (t + τ + ν ∣ t) = ℓ = 0 \sum L - 1 η = 0 \sum N ζ_{i}^{ℓ} (η) x_{i}^{ℓ} (t + η ∣ t)

ℓ = 0 \sum L - 1 η = 0 \sum N ζ_{i}^{ℓ} (η) = 1

ζ_{i}^{ℓ} (η) \in {0, 1}, \forall ℓ = {0, .., L - 1}, \forall η = {0, ..., N}

\mathpzc X_{i} (t + τ) = [\mathpzc x_{i} (t + τ + 1), \mathpzc x_{i} (t + τ + 2), ..., \mathpzc x_{i} (t + τ + ν)] \mathpzc U_{i} (t + τ) = [\mathpzc u_{i} (t + τ), \mathpzc u_{i} (t + τ + 1), ..., \mathpzc u_{i} (t + τ + ν - 1)]

\mathpzc X_{i} (t + τ) = [\mathpzc x_{i} (t + τ + 1), \mathpzc x_{i} (t + τ + 2), ..., \mathpzc x_{i} (t + τ + ν)] \mathpzc U_{i} (t + τ) = [\mathpzc u_{i} (t + τ), \mathpzc u_{i} (t + τ + 1), ..., \mathpzc u_{i} (t + τ + ν - 1)]

\mathpzc x_{i} (t + τ + ν) = x_{i}^{ℓ^{*}} (t + η^{*} ∣ x_{i - 1}^{*} (η^{*} \to N ∣ t))

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Full text

Learning Model Predictive Control for Connected Autonomous Vehicles

††thanks: The authors are with the University of Virginia. {hj2bh,cf5eg}@virginia.edu. This work was partially supported by the NSF under grants CPS-1739333.

Hassan Jafarzadeh and Cody Fleming

Abstract

A Learning Model Predictive Controller (LMPC) is presented and tailored to platooning and Connected Autonomous Vehicles (CAVs) applications. The proposed controller builds on previous work on nonlinear LMPC, adapting its architecture and extending its capability to (a) handle dynamic environments and (b) account for data-driven decision variables that derive from an unknown or unknowable function. The paper presents the control design approach, and shows how to recursively construct an outer loop candidate trajectory and an inner iterative LMPC controller that converges to an optimal strategy over both model-driven and data-driven variables. Simulation results show the effectiveness of the proposed control logic.

Index Terms:

Learning, Model Predictive Control, LMPC, Data-driven Control, Connected Vehicles, Autonomous Vehicles

I Introduction

Recent advances in autonomous driving are becoming increasingly ubiquitous, and Advanced Driving Assistance Systems (ADAS) have the potential to improve safety and comfort in various driving conditions. In addition, the use of wireless communication networks could enable autonomous vehicles to reach high performance states that would not otherwise be feasible or safe (closer distances, higher speeds, etc). However, there is currently no methodology for providing provable safety guarantees for such states.

The main challenge is to capture the interdependence between mobility, wireless, and safety: a vehicle’s motion has a profound effect on the wireless channel, which in turn affects the ability to maintain physical safety. Many current efforts in the broader context of networked multi-agent systems [1, 2, 3, 4] assume a stationary channel: future properties of the wireless channel will be similar to those of the past. However, this assumption ignores the effect of mobility on the wireless channel, and so the controller cannot choose a motion plan that would improve safety (and, as a consequence, performance) by improving the wireless channel. In this paper, We explore these notions in the context of so-called vehicle platooning.

Adaptive cruise control (ACC) is a widely used ADAS module that controls the vehicle longitudinal dynamics. ACC is triggered once a preceding vehicle is detected within a certain distance range from the ego vehicle. ACC automatically maintains a proper minimum safe distance from preceding vehicles by automatically adjusting braking and acceleration. ACC enhances mobility, improves safety and comfort, and reduces energy consumption. The use of Model Predictive Control (MPC) for ACC applications is becoming increasingly common in the literature [5, 6].

The vehicle platoon control problem, or so-called collaborative adaptive cruise control (CACC) [7], has been widely studied in the literature and several solutions have been proposed [8, 9, 10] and is a natural extension of ACC that leverages vehicular ad hoc networks and vehicle to vehicle (V2V) communication. This problem has been well studied in the context of MPC control strategies [11, 12, 13], which have a natural advantage of using the predictive nature of MPC and then sharing these predictions over the wireless channel, in order to improve overall system performance.

However, these works tend to make the same assumptions about wireless communication mentioned above. These assumptions generally fall into one of two categories: (1) perfect communication, i.e. at every time step of the model predictive controller, the vehicle gets new trajectory predictions from vehicles in the platoon; or (2) imperfect communication where the performance is stationary, and so a vehicle will receive a new packet with a known (stationary) probability.

Environmental conditions such as a bridge overpass, buildings, or other vehicles impact the quality of the communication channel. Many of these conditions involve multipath reflection, for example an adjacent semi-truck might cause multipath interference between two communicating vehicles. In addition, the channel is affected by the trajectory of the vehicle itself; moving the transceivers closer or farther apart might positively (or negatively) impact multipath reflection.

This paper leverages recent and ongoing advances in wireless channel predictive capability, where knowledge of the scene can be used to predict what will happen to the channel [14, 15, 16, 17]. The related literature uses “black box” or “grey box” approaches [18, 19], where the function used to predict wireless properties is unknown (and possibly unknowable). State of the art techniques use some form of artificial neural network, or ANN [20]. The notion of a black box is common in the field of machine learning, particularly those techniques that use ANNs, but this presents a significant challenge in the context of safety broadly, and for MPC specifically.

In the following work, we leverage recent advances in data-driven MPC [21, 22, 23] that learn from previous iterations of a control task and provide guarantees on safety and improved performance at each iteration. In particular, we introduce a formulation of MPC for vehicle platooning that accounts for imperfect communication, and then design a LMPC control scheme that leverages the notion of predictive capability for wireless channel quality, in order to obtain better platoon performance. The contributions of this paper are:

formulation of a (L)MPC problem that can handle decision variables or objective functions that derive from an unknown or unknowable function, for example variables that are generated by an artificial neural net, and 2. 2.

extension of LMPC, adapted to handle dynamic environments and/or time-evolving constraints, in a computationally tractable manner.

The paper is organized as follows: section II provides preliminaries about LMPC and vehicle dynamics, and then section III presents the MPC model for vehicle platooning in which the communication channel is considered to be perfect and the packet is delivered to the following vehicle without any delay. Section IV then provides a modified LMPC for CAVs in three parts: section IV-A shows a tailored LMPC for dynamic environments, section IV-B takes into account the uncertainty of the communication channel, and section IV-C describes how the presented Mixed Integer Nonlinear Problem can be transformed to a Nonlinear Problem. In section V the simulation results of SR-LMPC for two connected vehicles are presented, and section VI makes concluding remarks.

II Preliminaries

This section is based on the original work of [21]. Beginning with a discrete time system

[TABLE]

where $x\in\mathbb{R}^{n}$ and $u\in\mathbb{R}^{m}$ are the system state and input, respectively, assume that $f(\cdot,\cdot)$ is continuous and that state and inputs are subject to the constraints

[TABLE]

LMPC solves the following infinite horizon optimal control problem iteratively:

[TABLE]

where equations (3a) and (3b) represent the system dynamics and the initial condition, and (3c) are the state and input constraints. LMPC assumes that the stage cost $h(\cdot,\cdot)$ in equation (3) is continuous and satisfies

[TABLE]

where the final state $x_{F}$ is a feasible equilibrium for the unforced system (1)

[TABLE]

At the $j^{\text{th}}$ iteration of LMPC, the vectors

[TABLE]

collect the inputs applied to system (1) and the corresponding state evolution. In (6), $x_{t}^{j}$ and $u_{t}^{j}$ denote the system state and the control input at time $t$ of the $j^{\text{th}}$ iteration. We assume that at each $j^{\text{th}}$ iteration, the closed loop trajectories start from the same initial state

[TABLE]

II-A Sampled Safe Set

A key notion of LMPC is that it exploits the iterative nature of control design. For every $k^{th}$ iteration that successfully steers the system to the terminal point $x_{F}$ (i.e., $\forall k:\lim_{t\to\infty}x_{t}^{k}=x_{F}$ ), the trajectory $\textbf{x}^{k}$ is a subset of sampled Safe Set $\mathcal{SS}^{j}$ , which is defined as:

[TABLE]

where

[TABLE]

$\mathcal{SS}^{j}$ is the collection of all state trajectories at iteration $i$ for $i\in M^{j}$ . $M^{j}$ in (9) is the set of indices $k$ associated with successful iterations of MPC for $k<j$ . It follows that $\mathcal{SS}^{i}\subseteq\mathcal{SS}^{j}\ \forall i\leq j$ . $\mathcal{SS}^{j}$ is a subset of the maximal stabilizable set because, for every point in the set, there exists a feasible control action that satisfies the state constraints and steers the state toward $x_{F}$ .

II-B Iteration Cost

Define a function $Q^{j}(\cdot)$ over $\mathcal{SS}^{j}$ that assigns to every point in $\mathcal{SS}^{j}$ the minimum cost-to-go along the trajectories in sampled safe set,

[TABLE]

where $F^{j}(\cdot)$ is

[TABLE]

In other words, for every $x\in\mathcal{SS}^{j}$ , $Q^{j}(x)$ not only assigns the optimal cost-to-go but also the pair $(i,t)$ that indicates the optimal iteration number in LMPC as well as the optimal time-to-go for that state.

II-C Properties of LMPC

It can be shown [21] that, using the above notions of sampled safe set and iteration cost, the $j^{\text{th}}$ iteration cost is nonincreasing at each iteration and that the LMPC formulation is recursively feasible (state and input constraints at the next iteration are satisfied they are satisfied at the current iteration). It should also be noted that LMPC solves the infinite time optimal control problem by solving at time $t$ of iteration $j$ a finite time constrained optimal control problem.

III Problem Formulation: MPC for CAVs

Consider that there are $\mathit{c+1}$ vehicles in the platoon, $i=\{0,1,\dots,c\}$ , where [math] refers to the lead car. The dynamical system of the $i^{th}$ vehicle is

[TABLE]

where, $x_{i}(t)\in\mathbb{R}^{n}$ and $u_{i}(t)\in\mathbb{R}^{m}$ show the state and control input vectors of the $i^{\text{th}}$ vehicle at time step $t$ , respectively. Also, assume that $f_{i}:\mathbb{R}^{n}\times\mathbb{R}^{m}\to\mathbb{R}^{n}$ is a smooth function that evolves the state of the vehicle $i$ through the time horizon, $N$ .

The proposed control architecture is formulated as model predictive control that repeatedly solves the following finite horizon optimization problem:

[TABLE]

where $x_{i}(k|t)$ and $u_{i}(k|t)$ are the state and control input at step $k$ predicted at time $t$ , respectively.

The objective function represents the tradeoff between tracking the preceding vehicle, reference tracking, and control effort with tuning matrices $P^{1}_{i}$ , $P^{2}_{i}$ , and $P^{3}_{i}$ , respectively. $Q_{i}$ represents the terminal cost. The first constraints (13a) and (13b) are the dynamical system of the vehicle and its initial state. The last two constraints (13d) and (13e) define the bounds on state variables and control inputs. The safety condition is enforced by inequality (13c) in which $\Gamma_{i}(t)$ is Time To Collision which is given as a parameter to the model:

[TABLE]

where $v_{i}(k)$ shows the velocity of vehicle $i$ and $d_{i}(k)$ is the euclidean distance between two vehicles $i-1$ and $i$ at time $k$ which can be calculated from their state vectors. The optimal solution to this problem is given by

[TABLE]

and a receding horizon control law applies the first control input $u_{i}(t|t)$ that transfers the system to the new state $x_{i}(t+1|t)$ , and the process is repeated from $t+1$ .

To solve this problem, it is necessary to assume that there is a feasible trajectory from the starting point $x_{i,s}$ to the goal point $x_{i,g}$ , which includes the feasibility of the model at time $t$ (i.e. the solution space of the MPC is not empty).

Like many existing works [24], the above MPC platooning control scheme assumes that the communication channel works perfectly, and that each vehicle has access to the (current) trajectory prediction of the preceding vehicle. We seek to relax this assumption. To take into account the uncertainty in the wireless channel, we adapt and extend LMPC for connected vehicles and platooning, which has previously been presented for a single vehicle in a static environment [21]. In the following sections, for the purpose of simplicity, only one iteration of model (13) is considered for developing the algorithm.

IV Learning Model Predictive Control for CAVs

The original LMPC formulation presented above is designed for an entire trajectory from an initial state to a final state, in a static environment. As those authors acknowledge, LMPC is computationally expensive; in addition, LMPC cannot be applied in a dynamic environment in its original form. To overcome these limitations, we propose two concepts. The first notion involves encoding the dynamics of obstacles both in the constraints and the objective function of the optimization problem. In addition, LMPC is reformulated from an end-to-end planning problem (i.e. find an optimal trajectory from $x_{S}\to x_{F}$ ), which typically has better performance in terms of the number of required iterations and degree of optimality, as the planning horizon, $N$ , increases [25]. In a dynamic context, a shorter planning horizon trades convergence to global optima with the ability to overcome computational issues with LMPC in general.

In addition, LMPC is extended to account for decision variables with unknown dynamics or data-driven approximations. The control architecture includes a (nominal) outer loop, or high level, motion planner that generates a candidate set of feasible trajectories. LMPC works on the inner loop to converge to a dynamically feasible trajectory with optimal (or improved) performance over data-driven decision variables. The approach is depicted graphically in Fig. 1 and pseudocode for this architecture is shown at the end of this section in Algorithm 1.

IV-A Short Range LMPC

Assumption 1

Assume that the preceding vehicle, $i-1$ , has converged on its own optimal trajectory and that this is known for the time horizon, $N$ :

[TABLE]

This assumption should be relaxed in future work, but (under) approximates an autonomous system’s ability to predict its own trajectory into the future and communicate this to the platoon. In section IV-B, the assumption on perfect communication will be relaxed and a different control strategy will be adopted.

Assumption 2

SR-LMPC assumes the existence of a feasible trajectory from the current state of the ego vehicle (i.e. the follower), $x(t)\to x(t+N)$ at the first iteration but with no assumptions on optimality, given as

[TABLE]

where $x^{0}(k|t)$ and $u^{0}(k|t)$ are the state and control input vectors of the ego system at time $k$ that have been calculated at time step $t$ . The superscript shows the iteration number of LMPC, which starts from [math] and is denoted by index $\ell=\{0,1,...,L\}$ . SR-LMPC keeps only a record of successful iterations in this set, with the total number of successful iterations completed by the algorithm given by $L$ . The information of each trajectory is saved in set $\mathcal{DS}$ if it is completed successfully, which implies no collisions (see (14)) but not necessarily optimality. Given an arbitrary, feasible initial trajectory stored in $\mathcal{DS}^{0}$ , the dynamic safe set is iteratively built as

[TABLE]

where $x\in\mathcal{X}^{\mathrm{d}_{t}}$ represents the dynamic constraints on state imposed by the time-to-collision for the predicted leader trajectory at time $t$ .

The cost-to-go of state $x(k|t)$ is denoted by $q(x(k|t)|x_{i-1}^{*}(k\to N|t))$ and is defined as “the trajectory cost of the ego system from $x(k|t)$ to $x(k+N|t)$ given that the states of the leading vehicle are $x_{i-1}^{*}(k|t),...,x_{i-1}^{*}(k+N|t)$ ”. A backward calculation is applied to find the cost-to-go for each state in set $\mathcal{DS}^{L}$ . To start, $q_{i}^{\ell}(x_{i}^{\ell}(N|t)|x_{i-1}^{*}(N|t))$ can be approximated by path planning algorithms [26] or simply assumed 1-norm distance between two vehicles at time $N$ .

[TABLE]

for $k=\{N-1,...,t\}$ , and where $z_{i}(x_{i}^{\ell}(k|t),u_{i}^{\ell}(k|t))$ is the stage cost and is defined as the cost of control effort to transfer the state of the system from $x_{i}^{\ell}(k|t)$ to $x_{i}^{\ell}(k+1|t)$ by applying input $u_{i}^{\ell}(k|t)$ at iteration $\ell$ and time $k$ . The overall performance of the controller at iteration $\ell$ occurs when $k=t$ .

The SR-LMPC formulation for vehicles in a platoon takes the form of the following constrained mixed integer optimization problem (MINLP). SR-LMPC uses the generic form of problem (13), with two notable exceptions. First the time horizon is modified, and significantly for the purposes of computation shortened: $\{t+\tau,t+\tau+1,...,t+\tau+\nu\}$ , where $\nu<N$ is the SR-LMPC time horizon and $t+\tau$ is the starting time, for all $\tau=\{0,1,...,N-\nu\}$ .

The objective function is modified to

[TABLE]

All of the constraints (13a)-(13e) hold, given appropriate sets $k\in\{t+\tau,\dots,t+\tau-\nu\}$ .

Finally, SR-LMPC then adds the following constraints

[TABLE]

The following vectors show the optimal solution for this model:

[TABLE]

This model assigns a binary decision variable, $\zeta_{i}^{\ell}(\eta)$ , to each of the states in $\mathcal{DS}^{L}$ (18) and selects one of them as the terminal state, (21a). Because only one of these states can be chosen as the terminal state $x_{i}(t+N|t)$ , the summation of the binary variables should be exactly one, as shown in (21b). The last term of the objective function (20) determines the best value for cost-to-go $q_{i}^{\ell}(x_{i}^{\ell}(t+\eta|t)|x_{i-1}^{*}(\eta\to N|t))$ and its associated state. Assuming that the optimal state in set $\mathcal{DS}^{L}$ is $x_{i}^{\ell^{\ast}}(t+\eta^{\ast}|x_{i-1}^{*}(\eta^{\ast}\to N|t))$ , the solution for system state is the vector

[TABLE]

After finding the optimal solution for SR-LMPC as an MINLP model, the first step of the control input vector, $\mathpzc{U}_{i}(t+\tau)$ , is implemented and the related state and control vectors are saved in the trajectory of iteration $L$ :

[TABLE]

The updated trajectory in the current iteration, $L$ , is as follows (notice the slightly different symbol for $\mathrm{x}$ and $\mathrm{u}$ , indicating the first step of the receding horizon LMPC along the time-shift $\tau$ ):

[TABLE]

for $\tau=\{0,1,...,N-\nu\}$ .

IV-B SR-LMPC with Uncertain Communication Channel

The formulation in IV-A assumes no delay in the communication channel and the following vehicle receives the lead vehicle’s motion plan for the current time horizon. However, the communication channel can be affected by numerous factors such as relative states of the communicating vehicles, adjacent buildings and vehicles, and so on. The quality of the communication channel is defined by Packet Delivery Rate, $PDR_{t}$ , and its inverse at time $t$ provides an estimate of delivery time for each packet $\omega_{t}^{\ell}=\frac{1}{PDR_{t}^{\ell}}$ . This estimate can be provided by an unknown function, e.g. an artificial neural network like an LSTM [20]. The radio icons in Fig. 2 represent the LSTM’s mapping from vehicle state to channel estimation.

At iteration $\ell$ and time step $t$ , the delivery time in range $j\cdot dt\leqslant\omega_{t}^{\ell}<(j+1)dt$ , where $j\in\mathbb{Z}^{\geqslant}$ , and $dt$ is the time constant used for the discretized dynamics in the general MPC formulation. This model of communication implies that the vehicle $i$ should rely on the most recent packet available at time $t-dt$ to calculate its motion policy. Assume that vehicle $i$ has received several packets at time step $t$ from the $i-1$ lead vehicle, which contain the trajectory predictions from whatever time stamp they were sent: $\{\mathrm{\textbf{X}}_{i-1}^{*}(t_{1}),\mathrm{\textbf{X}}_{i-1}^{*}(t_{2}),,...,\mathrm{\textbf{X}}_{i-1}^{*}(t_{m})\}$ , where $t_{1}<t_{2}<...<t_{m}\leqslant t$ . Because the packet containing $\mathrm{\textbf{X}}_{i-1}^{*}(t_{m})$ is the most up-to-date (though not necessarily current), it is used to calculate the motion policy of the following vehicle.

[TABLE]

However, the first $t-t_{m}$ states are stale, i.e. the leader has already executed these states, and should be deleted from the packet. Therefore, the packet shrinks to:

[TABLE]

The length of this packet determines the real time horizon that the following vehicle can consider:

[TABLE]

where, $\mathrm{N}_{i}(t)$ denotes the length of time horizon at time $t$ , $\mathrm{N}_{i}(t)\leqslant N$ . Fewer useful states may cause vehicle $i$ to take more conservative policies to satisfy the constraints, and this results in more cost for the entire whole trajectory; i.e. the controller might find local optima due to lack of longer-term information (see results in section V).

One of the key innovations of our approach is adding awareness of the cost to physical system performance due to communication delays. Therefore, the updated formulation includes communication delay, $\omega_{t}^{\ell}$ , in the objective function to penalize the trajectory that results in low quality (predicted) communication channel. Thus, the optimality condition forces the model to generate a trajectory in which the quality of the communication achieves an acceptable level, to avoid extra cost in the objective while satisfying the constraints. The vector of packet delivery time for whole time horizon is

[TABLE]

We assume that this vector is calculated and given by another system that is outside the scope of this work. One major challenge is that the delivery time of the packets cannot be calculated before determining the trajectory, as communication depends on the relative location of vehicles and characteristics of their surrounding environment in different time steps. Also, after generating a trajectory, $\mathrm{\Omega}_{i-1,i}^{L}(t)$ is calculated outside of the control algorithm (e.g. through methods referenced in section I), and because the model has not considered this variable in finding the solution, $\mathrm{N}_{i}(t)$ does not necessarily have an optimal value in following time steps. This can result in increases in the cost function.

To solve this problem, we exploit the repetitive nature of LMPC to consider $\mathrm{\Omega}_{i-1,i}^{L}(t)$ as a data-driven decision variable in the optimization process. This is done by two modifications in the model. First, we update $q_{i}^{\ell}$ in the objective function to include the cost associated with the delay in the communication channel if a specific terminal state is chosen. Observe that the cost-to-go vector is updated after a complete trajectory is generated. Second, a dynamic constraint is added to the model to represent the area where the communication loss occurs and is updated at each inner SR-LMPC iteration. For the first part, we define the following cost function

[TABLE]

If $\alpha$ is close to zero, the cost function will assign more weight to the most recent values, but if it is close to one, the weight of delivery time at all time steps will receive almost equal weight. Now the cost-to-go in the original objective function (20) can be reformulated as:

[TABLE]

The updated cost-to-go contains the cost of delivery time of communication channel and by replacing it in the objective function, the model will chose a terminal state that has lower cost in terms of communication channel, $\mathrm{\Omega}_{i-1,i}^{L}(t)$ , and stage costs, $z_{i}$ .For the second part, a boundary on the state vector is defined as a dynamic constraint in the model

[TABLE]

Note that this is a time-variant constraint and it depends on the states of the lead and ego vehicles. $\mathcal{O}^{\mathrm{d}_{t}}_{i}$ is updated based on the vector of packet delivery time for SR-LMPC which is shown by

[TABLE]

For simplicity, assume that at each iteration of Branch and Bound relaxation, the algorithm solves a convex quadratic model. Using the Interior Point Method, the computational complexity of finding $\epsilon-$ scale optimum for a quadratic model is polynomial in the size of model ( $n^{\prime}$ ) and required accuracy ( $\epsilon$ ), i.e. $O(n^{\prime}log1/\epsilon)$ [27] .However, the worst-case number of iterations of B&B algorithm is exponential $O(2^{(L-1)N})$ , where $(L-1)N$ is the number of binary variables assigned to the vector $Q_{i}$ . The size of model with time horizon $N$ is $(n+M)N$ , resulting in computational complexity of $O(2^{(L-1)N}(n+m)Nlog1/\epsilon)$ . The exponential part is dominant and yields in $O(2^{LN})$ . On the other hand, the proposed method with smaller time horizon $\nu$ has time complexity of $O(2^{(L-1)N}(n+m)\nu(N-\nu)log1/\epsilon)$ at each outer loop iteration, which again yields $O(2^{LN})$ . Then, the time complexity of the algorithm increases exponentially in the number of outer loop iterations, $L$ . Without impacting computational complexity, shortening the time horizon from $N$ to $\nu$ enables the algorithm to utilize data at smaller but more frequent steps (inner SR-LMPC iterations) to improve the current trajectory by exploring the solution space with greater coverage. This approach noticeably decreases the number of overall trajectories, $L$ , needed to converge to the optimum.

IV-C Converting MINLP to Nonlinear Problem

The formulation presented in IV-B computationally expensive to solve. The binary variables $\mathrm{\zeta}_{i}^{\ell}(\eta)$ and nonlinear constraints make the model Mixed Integer Nonlinear Programming (MINLP), which should be avoided in pursuit of algorithms that scale and/or can be applied in real time. Translating the above MINLP problem into the following Nonlinear Programming (NLP) formulation makes computing the optimal control policy more tractable. The control algorithm for SR-LMPC solves the following problem

[TABLE]

The difference in this model to the original SR-LMPC formulation (see constraints (21a) - (21c)) is the values that $\mathrm{\zeta}_{i}^{\ell}(\eta)$ can take as a decision variable.Although constraint (34g) allows $\mathrm{\zeta}_{i}^{\ell}(\eta)$ to take values from zero to one, constraint (34f) limits them to be just one or zero. On the other hand, constraint (34e) enforces them to be all zero except for one that should be valued one.

The overall SR-LMPC approach is shown in Algorithm 1.

V Example Scenario - Uncertain Communication

The application of a controller that adapts the vehicle’s longitudinal velocity based on the other vehicle’s states, based on uncertain communication over a V2V network, is not restricted to the car-following or platooning scenario. Similar formulations of the platooning or leader-follower control scheme from section IV can be extended to other applications like autonomous intersection control and improve either decentralized or centralized approaches [28, 29].

However, the scenario developed for this paper focuses on a leader-follower scenario, where communication performance is influenced by a bridge overpass, which is known to have a negative effect [30, 31, 32]. The leading vehicle follows a simple trajectory with constant velocity. The following vehicle has stable initial conditions, i.e. before entering the bridge area it has converged to an optimal trajectory in terms of the tradeoff between following distance, control cost, and communication (we assume that communication performance is stable outside of the bridge scenario).

As the vehicles approach the bridge, the channel will begin to experience packet drops if the relative states achieve certain characteristics, for example combinations of relative speed and distance, due to multi-path interference with the bridge surface. It is also assumed that this deterioration in channel performance can be accurately predicted over time horizon $N$ . In this simplified scenario, it is assumed that communication will drop out if both vehicles are in the so-called “dead zone” at the same time. One of the terms formulated in the objective function of the ego vehicle is avoiding states that cause communication loss (not entering into the dead zone) while satisfying other terms. Note that even in this simple scenario, SR-LMPC does not have access to this information in terms of a closed-form function in either the objective or constraints. Rather, at each (predicted or current) time step, an offline data-driven function is queried and the predictions are returned.

Finally, the vehicle dynamics are formulated as a point mass system such that $\dot{x}(t)=u(t)$ , subject to input saturation. Each of these assumptions should be relaxed in future work, such as imperfect prediction accuracy and Dubins path dynamics. Relevant parameters of the scenario and model are shown in Table I.

The optimization problems in (20) and (34) are solved using CPLEX [33]. We seek to characterize whether, and to what extent, system performance can be improved with a “look ahead” function that can predict channel performance by means of a black box method. Results are shown in Fig. 3 and 4.

It should be noted that the nominal scheme without SR-LMPC under communication uncertainty and/or prediction (section IV-A and IV-B) proceeds agressively towards the lead car and eventually ends in the red region of Fig. 3 at the same time, resulting in significant packet loss and increased braking. With SR-LMPC, in early iterations the controller attempts to avoid this situation but saturates the control inputs. As it iterates, SR-LMPC begins to avoid saturation, or it saturates for a shorter amount of time. Overall, SR-LMPC is able to minimize fuel cost while also maintaining closer headway (top of Fig. 3 as well as bottom) and constant communication.

The intuition behind these results is as follows. Without the ability to predict communication performance, the following vehicle attempts to optimize on following distance and terminal cost. Once it enters the dead zone, it starts to drop packets and can only use increasingly shorter portions of (previously communicated) leader trajectory predictions. In the case of SR-LMPC with communication prediction, the algorithm is able to access a generic, unknown function that tells it whether the channel is expected to change in the future. SR-LMPC is able to converge on a trajectory that smoothly slows down in advance of the dead zone, which results in significantly improved global performance.

VI Conclusions

In this paper, an extension to learning Model Predictive Control (LMPC) is presented. The controller is designed for applications to motion planning in dynamic environments, particularly when one or more of the decision variables comes from black box or data-driven models. The control architecture leverages a nominal outer loop motion planner, and then iterates over this trajectory candidate in an inner loop to find optimal policies in terms of both model- and data-driven variables. This outer and inner control scheme proceeds in a receding horizon fashion until the system reaches its objectives. These concepts are applied to connected autonomous vehicles and the notion of platooning, or collaborative adaptive cruise control.

To demonstrate the approach, a simulation of a leader-follower scenario for two connected autonomous vehicles is developed. The scenario includes physical characteristics that cause uncertainty in the communication channel, and the controller leverages recent advances in wireless channel prediction using machine learning. The SR-LMPC framework is able to generate improved trajectories in terms of not only communication, but also energy efficiency.

Bibliography33

The reference list from the paper itself. Each links out to its DOI / PubMed record.

1[1] C. Nowzari and J. Cortés, “Distributed event-triggered coordination for average consensus on weight-balanced digraphs,” Automatica , vol. 68, pp. 237–244, 2016.
2[2] D. V. Dimarogonas, E. Frazzoli, and K. H. Johansson, “Distributed event-triggered control for multi-agent systems,” IEEE Transactions on Automatic Control , vol. 57, no. 5, pp. 1291–1297, 2012.
3[3] L. Wang, A. D. Ames, and M. Egerstedt, “Multi-objective compositions for collision-free connectivity maintenance in teams of mobile robots,” in Decision and Control (CDC), 2016 IEEE 55th Conference on . IEEE, 2016, pp. 2659–2664.
4[4] L. Wang, A. Ames, and M. Egerstedt, “Safety barrier certificates for heterogeneous multi-robot systems,” in American Control Conference (ACC), 2016 . IEEE, 2016, pp. 5213–5218.
5[5] L.-h. Luo, H. Liu, P. Li, and H. Wang, “Model predictive control for adaptive cruise control with multi-objectives: comfort, fuel-economy, safety and car-following,” Journal of Zhejiang University SCIENCE A , vol. 11, no. 3, pp. 191–201, 2010.
6[6] D. Corona and B. De Schutter, “Adaptive cruise control for a smart car: A comparison benchmark for mpc-pwa control methods,” IEEE Transactions on Control Systems Technology , vol. 16, no. 2, pp. 365–372, 2008.
7[7] P. Varaiya, “Smart cars on smart roads: problems of control,” IEEE Transactions on automatic control , vol. 38, no. 2, pp. 195–207, 1993.
8[8] A. Iihoshi, S. Kobayashi, and Y. Furukawa, “Vehicle platoon control system,” Feb. 29 2000, u S Patent 6,032,097.