Learning How to Autonomously Race a Car: a Predictive Control Approach
Ugo Rosolia, Francesco Borrelli

TL;DR
This paper introduces a Learning Model Predictive Controller for autonomous racing that iteratively improves lap times by updating control strategies based on previous laps, reducing computational load and using system identification.
Contribution
It proposes a novel LMPC strategy that minimizes computational complexity and introduces a system identification method for autonomous racing control.
Findings
Reduced computational burden in LMPC
Successful experimental validation on Berkeley Autonomous Race Car
Improved lap times through iterative learning
U. Rosolia and F. Borrelli are with the Department of Mechanical Engineering, University of California at Berkeley, Berkeley, CA 94701, USA {ugo.rosolia, fborrelli}@berkeley.edu
Abstract
In this paper we present a Learning Model Predictive Controller (LMPC) for autonomous racing. We model the autonomous racing problem as a minimum time iterative control task, where an iteration corresponds to a lap. The system trajectory and input sequence of each lap are stored and used to systematically update the controller for the next lap. In the proposed approach the race time does not increase at each iteration. The first contribution of the paper is to propose a local LMPC which reduces the computational burden associated with existing LMPC strategies. In particular, we show how to construct a local safe set and approximation to the value function, using a subset of the stored data. The second contribution is to present a system identification strategy for the autonomous racing iterative control task. We use data from previous iterations and the vehicle’s kinematic equations of motion to build an affine time-varying prediction model. The effectiveness of the proposed strategy is demonstrated by experimental results on the Berkeley Autonomous Race Car (BARC) platform.
I Introduction
Autonomous driving is an active research field. Over the past decades several techniques have been proposed for different driving scenarios [1, 2, 3, 4, 5, 6, 7, 8, 9]. Depending on the control task (e.g. highway driving, urban driving, emergency maneuvers) the behavior of the vehicle can be modelled with linear or nonlinear equations of motion [10], [11]. When the nonlinearities of the vehicle are excited the control task is inevitably more challenging. In this work we are interested in designing a controller for autonomous racing which can operate the vehicle in the nonlinear regime, close to the limit of the vehicle’s handling capability. We formulate the autonomous racing problem as an iterative control task, where at each iteration the controller drives the vehicle around the track trying to minimize the lap time.
Recently several approaches have been proposed for autonomous racing. In [12] the authors reformulated the autonomous racing control task as a non-convex optimization problem and then proposed a linearization strategy to compute an approximate solution. The authors in [13] proposed a Nonlinear Model Predictive Control (NMPC) strategy which exploits a Pacejka tire model identified from experimental data. The NMPC is implemented on an experimental set-up using an exact Hessian SQP-type optimization algorithm. NMPC strategies for autonomous racing are also tested in [14], where the authors compared two control methodologies based on different parametrizations of the vehicle’s model. In [15] the authors compared two approaches, the first one based on a tracking MPC and the second one based on an MPC formulated in a space-dependent frame. A Model Predictive Contouring Control (MPCC) was presented in [16]. In MPCC the controller objective is a trade-off between the progress along the track and the contouring error. First, a high-level MPC computes the optimal racing trajectory. Afterwards, a low-level controller is used to track the optimal racing line. This strategy is extended in [17] to design a racing controller which guarantees recursive constraint satisfaction. Also in [18] the control problem is divided into two steps. First, a reference trajectory is computed using the method proposed in [19]. Afterwards, an iterative learning control (ILC) approach is used for tracking. The authors showed the effectiveness of the proposed approach by experimental testing on a full-size vehicle.
We propose to reformulate the autonomous racing problem as an iterative control task. The controller is not based on a precomputed racing line; instead, it learns from experience a trajectory which minimizes the lap time. In particular, the closed-loop trajectories at each lap are stored and used to systematically update the controller for the next lap.
This paper builds on [20, 21, 22] and has two main contributions.
The first contribution is to propose a local LMPC strategy where the terminal cost and constraint are updated at each time step. In particular, at each time $t$ we exploit the planned trajectory at time $t-1$ to construct a local terminal cost and constraint. In contrast to our previous works [20, 21, 22], the terminal cost and constraint are computed using a subset of the stored data; therefore, the proposed local LMPC reduces the computational burden associated with existing LMPC strategies. The effectiveness of the proposed approach is demonstrated on the Berkeley Autonomous Race Car (BARC) platform (a video of the experiment can be found at https://youtu.be/ZBFJWtIbtMo). We show that the proposed controller is able to improve the lap time until it converges to a steady-state behavior. Finally, we analyze the lateral acceleration acting on the closed-loop system and we confirm that the controller learns to drive the vehicle at the limit of its handling capability.
The second contribution of this work is to propose a system identification strategy tailored to the autonomous racing application. We propose to exploit both the kinematic equations of motion and data from previous iterations to identify an Affine Time Varying (ATV) prediction model used for control. In particular, we use a local linear regressor to learn the relationships between the inputs and the vehicle’s velocities. Furthermore, we linearize the kinematic equations of motion to approximate the evolution of the vehicle’s position as a function of the velocities. In contrast to our previous works [20, 21], this strategy allows us to reformulate the LMPC as a Quadratic Program (QP) which can be solved efficiently.
This paper is organized as follows: in Section II we introduce the problem formulation. Section III illustrates the LMPC design. In particular, it shows how to construct local safe sets and value function approximations using a subset of the collected data. Section IV illustrates the system identification strategy used in the experiments. Finally, in Section V we present the experimental results on the Berkeley Autonomous Race Car (BARC) platform. Section VI provides final remarks.
II Problem Formulation
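Consider the following state and input vectors
$$x = [v_x,\ v_y,\ w_z,\ e_\psi,\ s,\ e_y]^\top \quad \text{and} \quad u = [\delta,\ a]^\top,$$
where $w_z$, $v_x$ and $v_y$ are the vehicle’s yaw rate, longitudinal and lateral velocities. The position of the vehicle is represented in the curvilinear reference frame [23], where $s$ is the distance travelled along the centerline of the track. The states $e_\psi$ and $e_y$ are the heading angle and lateral distance error between the vehicle and the centerline of the track, as shown in Figure 1. Finally, $\delta$ and $a$ are the steering and acceleration commands. The vehicle is described by the dynamic bicycle model
$$x_{t+1} = f(x_t, u_t), \tag{1}$$
where $f(\cdot,\cdot)$ is derived from kinematics and balancing the forces acting on the tires [10]. A detailed expression can be found in [10, Chapter 2]. Note that in the curvilinear reference frame state and input constraints are convex, i.e.
$$x_t \in \mathcal{X} \quad \text{and} \quad u_t \in \mathcal{U}, \quad \forall t \geq 0, \tag{2}$$
for convex sets $\mathcal{X}$ and $\mathcal{U}$.
The goal of the controller is to drive the system from the starting point $x_S$ to the terminal set $\mathcal{X}_F$. More formally, the controller aims to solve the following minimum time optimal control problem
$$\begin{aligned} \min_{T,\,u_0,\ldots,u_{T-1}} \quad & \sum_{t=0}^{T-1} 1 \\ \text{s.t.} \quad & x_{t+1} = f(x_t, u_t), \quad \forall t \in \{0,\ldots,T-1\} \\ & x_t \in \mathcal{X},\ u_t \in \mathcal{U}, \quad \forall t \in \{0,\ldots,T-1\} \\ & x_T \in \mathcal{X}_F,\ x_0 = x_S, \end{aligned} \tag{3}$$
where for a track of length $L$ the terminal set
$$\mathcal{X}_F = \{x \in \mathbb{R}^6 : [0\ 0\ 0\ 0\ 1\ 0]\,x = s \geq L\} \tag{4}$$
represents the states beyond the finish line.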
III Controller Design
In this section, we first show how to use historical data to construct a terminal constraint set and terminal cost function. Afterwards, we exploit these quantities to design the controller.
III-A Stored Data
As stated in the introduction, we define one iteration as a successful lap around the race track and we store the closed-loop trajectories. In particular, at the $j$th iteration we define the vectors
$$u^j = [u_0^j, \ldots, u_{T^j}^j], \qquad x^j = [x_0^j, \ldots, x_{T^j}^j],$$
which collect the evolution of the closed-loop system and the associated input sequence. In the above definitions, $T^j$ denotes the time at which the closed-loop system reached the terminal set, i.e. $x_{T^j}^j \in \mathcal{X}_F$.
III-B Local Convex Safe Set
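In this section, we define the local convex safe set. Differently from our previous works [22, 20, 21], this quantity is constructed using a subset of the stored data points. In particular, the local convex safe set around $x$ is defined as the convex hull of the $K$-nearest neighbors to $x$.
First, for the $j$th trajectory we define the set of time indices $[t_1^{j,*}, \ldots, t_K^{j,*}]$ associated with the $K$-nearest neighbors to the point $x$,
$$\begin{aligned} [t_1^{j,*}, \ldots, t_K^{j,*}] = \arg\min_{t_1,\ldots,t_K} \quad & \sum_{i=1}^{K} \|x_{t_i}^j - x\|_D^2 \\ \text{s.t.} \quad & t_i \neq t_k,\ \forall i \neq k \\ & t_i \in \{0,\ldots,T^j\},\ \forall i \in \{1,\ldots,K\}. \end{aligned} \tag{5}$$
In the above definition $\|y\|_D^2 = y^\top D^\top D y$ for the user-defined matrix $D$, which may be chosen to take into account the relative scaling or relevance of different variables. We chose to select the $K$-nearest neighbors with respect to the curvilinear abscissa $s$, which represents a proxy for the distance between two stored data points of the same lap. Furthermore, as the vehicle moves forward on the track, at each lap the stored data are ordered with respect to the travelled distance and the computation of (5) is simplified. The $K$-nearest neighbors to $x$ from the $l$th to the $j$th iteration are collected in the following matrix
$$D_l^j(x) = [x^l_{t_1^{l,*}}, \ldots, x^l_{t_K^{l,*}},\ \ldots,\ x^j_{t_1^{j,*}}, \ldots, x^j_{t_K^{j,*}}],$$
which is used to define the local convex safe set around $x$
$$\mathcal{CL}_l^j(x) = \{\bar{x} \in \mathbb{R}^n : \exists \lambda \in \mathbb{R}^{K(j-l+1)},\ \lambda \geq 0,\ \mathbb{1}\lambda = 1,\ D_l^j(x)\lambda = \bar{x}\}. \tag{6}$$
Notice that the above local convex safe set represents the convex hull of the $K$-nearest neighbors to $x$ from the $l$th to the $j$th iteration.
Finally, we define the matrix
$$S_l^j(x) = [x^l_{t_1^{l,*}+1}, \ldots, x^l_{t_K^{l,*}+1},\ \ldots,\ x^j_{t_1^{j,*}+1}, \ldots, x^j_{t_K^{j,*}+1}],$$
which collects the evolution of the states stored in the columns of the matrix $D_l^j(x)$. The above matrix will be used in Section III-D to construct the local convex safe set at each time step.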
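The neighbor selection described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes each stored lap is a list of six-element states laid out as [v_x, v_y, w_z, e_psi, s, e_y], it measures distance only along the abscissa s, and the helper names (`k_nearest_indices`, `local_safe_set_data`, `make_lap`) and the synthetic laps are hypothetical.

```python
def k_nearest_indices(traj, x, K, s_index=4):
    """Return the K time indices of `traj` whose abscissa s is closest to x's."""
    order = sorted(range(len(traj)),
                   key=lambda t: abs(traj[t][s_index] - x[s_index]))
    return sorted(order[:K])


def local_safe_set_data(trajs, x, K, s_index=4):
    """Collect neighbor states (columns of D) and their successors (columns of S)."""
    D_cols, S_cols = [], []
    for traj in trajs:
        # Drop the last stored state so that every neighbor has a successor.
        idx = k_nearest_indices(traj[:-1], x, K, s_index)
        D_cols += [traj[t] for t in idx]
        S_cols += [traj[t + 1] for t in idx]
    return D_cols, S_cols


def make_lap(ds, steps):
    # Hypothetical lap in which only s evolves: [vx, vy, wz, epsi, s, ey].
    return [[0.0, 0.0, 0.0, 0.0, ds * t, 0.0] for t in range(steps)]


laps = [make_lap(1.0, 10), make_lap(0.5, 20)]
x_query = [0.0, 0.0, 0.0, 0.0, 2.2, 0.0]
D_cols, S_cols = local_safe_set_data(laps, x_query, K=3)
s_D = [col[4] for col in D_cols]  # abscissa of the selected neighbors
s_S = [col[4] for col in S_cols]  # abscissa of their successors
```

Because the stored data of a lap are ordered by travelled distance, a real implementation could replace the full sort with a search around the previous match; the sort is kept here only for brevity.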
III-C Local Convex Q-function
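In this section, we exploit the stored data to construct an approximation to the cost-to-go over the local convex safe set around $x$. In particular, we define the local convex $Q$-function around $x$ as the convex combination of the cost associated with the stored trajectories,
$$\begin{aligned} Q_l^j(\bar{x}, x) = \min_{\lambda} \quad & J_l^j(x)\lambda \\ \text{s.t.} \quad & \lambda \geq 0,\ \mathbb{1}\lambda = 1,\ D_l^j(x)\lambda = \bar{x}, \end{aligned} \tag{7}$$
where $\lambda \in \mathbb{R}^{K(j-l+1)}$, $\mathbb{1}$ is a row vector of ones and the row vector
$$J_l^j(x) = [J^l_{t_1^{l,*} \to T^l}(x^l_{t_1^{l,*}}), \ldots, J^l_{t_K^{l,*} \to T^l}(x^l_{t_K^{l,*}}),\ \ldots,\ J^j_{t_1^{j,*} \to T^j}(x^j_{t_1^{j,*}}), \ldots, J^j_{t_K^{j,*} \to T^j}(x^j_{t_K^{j,*}})]$$
collects the cost-to-go associated with the $K$-nearest neighbors to $x$ from the $l$th to the $j$th iteration. The cost-to-go $J^j_{t \to T^j}(x_t^j) = T^j - t$ represents the time to drive the vehicle from $x_t^j$ to the finish line along the $j$th trajectory. We underline that the cost-to-go is computed after completion of the $j$th iteration.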
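As a small illustration of the quantities in this section, the sketch below builds the cost-to-go row vector for a handful of neighbors and evaluates the objective of the Q-function for one given feasible multiplier vector. It does not solve the minimization itself, which in practice would be handed to an LP or QP solver. The lap durations, time indices, one-dimensional states and function names are all hypothetical.

```python
def cost_to_go(T_j, t):
    """Time steps needed to reach the finish line from time t of a lap ending at T_j."""
    return T_j - t


def q_objective(J_row, lam, tol=1e-9):
    """Evaluate J * lambda after checking that lambda lies in the unit simplex."""
    if any(l < -tol for l in lam) or abs(sum(lam) - 1.0) > tol:
        raise ValueError("lambda must be nonnegative and sum to one")
    return sum(J * l for J, l in zip(J_row, lam))


def convex_combination(cols, lam):
    """Return sum_i lam_i * cols[i], i.e. the matrix-vector product D * lambda."""
    n = len(cols[0])
    return [sum(l * col[k] for l, col in zip(lam, cols)) for k in range(n)]


# Hypothetical data: two neighbors from a 50-step lap and two from a 40-step lap.
neighbors = [(50, 10), (50, 11), (40, 8), (40, 9)]  # (T^j, t) pairs
J_row = [cost_to_go(T, t) for T, t in neighbors]
D_cols = [[2.0], [2.2], [2.0], [2.25]]              # abscissa s only, for brevity
lam = [0.0, 0.0, 0.5, 0.5]                          # puts all weight on the faster lap
x_bar = convex_combination(D_cols, lam)             # state represented by lambda
q_value = q_objective(J_row, lam)                   # cost assigned to x_bar
```

Putting the weight on the neighbors from the faster lap yields the lower cost, which is the mechanism by which the terminal cost pulls the planned trajectory toward faster stored laps.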
III-D Local LMPC Design
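The local convex safe set and the local convex $Q$-function are used to design the controller. At each time $t$ of the $j$th iteration the controller solves the following finite time optimal control problem
$$\begin{aligned} J^{\text{LMPC},j}_{t \to t+N}(x_t^j, z_t^j) = \min_{U_t^j,\,\lambda_t^j} \quad & \sum_{k=t}^{t+N-1} h(x^j_{k|t}) + J_l^{j-1}(z_t^j)\lambda_t^j && \text{(8a)} \\ \text{s.t.} \quad & x^j_{t|t} = x_t^j,\quad x^j_{k+1|t} = A^j_{k|t}x^j_{k|t} + B^j_{k|t}u^j_{k|t} + C^j_{k|t}, && \text{(8b)} \\ & \lambda_t^j \geq 0,\ \mathbb{1}\lambda_t^j = 1,\ D_l^{j-1}(z_t^j)\lambda_t^j = x^j_{t+N|t}, && \text{(8c)} \\ & x^j_{k|t} \in \mathcal{X}, && \text{(8d)} \\ & u^j_{k|t} \in \mathcal{U},\quad \forall k \in \{t,\ldots,t+N-1\}, && \text{(8e)} \end{aligned}$$
where $U_t^j = [u^j_{t|t}, \ldots, u^j_{t+N-1|t}]$, $\lambda_t^j \in \mathbb{R}^{K(j-l)}$ and the stage cost in (8a)
$$h(x) = \begin{cases} 1, & \text{if } x \notin \mathcal{X}_F \\ 0, & \text{else.} \end{cases}$$
In the above finite time optimal control problem equations (8b), (8d) and (8e) represent the dynamic update, state and input constraints. Finally, (8c) enforces $x^j_{t+N|t}$ into the local convex safe set defined in Section III-B. The optimal solution to (8) at time $t$ of the $j$th iteration
$$\lambda_t^{j,*},\quad [x^{j,*}_{t|t}, \ldots, x^{j,*}_{t+N|t}],\quad [u^{j,*}_{t|t}, \ldots, u^{j,*}_{t+N-1|t}] \tag{9}$$
is used to compute the following vector
$$z_t^j = \begin{cases} x_N^{j-1}, & \text{if } t = 0 \\ S_l^{j-1}(z_{t-1}^j)\,\lambda_{t-1}^{j,*}, & \text{otherwise,} \end{cases} \tag{10}$$
which at time $t$ defines the local convex safe set and local $Q$-function in (8). The above vector $z_t^j$ represents a candidate terminal state for the planned trajectory of the LMPC at time $t$. First, we initialize the candidate terminal state using the $(j-1)$th trajectory. Afterwards, we update the vector $z_t^j$ as the convex combination of the columns of the matrix $S_l^{j-1}(z_{t-1}^j)$ from Section III-B. Notice that if the system is linear, or if a linearized system approximates the nonlinear dynamics over the local convex safe set, then there exists a feasible input which drives the system from $x^{j,*}_{t+N-1|t-1}$ to $z_t^j$.
Finally, we apply to the system (1) the first element of the optimizer vector,
$$u_t^j = u^{j,*}_{t|t}. \tag{11}$$
The finite time optimal control problem (8) is repeated at time $t+1$, based on the new state $x^j_{t+1|t+1} = x^j_{t+1}$.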
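The remark about linear systems can be checked numerically. The sketch below assumes a hypothetical linear system (a discretized double integrator, not the vehicle model) and hypothetical stored data. It shows that applying the same convex combination to the stored inputs drives the combined state exactly onto the combined successor, which is why a convex combination of successor states is a reachable candidate terminal state.

```python
def mat_vec(M, v):
    """Matrix-vector product for list-of-rows matrices."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def combine(cols, lam):
    """Convex combination sum_i lam_i * cols[i]."""
    return [sum(l * c[k] for l, c in zip(lam, cols)) for k in range(len(cols[0]))]


# Hypothetical linear system x+ = A x + B u (discretized double integrator).
A = [[1.0, 0.5], [0.0, 1.0]]
B = [[0.0], [0.5]]


def step(x, u):
    return [p + q for p, q in zip(mat_vec(A, x), mat_vec(B, u))]


# Stored states (columns of D), stored inputs, and successors (columns of S).
D_cols = [[0.0, 1.0], [1.0, 2.0], [2.0, 0.0]]
U_cols = [[1.0], [-1.0], [2.0]]
S_cols = [step(x, u) for x, u in zip(D_cols, U_cols)]

lam = [0.25, 0.5, 0.25]            # multipliers from the previous time step
x_terminal = combine(D_cols, lam)  # previous terminal state, D * lambda
u_feasible = combine(U_cols, lam)  # same combination of the stored inputs
z_next = combine(S_cols, lam)      # candidate terminal state, S * lambda

reached = step(x_terminal, u_feasible)  # coincides with z_next by linearity
```

For the nonlinear vehicle model the equality holds only approximately, to the extent that a linearization is accurate over the local convex safe set.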
IV System Identification Strategy
In this section, we illustrate the system identification strategy used to build an Affine Time Varying (ATV) model which approximates the vehicle dynamics. First, we introduce the kinematic equations of motion which describe the evolution of the vehicle’s position as a function of the velocities. Afterwards, we present the strategy used to approximate the dynamic equations of motion, which model the evolution of the vehicle’s velocities as a function of the input commands. Finally, we describe the ATV model, which is computed online linearizing the kinematic equations of motion and evaluating the approximate dynamic equations of motion along the shifted optimal solution to the LMPC.
IV-A Kinematic Model
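As mentioned in Section II, the position of the vehicle is expressed in the Frenet reference frame [23]. In particular, we describe the position of the vehicle in terms of lateral distance $e_y$ from the centerline of the road and distance $s$ traveled along a predefined path (Fig. 1). The state $e_\psi$ represents the difference between the vehicle’s heading angle and the angle of the tangent vector to the path at the curvilinear abscissa $s$.
The rate of change of the vehicle’s position in the curvilinear reference frame is described by the following kinematic relationships
$$\begin{aligned} \dot{e}_\psi &= w_z - \frac{v_x\cos(e_\psi) - v_y\sin(e_\psi)}{1 - \kappa(s)e_y}\,\kappa(s) \\ \dot{s} &= \frac{v_x\cos(e_\psi) - v_y\sin(e_\psi)}{1 - \kappa(s)e_y} \\ \dot{e}_y &= v_x\sin(e_\psi) + v_y\cos(e_\psi), \end{aligned}$$
where $\kappa(s)$ is the curvature of the centerline of the track at the curvilinear abscissa $s$ [23]. The above equations can be Euler discretized to approximate the vehicle’s motion as a function of the vehicle’s velocities
$$\begin{aligned} e_{\psi,k+1} &= f_{e_\psi}(x_k) = e_{\psi,k} + dt\left(w_{z,k} - \frac{v_{x,k}\cos(e_{\psi,k}) - v_{y,k}\sin(e_{\psi,k})}{1 - \kappa(s_k)e_{y,k}}\,\kappa(s_k)\right) \\ s_{k+1} &= f_s(x_k) = s_k + dt\,\frac{v_{x,k}\cos(e_{\psi,k}) - v_{y,k}\sin(e_{\psi,k})}{1 - \kappa(s_k)e_{y,k}} \\ e_{y,k+1} &= f_{e_y}(x_k) = e_{y,k} + dt\left(v_{x,k}\sin(e_{\psi,k}) + v_{y,k}\cos(e_{\psi,k})\right), \end{aligned} \tag{12}$$
where $dt$ is the discretization time. The above equations will be linearized to compute an ATV prediction model. It is interesting to notice that equations (12) are independent of the vehicle’s physical parameters, because these are derived from kinematic relationships between velocities and position.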
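A minimal Python sketch of one Euler step of these kinematics follows. It assumes the standard curvilinear (Frenet) relationships, with the progress rate along the centerline given by (v_x cos e_psi - v_y sin e_psi) / (1 - kappa e_y); the function name, argument order and numerical values are hypothetical, and the curvature is passed in as a number rather than looked up from a track map.

```python
import math


def kinematic_step(e_psi, s, e_y, v_x, v_y, w_z, kappa, dt):
    """One Euler step of the curvilinear kinematics; kappa is the curvature at s."""
    # Progress rate along the centerline.
    s_dot = (v_x * math.cos(e_psi) - v_y * math.sin(e_psi)) / (1.0 - kappa * e_y)
    e_psi_next = e_psi + dt * (w_z - kappa * s_dot)
    s_next = s + dt * s_dot
    e_y_next = e_y + dt * (v_x * math.sin(e_psi) + v_y * math.cos(e_psi))
    return e_psi_next, s_next, e_y_next


# Straight segment: the vehicle advances along s and the errors stay at zero.
straight = kinematic_step(0.0, 0.0, 0.0, v_x=1.0, v_y=0.0, w_z=0.0, kappa=0.0, dt=0.1)

# Constant-curvature segment driven on the centerline with yaw rate kappa * v_x:
# the heading and lateral errors remain zero while s grows by v_x * dt.
curve = kinematic_step(0.0, 0.0, 0.0, v_x=2.0, v_y=0.0, w_z=1.0, kappa=0.5, dt=0.1)
```

No mass, inertia or tire parameter appears in the function, which reflects the observation that these relationships do not depend on the vehicle's physical parameters.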
IV-B Dynamic Model
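The dynamic equations of motion, which describe the evolution of the vehicle’s velocities, may be computed balancing the forces acting on the tires [10]. Therefore, the dynamic equations depend on physical parameters associated with the vehicle, tires and asphalt. These parameters may be estimated through a system identification campaign. However, the nonlinear dynamic equations of motion should be linearized in order to obtain an ATV model which allows us to reformulate the LMPC as a QP. Instead of identifying the parameters of a nonlinear model and then linearizing it, we propose to directly learn a linear model around $x$ using a local linear regressor. We introduce the Epanechnikov kernel function [24]
$$K(u) = \begin{cases} \tfrac{3}{4}(1 - u^2), & \text{for } |u| < 1 \\ 0, & \text{else,} \end{cases}$$
which is used to compute a local linear model around $x$ for the longitudinal and lateral dynamics. In particular, for $\ell \in \{v_x, v_y, w_z\}$ we compute the following regressor vector
$$\Gamma^{\ell}(x) = \arg\min_{\Gamma} \sum_{\{k,i\} \in I^j(x)} K\!\left(\frac{\|x - x_k^i\|_Q^2}{h}\right) y_k^{i,\ell}(\Gamma), \tag{13}$$
where the hyperparameter $h \in \mathbb{R}_+$ is the bandwidth, the row vector $\Gamma \in \mathbb{R}^5$,
$$y_k^{i,\ell}(\Gamma) = \begin{cases} \left\| v^i_{x,k+1} - \Gamma\,[v^i_{x,k},\ v^i_{y,k},\ w^i_{z,k},\ a^i_k,\ 1]^\top \right\|, & \text{if } \ell = v_x \\ \left\| v^i_{y,k+1} - \Gamma\,[v^i_{x,k},\ v^i_{y,k},\ w^i_{z,k},\ \delta^i_k,\ 1]^\top \right\|, & \text{if } \ell = v_y \\ \left\| w^i_{z,k+1} - \Gamma\,[v^i_{x,k},\ v^i_{y,k},\ w^i_{z,k},\ \delta^i_k,\ 1]^\top \right\|, & \text{if } \ell = w_z \end{cases}$$
and $I^j(x)$ is the set of indices
$$\begin{aligned} I^j(x) = \arg\min_{\{k_1,i_1\},\ldots,\{k_P,i_P\}} \quad & \sum_{n=1}^{P} \|x - x^{i_n}_{k_n}\|_Q^2 \\ \text{s.t.} \quad & \{k_n, i_n\} \neq \{k_m, i_m\},\ \forall n \neq m \\ & k_n \in \{0,\ldots,T^{i_n}\},\ i_n \in \{l,\ldots,j\},\ \forall n \in \{1,\ldots,P\}, \end{aligned}$$
where $\|y\|_Q^2 = y^\top Q y$ and the matrix $Q$ is user defined. For the stored data from iteration $l$ to iteration $j$, the set $I^j(x)$ collects the indices associated with the $P$-nearest neighbors to the state $x$. Finally, the user-defined matrix $Q$ takes into account the relative scaling of different variables.
Notice that the optimizer in (13) can be used to approximate the evolution of the vehicle’s velocities,
$$\begin{bmatrix} v_{x,k+1} \\ v_{y,k+1} \\ w_{z,k+1} \end{bmatrix} = \begin{bmatrix} \Gamma^{v_x}_{1:3}(x) \\ \Gamma^{v_y}_{1:3}(x) \\ \Gamma^{w_z}_{1:3}(x) \end{bmatrix} \begin{bmatrix} v_{x,k} \\ v_{y,k} \\ w_{z,k} \end{bmatrix} + \begin{bmatrix} 0 & \Gamma^{v_x}_{4}(x) \\ \Gamma^{v_y}_{4}(x) & 0 \\ \Gamma^{w_z}_{4}(x) & 0 \end{bmatrix} \begin{bmatrix} \delta_k \\ a_k \end{bmatrix} + \begin{bmatrix} \Gamma^{v_x}_{5}(x) \\ \Gamma^{v_y}_{5}(x) \\ \Gamma^{w_z}_{5}(x) \end{bmatrix}, \tag{14}$$
where for $\ell \in \{v_x, v_y, w_z\}$ the scalar $\Gamma^{\ell}_i(x)$ denotes the $i$th element of the vector $\Gamma^{\ell}(x)$ and $\Gamma^{\ell}_{1:3}(x)$ is a row vector collecting the first three elements of $\Gamma^{\ell}(x)$ in (13).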
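The local linear regression can be sketched as a kernel-weighted least-squares fit. The example below is an illustration under stated assumptions rather than the authors' code: it uses a squared residual so that the minimizer has a closed form through the normal equations, an identity scaling matrix, a reduced two-feature regressor (one velocity and one input) plus an affine term, and synthetic data generated from a known affine map. All function names and numbers are hypothetical.

```python
def epanechnikov(u):
    """Epanechnikov kernel: 3/4 (1 - u^2) for |u| < 1, zero otherwise."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0


def solve(M, b):
    """Solve M z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(M)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= f * aug[c][k]
    z = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(aug[r][k] * z[k] for k in range(r + 1, n))
        z[r] = (aug[r][n] - tail) / aug[r][r]
    return z


def local_linear_fit(features, targets, query, h):
    """Minimize sum_i w_i (y_i - Gamma [phi_i, 1])^2 with kernel weights w_i."""
    rows = [phi + [1.0] for phi in features]  # append the affine term
    w = [epanechnikov(sum((a - b) ** 2 for a, b in zip(phi, query)) / h)
         for phi in features]
    n = len(rows[0])
    # Normal equations (Phi^T W Phi) Gamma = Phi^T W y.
    M = [[sum(wi * r[i] * r[j] for wi, r in zip(w, rows)) for j in range(n)]
         for i in range(n)]
    b = [sum(wi * r[i] * y for wi, r, y in zip(w, rows, targets)) for i in range(n)]
    return solve(M, b)


# Synthetic data from a known affine map: v_next = 0.9 v + 0.1 a + 0.05.
features = [[v, a] for v in (0.8, 1.0, 1.2) for a in (-1.0, 0.0, 1.0)]
targets = [0.9 * v + 0.1 * a + 0.05 for v, a in features]
gamma = local_linear_fit(features, targets, query=[1.0, 0.0], h=4.0)
```

With data generated by an exactly affine map the fit recovers the generating coefficients; on real data the kernel weights make the fit local, so the coefficients change with the query state.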
IV-C Affine Time Varying Model
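In this section we describe the strategy used to build an ATV model, which is then used for control. At time $t$ of the $j$th iteration we define the candidate solution to Problem (8) using the optimal solution at time $t-1$ from (9),
$$\bar{x}_t^j = [\bar{x}^j_{t|t}, \ldots, \bar{x}^j_{t+N|t}] = [x^{j,*}_{t|t-1}, \ldots, x^{j,*}_{t+N-1|t-1},\ z_t^j].$$
Finally, at each time $t$ of iteration $j$, the above candidate solution is used to build the following ATV model
$$x^j_{k+1|t} = A^j_{k|t}\,x^j_{k|t} + B^j_{k|t}\,u^j_{k|t} + C^j_{k|t}, \tag{15}$$
where $k \in \{t,\ldots,t+N-1\}$ and the matrices $A^j_{k|t}$, $B^j_{k|t}$ and $C^j_{k|t}$ are obtained linearizing (12) around $\bar{x}^j_{k|t}$ and evaluating (14) at $\bar{x}^j_{k|t}$,
$$A^j_{k|t} = \begin{bmatrix} \Gamma^{v_x}_{1:3}(\bar{x}^j_{k|t}) & 0 & 0 & 0 \\ \Gamma^{v_y}_{1:3}(\bar{x}^j_{k|t}) & 0 & 0 & 0 \\ \Gamma^{w_z}_{1:3}(\bar{x}^j_{k|t}) & 0 & 0 & 0 \\ \multicolumn{4}{c}{\nabla_x f_{e_\psi}(\bar{x}^j_{k|t})^\top} \\ \multicolumn{4}{c}{\nabla_x f_{s}(\bar{x}^j_{k|t})^\top} \\ \multicolumn{4}{c}{\nabla_x f_{e_y}(\bar{x}^j_{k|t})^\top} \end{bmatrix}, \qquad B^j_{k|t} = \begin{bmatrix} 0 & \Gamma^{v_x}_{4}(\bar{x}^j_{k|t}) \\ \Gamma^{v_y}_{4}(\bar{x}^j_{k|t}) & 0 \\ \Gamma^{w_z}_{4}(\bar{x}^j_{k|t}) & 0 \\ 0 & 0 \\ 0 & 0 \\ 0 & 0 \end{bmatrix} \tag{16}$$
and
$$C^j_{k|t} = \begin{bmatrix} \Gamma^{v_x}_{5}(\bar{x}^j_{k|t}) \\ \Gamma^{v_y}_{5}(\bar{x}^j_{k|t}) \\ \Gamma^{w_z}_{5}(\bar{x}^j_{k|t}) \\ f_{e_\psi}(\bar{x}^j_{k|t}) - \nabla_x f_{e_\psi}(\bar{x}^j_{k|t})^\top \bar{x}^j_{k|t} \\ f_{s}(\bar{x}^j_{k|t}) - \nabla_x f_{s}(\bar{x}^j_{k|t})^\top \bar{x}^j_{k|t} \\ f_{e_y}(\bar{x}^j_{k|t}) - \nabla_x f_{e_y}(\bar{x}^j_{k|t})^\top \bar{x}^j_{k|t} \end{bmatrix}, \tag{17}$$
where $f_{e_\psi}$, $f_s$ and $f_{e_y}$ denote the right-hand sides of the discretized kinematic equations (12) and $\Gamma^{\ell}$ is the regressor vector from (13).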
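The kinematic rows of the ATV model can be illustrated with a numerical linearization. The sketch below is an assumption-laden example, not the authors' implementation: it uses central finite differences in place of analytic gradients, treats the curvature as a constant (so the dependence of the curvature on s is ignored), assumes the state ordering [v_x, v_y, w_z, e_psi, s, e_y], and uses hypothetical names and numbers throughout.

```python
import math


def kin(x, kappa, dt):
    """Euler-discretized kinematics for x = [vx, vy, wz, epsi, s, ey]."""
    vx, vy, wz, epsi, s, ey = x
    s_dot = (vx * math.cos(epsi) - vy * math.sin(epsi)) / (1.0 - kappa * ey)
    return [epsi + dt * (wz - kappa * s_dot),
            s + dt * s_dot,
            ey + dt * (vx * math.sin(epsi) + vy * math.cos(epsi))]


def linearize(f, x_bar, eps=1e-6):
    """Return (A, C) with f(x) ~ A x + C near x_bar, using central differences."""
    f0 = f(x_bar)
    A = [[0.0] * len(x_bar) for _ in f0]
    for j in range(len(x_bar)):
        xp, xm = list(x_bar), list(x_bar)
        xp[j] += eps
        xm[j] -= eps
        fp, fm = f(xp), f(xm)
        for i in range(len(f0)):
            A[i][j] = (fp[i] - fm[i]) / (2.0 * eps)
    # Affine term chosen so that the model is exact at the linearization point.
    C = [f0[i] - sum(A[i][j] * x_bar[j] for j in range(len(x_bar)))
         for i in range(len(f0))]
    return A, C


kappa, dt = 0.5, 0.1
x_bar = [1.5, 0.1, 0.3, 0.05, 2.0, 0.1]  # hypothetical candidate state
A_kin, C_kin = linearize(lambda x: kin(x, kappa, dt), x_bar)

# The affine model reproduces the nonlinear update at the candidate state.
pred = [sum(a * x for a, x in zip(row, x_bar)) + c for row, c in zip(A_kin, C_kin)]
exact = kin(x_bar, kappa, dt)
```

Since each predicted step is linearized around its own candidate state, the calls to `linearize` for different steps of the horizon are independent of each other.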
V Results
The proposed control strategy has been implemented on a 1/10-scale open source vehicle platform called Berkeley Autonomous Race Car (BARC). The vehicle is equipped with a set of sensors, actuators and two on-board CPUs to perform low-level control of the actuators as well as communication with a laptop, on which the high-level control is implemented. The CPUs are an Arduino Nano for low-level control of the actuators and an Odroid XU4 for WiFi communication with the i7 MSI GT72 laptop. The actuators are an electrical motor and a servo for the steering. The control architecture has been implemented in the Robot Operating System (ROS) framework, using Python and OSQP [25]. The code is available online in the “devel-ugo” branch of the BARC GitHub repository (github.com/MPC-Berkeley/barc).
We initialize the algorithm by performing two laps of path following at constant speed. Each $j$th iteration collects the data of two consecutive laps. Therefore, the local safe set and local $Q$-function are also defined beyond the finish line. This strategy allows us to implement the LMPC for the repetitive autonomous racing control task, as shown in [20]. At each $j$th lap, we use the LMPC (8) and (11) to drive the vehicle from the starting line to the finish line and we use the closed-loop data to update the controller for the next lap. The parameters which define the controller are reported in Table I. We also added a small input rate cost in order to guarantee a unique solution to the QP associated with the LMPC.
We tested the controller on oval-shaped and L-shaped tracks on which the vehicle runs in the counter-clockwise direction. Figure 2 shows that the lap time decreases until convergence is reached after laps. Furthermore, Figure 4 shows the evolution of the closed-loop trajectory on the X-Y plane and the velocity profile, which is color coded. In the first row we report the path following trajectory used to initialize the LMPC and the closed-loop trajectories at laps and . We notice that the controller deviates from the initial feasible trajectory (reported in blue as the vehicle speed is m/s) in order to explore the state space and to drive the vehicle at higher speeds, until it converges to a steady-state behavior. The steady-state trajectories from lap to are reported in the bottom row of Figure 4. Notice that the color bar representing the velocity profile changes from the first to the second row, as the vehicle runs at higher speed at the end of the learning process. We underline that the controller learns the benefit of braking right before entering the curve and of accelerating when exiting. This behavior is optimal in racing as shown in [26].
Figure 3 shows the raw acceleration measurements from the IMU. We confirm that the controller is able to operate the vehicle at the limit of its handling capability, reaching a maximum lateral acceleration close to g (the maximum allowed lateral acceleration is computed assuming that the aerodynamic effects are negligible and that the lateral force acting on the vehicle is for the friction coefficient ).
Furthermore, Figure 5 shows the data points used to design the LMPC. Recall from Table I that at the $j$th lap the LMPC policy is synthesized using the trajectories from lap $l$ to lap $j-1$. Therefore, as the controller drives faster on the track, fewer data points are needed to design the LMPC. Moreover, in Figure 6 we report the computational time. It is interesting to notice that on average the finite time optimal control problem (8) is solved in less than ms, whereas it took ms to solve the finite time optimal control problem associated with [20]. We underline that both strategies have been tested with a prediction horizon of and a sampling time of Hz. This shows the advantage of using the local convex safe set in (6), instead of the polynomial approximation to the safe set used in [21, 20]. For more details on the polynomial approximation to the safe set we refer to [21]. Finally, we notice that it would be possible to parallelize the computation of the linear models which define the ATV model from (15). Indeed, at time $t$ Equations (16)-(17) may be evaluated independently and in parallel for each predicted time $k$.
VI Conclusions
We presented a Learning Model Predictive Controller (LMPC) for autonomous racing. The proposed control framework uses historical data to construct safe sets and approximations to the value function. These quantities are systematically updated when a lap is completed. As a result, the LMPC learns from experience to safely drive the vehicle at the limit of handling. We demonstrated the effectiveness of the proposed strategy on the Berkeley Autonomous Race Car (BARC) platform. Experimental results show that the controller learns to drive the vehicle aggressively in order to minimize the lap time. In particular, the closed-loop system converged to a steady-state trajectory which cuts curves and reaches a lateral acceleration close to g.
References
[1] E. J. Rossetter and J. C. Gerdes, “Lyapunov based performance guarantees for the potential field lane-keeping assistance system,” Journal of Dynamic Systems, Measurement, and Control, vol. 128, no. 3, pp. 510–522, 2006.
[2] Y. Gao, A. Gray, J. V. Frasch, T. Lin, E. Tseng, J. K. Hedrick, and F. Borrelli, “Spatial predictive control for agile semi-autonomous ground vehicles,” in 11th International Symposium on Advanced Vehicle Control, 2012.
[3] Y. Kuwata, J. Teo, G. Fiore, S. Karaman, E. Frazzoli, and J. P. How, “Real-time motion planning with applications to autonomous urban driving,” IEEE Transactions on Control Systems Technology, vol. 17, no. 5, pp. 1105–1118, 2009.
[4] J. V. Frasch, A. Gray, M. Zanon, H. J. Ferreau, S. Sager, F. Borrelli, and M. Diehl, “An auto-generated nonlinear MPC algorithm for real-time obstacle avoidance of ground vehicles,” in Control Conference (ECC), 2013 European. IEEE, 2013, pp. 4136–4141.
[5] M. Campbell, M. Egerstedt, J. P. How, and R. M. Murray, “Autonomous driving in urban environments: approaches, lessons and challenges,” Philosophical Transactions of the Royal Society of London A: Mathematical, Physical and Engineering Sciences, vol. 368, no. 1928, pp. 4649–4672, 2010.
[6] D. González, J. Pérez, V. Milanés, and F. Nashashibi, “A review of motion planning techniques for automated vehicles,” IEEE Transactions on Intelligent Transportation Systems, vol. 17, no. 4, pp. 1135–1145, 2016.
[7] C. Katrakazas, M. Quddus, W.-H. Chen, and L. Deka, “Real-time motion planning methods for autonomous on-road driving: State-of-the-art and future research directions,” Transportation Research Part C: Emerging Technologies, vol. 60, pp. 416–442, 2015.
[8] B. Paden, M. Čáp, S. Z. Yong, D. Yershov, and E. Frazzoli, “A survey of motion planning and control techniques for self-driving urban vehicles,” IEEE Transactions on Intelligent Vehicles, vol. 1, no. 1, pp. 33–55, 2016.
