Learning Model Predictive Control for Iterative Tasks: A Computationally Efficient Approach for Linear System
Ugo Rosolia, Francesco Borrelli

TL;DR
This paper introduces an efficient Learning Model Predictive Control method for linear systems that learns from previous iterations to improve performance while reducing computational complexity.
Contribution
It extends existing LMPC frameworks with a new approach that simplifies computation and guarantees stability using convex safe sets and terminal costs.
Findings
Effective in reducing computational load
Ensures recursive feasibility and performance improvement
Validated through simulation results
Abstract
A Learning Model Predictive Controller (LMPC) for linear systems is presented. The proposed controller is an extension of the LMPC [1] and it aims to decrease the computational burden. The control scheme is reference-free and is able to improve its performance by learning from previous iterations. A convex safe set and a terminal cost function are used in order to guarantee recursive feasibility and non-increasing performance at each iteration. The paper presents the control design approach, and shows how to recursively construct the convex terminal set and the terminal cost from state and input trajectories of previous iterations. Simulation results show the effectiveness of the proposed control logic.
Learning Model Predictive Control for Iterative Tasks: A Computationally Efficient Approach for Linear System
Ugo Rosolia
Francesco Borrelli
University of California at Berkeley, Berkeley, CA 94701, USA (e-mail: {ugo.rosolia, fborrelli}@berkeley.edu).
Abstract
A Learning Model Predictive Controller (LMPC) for linear systems is presented. The proposed controller builds on previous work on nonlinear LMPC and decreases its computational burden for linear systems. The control scheme is reference-free and is able to improve its performance by learning from previous iterations. A convex safe set and a terminal cost function are used in order to guarantee recursive feasibility and non-increasing performance at each iteration. The paper presents the control design approach, and shows how to recursively construct the convex terminal set and the terminal cost from state and input trajectories of previous iterations. Simulation results show the effectiveness of the proposed control logic.
keywords:
Learning, Model Predictive Control, LMPC, Convex Optimization
1 INTRODUCTION
Iterative Learning Control (ILC) studies control design for autonomous systems performing repetitive tasks Bristow et al. (2006); Lee and Lee (2007); Wang et al. (2009). One task execution is often referred to as “iteration” or “trial”. In ILC, at each iteration, the system starts from the same initial condition and the controller objective is to track a given reference, rejecting periodic disturbances Bristow et al. (2006); Lee and Lee (2007). The tracking error from the previous iterations is used to improve the tracking performance of the closed loop system. Different strategies have been proposed to guarantee zero tracking error of the closed loop system Bristow et al. (2006); Lee and Lee (2007); Wang et al. (2009).
Several control frameworks which combine ILC and MPC strategies have been proposed in the literature, Subbaraman and Benosman (2016); Lee and Lee (2000); Lee et al. (2000). In the classical ILC approach the goal of the controller is to track a reference trajectory. However, in some applications, such as autonomous racing Sharp and Peng (2011); Rucco et al. (2015) or some manipulation tasks Tamar et al. (2016), it may be challenging to generate a priori a reference trajectory that maximizes the system performance. For this reason, a very recent work Tamar et al. (2016) proposed a reference-free ILC scheme. The authors used an MPC controller with a terminal cost that accounts for long-term planning. This terminal cost is computed using a neural network trained on data generated by offline simulations. The authors were able to improve the system performance over iterations. However, no guarantees about stability, recursive feasibility and performance improvement are provided.
Our objective is to design a reference-free iterative control strategy for linear systems able to learn from previous iterations. At each iteration, the initial condition, the constraints and the objective function do not change. The $j$-th iteration cost is defined as the objective function evaluated for the realized closed loop system trajectory. The iteration cost shall not increase over the iterations and state and input constraints shall be satisfied. Model Predictive Control is an appealing technique to tackle this problem for its ability to handle state and input constraints while minimizing a finite-time predicted cost Garcia et al. (1989). However, the receding horizon nature can lead to infeasibility and it does not guarantee improved performance at each iteration Mayne et al. (2000).
The contribution of this paper is the following. We present an extension to the learning MPC for iterative control tasks in Rosolia and Borrelli (2017). In particular, we introduce a new formulation for linear systems that drastically reduces the computational burden of the controller without compromising the guarantees of the learning MPC. We show how to design a convex safe set and a terminal cost function in order to guarantee: (i): [asymptotic stability], the closed loop system converges asymptotically to the equilibrium point, (ii): [persistent feasibility], state and input constraints are satisfied at iteration $j$ if they were satisfied at iteration $j-1$, (iii): [performance improvement], the $j$-th iteration cost does not increase compared with the $(j-1)$-th iteration cost, (iv): [global optimality], if the closed-loop system converges to a steady state trajectory as the number of iterations goes to infinity, then that trajectory is globally optimal. We emphasize that (i)-(ii) are standard MPC design requirements and (iii)-(iv) are the core contribution of this work.
This paper is organized as follows: in Section 2 we introduce the notation used throughout the paper. Then, we define the convex safe set and the terminal cost function used in the design of the learning MPC. Section 3 describes the control design. We show the recursive feasibility and stability of the control logic and, afterwards, we prove the convergence properties. Finally, in Section 4 we test the proposed control logic on an infinite horizon linear quadratic regulator and we compare the computational efficiency with the learning MPC from Rosolia and Borrelli (2017).
2 PROBLEM FORMULATION
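Consider the discrete time system
$$x_{t+1} = A x_t + B u_t, \tag{1}$$
where $x_t \in \mathbb{R}^n$ and $u_t \in \mathbb{R}^m$ are the system state and input, respectively, subject to the constraints
$$x_t \in \mathcal{X}, \quad u_t \in \mathcal{U}, \quad \forall t \ge 0, \tag{2}$$
where $\mathcal{X}$ and $\mathcal{U}$ are convex sets.
At the $j$-th iteration the vectors
$$\mathbf{u}^j = [u_0^j, u_1^j, \ldots, u_t^j, \ldots], \tag{3a}$$
$$\mathbf{x}^j = [x_0^j, x_1^j, \ldots, x_t^j, \ldots], \tag{3b}$$
collect the inputs applied to system (1) and the corresponding state evolution. In (3), $x_t^j$ and $u_t^j$ denote the system state and the control input at time $t$ of the $j$-th iteration, respectively. We assume that at each $j$-th iteration the closed loop trajectories start from the same initial state,
$$x_0^j = x_S, \quad \forall j \ge 0. \tag{4}$$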
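As a minimal sketch of how the vectors in (3) could be recorded, the snippet below rolls out a linear system from a fixed initial state and stores the realized states and inputs. The matrices `A` and `B`, the gain `K`, the initial state `x_S` and the helper `simulate_iteration` are illustrative assumptions, not values from the paper, and the constraints (2) are not enforced here.

```python
def simulate_iteration(A, B, K, x0, steps):
    """Roll out x_{t+1} = A x_t + B u_t with u_t = -K x_t (single input).

    Returns (states, inputs): states has steps + 1 entries, inputs has steps.
    """
    n = len(x0)
    states, inputs = [list(x0)], []
    for _ in range(steps):
        x = states[-1]
        u = -sum(K[i] * x[i] for i in range(n))  # linear state feedback
        x_next = [sum(A[r][c] * x[c] for c in range(n)) + B[r] * u
                  for r in range(n)]
        inputs.append(u)
        states.append(x_next)
    return states, inputs


# Hypothetical double-integrator-like system and a hand-picked stabilizing gain.
A = [[1.0, 1.0], [0.0, 1.0]]
B = [0.0, 1.0]
K = [0.2, 0.8]
x_S = [-4.0, 0.0]  # every iteration starts here, as required by (4)

states, inputs = simulate_iteration(A, B, K, x_S, steps=60)
print(len(states), len(inputs), states[-1])
```

A first feasible trajectory of this kind is what the learning scheme needs before its first iteration; any controller that reaches the equilibrium while respecting the constraints would serve the same purpose.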
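The goal is to design a controller which solves the following infinite horizon optimal control problem at each iteration:
$$J^*_{0\to\infty}(x_S) = \min_{u_0, u_1, \ldots} \sum_{k=0}^{\infty} h(x_k, u_k) \tag{5a}$$
$$\text{s.t.} \quad x_{k+1} = A x_k + B u_k, \quad \forall k \ge 0, \tag{5b}$$
$$x_0 = x_S, \tag{5c}$$
$$x_k \in \mathcal{X}, \quad u_k \in \mathcal{U}, \quad \forall k \ge 0, \tag{5d}$$
where equations (5b) and (5c) represent the system dynamics and the initial condition, and (5d) are the state and input constraints. The stage cost, $h(\cdot,\cdot)$, in equation (5a) is continuous, jointly convex and it satisfies
$$h(x_F, 0) = 0 \quad \text{and} \quad h(x, u) > 0 \quad \forall x \in \mathbb{R}^n \setminus \{x_F\},\ u \in \mathbb{R}^m \setminus \{0\}, \tag{6}$$
where the final state $x_F$ is assumed to be a feasible equilibrium for the unforced system (1),
$$x_F = A x_F. \tag{7}$$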
Next we introduce the definition of the convex safe set and of the terminal cost. Both will be used later to guarantee stability and feasibility of the learning MPC for linear systems.
2.1 Convex Safe Set
In the following we recall the definition of the sampled Safe Set from Rosolia and Borrelli (2017), which is necessary to construct the convex Safe Set used in the learning MPC for linear systems.
The definition of the sampled Safe Set exploits the iterative nature of the control task to define a control invariant set, using the realized system trajectories. At the $j$-th iteration the sampled safe set, $\mathcal{SS}^j$, is defined as
$$\mathcal{SS}^j = \Big\{ \bigcup_{i \in M^j} \bigcup_{t=0}^{\infty} x_t^i \Big\}. \tag{8}$$
$\mathcal{SS}^j$ is the collection of all state trajectories at iteration $i$ for $i \in M^j$. $M^j$ in equation (8) is the set of indexes $k$ associated with successful iterations $k$ for $k \le j$, defined as:
$$M^j = \Big\{ k \in [0, j] : \lim_{t \to \infty} x_t^k = x_F \Big\}. \tag{9}$$
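The construction of the sampled safe set can be sketched as follows. This is a simplified illustration under stated assumptions: trajectories are finite lists of state tuples, and an iteration is treated as successful when its last recorded state is within a tolerance `tol` of the equilibrium. The function name `sampled_safe_set` and the tolerance are hypothetical.

```python
def sampled_safe_set(trajectories, x_F, tol=1e-6):
    """Collect every state of every trajectory that ends (within tol) at x_F."""
    safe = []
    for traj in trajectories:
        end = traj[-1]
        converged = max(abs(a - b) for a, b in zip(end, x_F)) <= tol
        if converged:  # only successful iterations contribute states
            safe.extend(tuple(x) for x in traj)
    return safe


x_F = (0.0, 0.0)
traj_0 = [(-4.0, 0.0), (-2.0, 0.5), (0.0, 0.0)]  # successful
traj_1 = [(-4.0, 0.0), (-3.0, 0.0)]              # did not reach x_F
traj_2 = [(-4.0, 0.0), (-1.0, 1.0), (0.0, 0.0)]  # successful

ss_0 = sampled_safe_set([traj_0], x_F)
ss_2 = sampled_safe_set([traj_0, traj_1, traj_2], x_F)
print(len(ss_0), len(ss_2))
```

Because states are only ever added, the set built after more iterations contains the set built after fewer iterations.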
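Moreover, as system (1) is linear and $\mathcal{X}$ and $\mathcal{U}$ are convex, for each convex combination of the elements in $\mathcal{SS}^j$ we can find a control sequence that steers the system (1) to $x_F$. Therefore, the convex Safe Set, defined as
$$\mathcal{CS}^j = \mathrm{Conv}(\mathcal{SS}^j) = \Big\{ \sum_{i=1}^{|\mathcal{SS}^j|} \alpha_i z_i : \alpha_i \ge 0,\ \sum_{i=1}^{|\mathcal{SS}^j|} \alpha_i = 1,\ z_i \in \mathcal{SS}^j \Big\}, \tag{10}$$
is a control invariant set. Note that $|\mathcal{SS}^j|$ is the cardinality of $\mathcal{SS}^j$. For further details on control invariant sets we refer to Borrelli (2003).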
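For a two-dimensional state, membership in the convex hull of stored states can be checked explicitly. The sketch below uses Andrew's monotone chain algorithm, which is a swapped-in illustration and not the method of the paper (there, membership is enforced through the multipliers of a convex program). It assumes planar states with at least three non-collinear points; all names and data are hypothetical.

```python
def cross(o, a, b):
    """z-component of (a - o) x (b - o); positive for a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Hull vertices in counter-clockwise order (monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def in_hull(x, hull, tol=1e-12):
    """True if x lies inside or on the counter-clockwise polygon hull."""
    n = len(hull)
    return all(cross(hull[i], hull[(i + 1) % n], x) >= -tol for i in range(n))


# States from two hypothetical successful iterations.
stored = [(-4.0, 0.0), (-3.0, 1.0), (-1.0, 1.0), (0.0, 0.0), (-2.0, 2.0)]
hull = convex_hull(stored)
print(hull, in_hull((-2.0, 1.0), hull), in_hull((-3.5, 1.0), hull))
```

The point `(-2.0, 1.0)` is the midpoint of two stored states and is therefore inside the hull, while `(-3.5, 1.0)` lies outside it.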
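From (9) we have that $M^i \subseteq M^j,\ \forall i \le j$, which implies that
$$\mathcal{CS}^i \subseteq \mathcal{CS}^j, \quad \forall i \le j. \tag{11}$$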
2.2 Terminal Cost
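At time $t$ of the $j$-th iteration the cost-to-go associated with the closed loop trajectory (3b) and input sequence (3a) is defined as
$$J^j_{t\to\infty}(x_t^j) = \sum_{k=0}^{\infty} h(x_{t+k}^j, u_{t+k}^j), \tag{12}$$
where $h(\cdot,\cdot)$ is the stage cost of problem (5). We define the $j$-th iteration cost as the cost (12) of the $j$-th trajectory at time $t = 0$,
$$J^j_{0\to\infty}(x_0^j) = \sum_{k=0}^{\infty} h(x_k^j, u_k^j). \tag{13}$$
$J^j_{0\to\infty}(x_0^j)$ quantifies the controller performance at each $j$-th iteration.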
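For a recorded trajectory of finite length, the cost-to-go at every time step can be accumulated with a single backward pass. The sketch below assumes the trajectory has already reached the equilibrium at its last state, so the tail of the infinite sum is zero, and it uses a hypothetical quadratic stage cost; neither the cost weights nor the data come from the paper.

```python
def cost_to_go(states, inputs, stage_cost):
    """Return J with J[t] = sum of stage costs from time t to the end.

    states has len(inputs) + 1 entries; the last state contributes zero cost.
    """
    J = [0.0] * (len(inputs) + 1)
    for t in range(len(inputs) - 1, -1, -1):
        J[t] = stage_cost(states[t], inputs[t]) + J[t + 1]
    return J


def quadratic_stage_cost(x, u):
    """Hypothetical stage cost h(x, u) = x'x + u^2 for a single input."""
    return sum(v * v for v in x) + u * u


states = [[2.0, 0.0], [1.0, 0.0], [0.0, 0.0]]
inputs = [1.0, 0.5]
J = cost_to_go(states, inputs, quadratic_stage_cost)
print(J)  # J[0] is the iteration cost
```

The first entry `J[0]` plays the role of the iteration cost, and the remaining entries are the values attached to each stored state when the terminal cost is built.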
Remark 1
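In equations (12)-(13), $x_k^j$ and $u_k^j$ are the realized state and input at the $j$-th iteration, as defined in (3).
Finally we define the barycentric function (Jones and Morari (2010))
$$P^j(x) = \begin{cases} p^{j,*}(x) & \text{if } x \in \mathcal{CS}^j, \\ +\infty & \text{if } x \notin \mathcal{CS}^j, \end{cases} \tag{14}$$
where
$$p^{j,*}(x) = \min_{\lambda_t^k \ge 0} \sum_{k=0}^{j} \sum_{t=0}^{\infty} \lambda_t^k J^k_{t\to\infty}(x_t^k) \tag{15a}$$
$$\text{s.t.} \quad \sum_{k=0}^{j} \sum_{t=0}^{\infty} \lambda_t^k = 1, \tag{15b}$$
$$\sum_{k=0}^{j} \sum_{t=0}^{\infty} \lambda_t^k x_t^k = x, \tag{15c}$$
where $x_t^k$ is the realized state at time $t$ of the $k$-th iteration, as defined in (3b).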
Remark 2
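The function $P^j(\cdot)$ assigns to every point in $\mathcal{CS}^j$ the minimum cost-to-go along the trajectories in $\mathcal{CS}^j$, in particular we have that
$$P^j(x) = \sum_{k=0}^{j} \sum_{t=0}^{\infty} \lambda_t^{k,*} J^k_{t\to\infty}(x_t^k), \quad \forall x \in \mathcal{CS}^j, \tag{16}$$
where $\lambda_t^{k,*}$ is the minimizer in (15).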
Remark 3
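In practical applications each $j$-th iteration has a finite time duration $t_j$, and therefore $P^j(\cdot)$ is reformulated as
$$P^j(x) = \min_{\lambda_t^k \ge 0} \sum_{k=0}^{j} \sum_{t=0}^{t_k} \lambda_t^k J^k_{t\to t_k}(x_t^k) \quad \text{s.t.} \quad \sum_{k=0}^{j} \sum_{t=0}^{t_k} \lambda_t^k = 1, \quad \sum_{k=0}^{j} \sum_{t=0}^{t_k} \lambda_t^k x_t^k = x. \tag{17}$$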
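To make the barycentric terminal cost concrete, the sketch below evaluates it for a scalar state. In general the minimization over the multipliers is a linear program; for a scalar state an optimal solution uses at most two stored points, so it is enough to enumerate single points and bracketing pairs. The scalar restriction, the function name `barycentric_cost` and the data are assumptions made for illustration.

```python
import math


def barycentric_cost(x, points, costs):
    """Smallest convex combination of stored costs whose points average to x.

    points[i] is a stored scalar state and costs[i] its cost-to-go.
    Returns +inf when x is outside the convex hull of the stored states.
    """
    if not points or x < min(points) or x > max(points):
        return math.inf
    best = math.inf
    for xi, Ji in zip(points, costs):
        if xi == x:
            best = min(best, Ji)  # x coincides with a stored state
        for xk, Jk in zip(points, costs):
            if xi < x < xk:
                lam = (xk - x) / (xk - xi)  # weight on the left point
                best = min(best, lam * Ji + (1.0 - lam) * Jk)
    return best


# Stored states and costs-to-go from two hypothetical iterations.
points = [-4.0, -2.0, 0.0, -4.0, -1.0, 0.0]
costs = [20.0, 4.0, 0.0, 17.0, 1.0, 0.0]

print(barycentric_cost(-4.0, points, costs))
print(barycentric_cost(-3.0, points, costs))
print(barycentric_cost(-1.5, points, costs))
print(barycentric_cost(1.0, points, costs))
```

At a stored state the function returns the best cost recorded there, between stored states it interpolates along the lower convex envelope, and outside the hull it returns infinity, which is how the terminal constraint and terminal cost interact.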
3 LMPC FOR LINEAR SYSTEM
In this section we present the design of the proposed Learning Model Predictive Control (LMPC). We first assume that there exists a feasible input sequence that steers the system from the initial point $x_S$ to the terminal point $x_F$ at the $0$-th iteration. Then we prove that the proposed LMPC is guaranteed to be recursively feasible, i.e. feasible at all time instants of every successive iteration. Moreover, we show that the LMPC guarantees a non-increasing iteration cost between two successive executions of the task.
3.1 LMPC Control Design
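The LMPC tries to compute a solution to the infinite time optimal control problem (5) by solving at time $t$ of iteration $j$ the finite time constrained optimal control problem
$$J^{\mathrm{LMPC},j}_{t\to t+N}(x_t^j) = \min_{u_{t|t}, \ldots, u_{t+N-1|t}} \Big[ \sum_{k=t}^{t+N-1} h(x_{k|t}, u_{k|t}) + P^{j-1}(x_{t+N|t}) \Big] \tag{18a}$$
$$\text{s.t.} \quad x_{k+1|t} = A x_{k|t} + B u_{k|t}, \quad \forall k \in [t, \ldots, t+N-1], \tag{18b}$$
$$x_{t|t} = x_t^j, \tag{18c}$$
$$x_{k|t} \in \mathcal{X}, \quad u_{k|t} \in \mathcal{U}, \quad \forall k \in [t, \ldots, t+N-1], \tag{18d}$$
$$x_{t+N|t} \in \mathcal{CS}^{j-1}, \tag{18e}$$
where (18b) and (18c) represent the system dynamics and initial condition, respectively. The state and input constraints are given by (18d). Finally (18e) forces the terminal state into the set $\mathcal{CS}^{j-1}$ defined in equation (10).
Let
$$\mathbf{u}^{*,j}_{t:t+N|t} = [u^{*,j}_{t|t}, \ldots, u^{*,j}_{t+N-1|t}], \quad \mathbf{x}^{*,j}_{t:t+N|t} = [x^{*,j}_{t|t}, \ldots, x^{*,j}_{t+N|t}] \tag{19}$$
be the optimal solution of (18) at time $t$ of the $j$-th iteration and $J^{\mathrm{LMPC},j}_{t\to t+N}(x_t^j)$ the corresponding optimal cost. Then, at time $t$ of the iteration $j$, the first element of $\mathbf{u}^{*,j}_{t:t+N|t}$ is applied to the system (1)
$$u_t^j = u^{*,j}_{t|t}. \tag{20}$$
The finite time optimal control problem (18) is repeated at time $t+1$, based on the new state $x_{t+1|t+1} = x_{t+1}^j$ (18c), yielding a moving or receding horizon control strategy.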
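The receding-horizon scheme can be illustrated on a toy problem. The sketch below is not the quadratic program of the paper: it assumes a scalar integrator $x_{t+1} = x_t + u_t$, a horizon of one step, a stage cost $x^2 + u^2$, and it replaces the convex program with enumeration over a coarse input grid, so the guarantees of the paper do not carry over formally. All names and numbers are hypothetical.

```python
import math


def barycentric_cost(x, points, costs):
    """Terminal cost for a scalar state; +inf outside the stored hull."""
    if x < min(points) or x > max(points):
        return math.inf
    best = math.inf
    for xi, Ji in zip(points, costs):
        if xi == x:
            best = min(best, Ji)
        for xk, Jk in zip(points, costs):
            if xi < x < xk:
                lam = (xk - x) / (xk - xi)
                best = min(best, lam * Ji + (1.0 - lam) * Jk)
    return best


def stage_cost(x, u):
    return x * x + u * u


def costs_to_go(xs, us):
    J = [0.0] * len(xs)
    for t in range(len(us) - 1, -1, -1):
        J[t] = stage_cost(xs[t], us[t]) + J[t + 1]
    return J


def run_iteration(x_start, points, costs, input_grid, x_F=0.0, max_steps=50):
    """One closed-loop run: pick the best gridded input, apply it, repeat."""
    xs, us = [x_start], []
    while xs[-1] != x_F and len(us) < max_steps:
        x = xs[-1]
        u = min(input_grid,
                key=lambda v: stage_cost(x, v) + barycentric_cost(x + v, points, costs))
        us.append(u)        # only the first (here: only) input is applied
        xs.append(x + u)    # scalar integrator dynamics
    return xs, us


input_grid = [-1.0, -0.5, 0.0, 0.5, 1.0]
xs0 = [-4.0 + 0.5 * k for k in range(9)]  # initial feasible run, u = 0.5
us0 = [0.5] * 8
points, costs = list(xs0), costs_to_go(xs0, us0)
iteration_costs = [costs[0]]
for j in range(1, 3):
    xs, us = run_iteration(-4.0, points, costs, input_grid)
    J = costs_to_go(xs, us)
    points += xs            # grow the stored data with the new run
    costs += J
    iteration_costs.append(J[0])
print(iteration_costs)
```

In this toy setting the printed iteration costs do not increase from one run to the next, which mirrors the behavior the controller is designed to guarantee in the convex, continuous-input formulation.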
Remark 4
Problem (18) is a convex optimization problem, as the terminal constraint (18e) enforces the terminal state in the convex set $\mathcal{CS}^{j-1}$ and the terminal cost $P^{j-1}(\cdot)$ in (18a) is a convex function. This new formulation of the LMPC (18), (20) as a convex problem is the main contribution of this work compared to Rosolia and Borrelli (2017).
Assumption 1
At iteration $j = 1$ we assume that $\mathcal{CS}^{j-1} = \mathcal{CS}^0$ is a non-empty set and that the trajectory $\mathbf{x}^0 \in \mathcal{CS}^0$ is feasible and convergent to $x_F$.
In the next section we prove that, under Assumption 1, the LMPC (18) and (20) in closed loop with system (1) guarantees recursive feasibility, stability, and a non-increasing iteration cost at each iteration.
3.2 Recursive feasibility and stability
In this Section, the properties of $\mathcal{CS}^j$ and $P^j(\cdot)$ are used to show recursive feasibility and asymptotic stability of the equilibrium point $x_F$.
Theorem 1
Consider system (1) controlled by the LMPC controller (18) and (20). Let $\mathcal{CS}^j$ be the convex safe set at iteration $j$ as defined in (10). Let Assumption 1 hold, then the LMPC (18) and (20) is feasible at every time $t \ge 0$ and iteration $j \ge 1$. Moreover, the equilibrium point $x_F$ is asymptotically stable for the closed loop system (1) and (20) at every iteration $j \ge 1$.
Proof: The proof follows from standard MPC arguments. By assumption $\mathcal{CS}^0$ is non-empty. From (11) we have that $\mathcal{CS}^0 \subseteq \mathcal{CS}^{j-1},\ \forall j \ge 1$, and consequently $\mathcal{CS}^{j-1}$ is a non-empty set. In particular, there exists a trajectory $\mathbf{x}^0 \in \mathcal{CS}^0 \subseteq \mathcal{CS}^{j-1}$. From (4) we know that $x_0^j = x_S,\ \forall j \ge 0$. At time $t = 0$ of the $j$-th iteration the $N$ steps trajectory
[TABLE]
and the related input sequence,
[TABLE]
satisfy input and state constraints (18b)-(18c)-(18d). Therefore (21)-(22) is a feasible solution to the LMPC (18) and (20) at $t = 0$ of the $j$-th iteration.
Assume that at time $t$ of the $j$-th iteration the LMPC (18) and (20) is feasible and let $\mathbf{x}^{*,j}_{t:t+N|t}$ and $\mathbf{u}^{*,j}_{t:t+N|t}$ be the optimal trajectory and input sequence, as defined in (19). From (18c) and (20) the realized state and input at time $t$ of the $j$-th iteration are given by
[TABLE]
Moreover, the terminal constraint (18e) enforces $x^{*,j}_{t+N|t} \in \mathcal{CS}^{j-1}$ and, from (15) and (18a),
[TABLE]
We define
[TABLE]
and
[TABLE]
Since the state update in (1) and (18b) are assumed identical we have that
[TABLE]
At time $t+1$ of the $j$-th iteration the input sequence and the related feasible state trajectory
[TABLE]
satisfy input and state constraints (18b)-(18c)-(18d). Therefore, (28) is a feasible solution for the LMPC (18) and (20) at time $t+1$.
We showed that at the $j$-th iteration, $\forall j \ge 1$, (i): the LMPC is feasible at time $t = 0$ and (ii): if the LMPC is feasible at time $t$, then the LMPC is feasible at time $t+1$. Thus, we conclude by induction that the LMPC in (18) and (20) is feasible $\forall j \ge 1$ and $\forall t \ge 0$.
Next we use the fact that Problem (18) is time-invariant at each iteration $j$ and we replace $J^{\mathrm{LMPC},j}_{t\to t+N}(\cdot)$ with $J^{\mathrm{LMPC},j}_{0\to N}(\cdot)$. In order to show the asymptotic stability of $x_F$ we have to show that the optimal cost, $J^{\mathrm{LMPC},j}_{0\to N}(\cdot)$, is a Lyapunov function for the equilibrium point $x_F$ (7) of the closed loop system (1) and (20) Borrelli (2003). Continuity of $J^{\mathrm{LMPC},j}_{0\to N}(\cdot)$ can be shown as in Mayne et al. (2000). Moreover from (5a), $J^{\mathrm{LMPC},j}_{0\to N}(x) > 0\ \forall x \in \mathbb{R}^n \setminus \{x_F\}$ and $J^{\mathrm{LMPC},j}_{0\to N}(x_F) = 0$. Thus, we need to show that $J^{\mathrm{LMPC},j}_{0\to N}(\cdot)$ is decreasing along the closed loop trajectory.
From (27) we have , which implies that
[TABLE]
Given the optimal input sequence and the related optimal trajectory in (19) and the definition of $P^{j-1}(\cdot)$ in (16), the optimal cost is given by
[TABLE]
We can further simplify the above expression using (15c), (24)-(26) and the fact that $h(\cdot,\cdot)$ is jointly convex in its arguments,
[TABLE]
Note that, in the above derivation, we used the fact that and is a feasible solution to problem (15) and therefore is an upper bound for . Finally, from equations (20), (23) and (29)-(31) we conclude that the optimal cost is a decreasing Lyapunov function along the closed loop trajectory,
[TABLE]
Equation (32), the positive definiteness of $h(\cdot,\cdot)$ and the continuity of $J^{\mathrm{LMPC},j}_{0\to N}(\cdot)$ imply that $x_F$ is asymptotically stable.
3.3 Convergence properties
In this Section we assume that the LMPC (18) and (20) converges to a steady state trajectory. We show two results. First, the $j$-th iteration cost $J^j_{0\to\infty}(\cdot)$ does not worsen as $j$ increases. Second, the steady state trajectory is the solution to the infinite horizon control problem (5).
Theorem 2
Consider system (1) in closed loop with the LMPC controller (18) and (20). Let $\mathcal{CS}^j$ be the convex safe set at the $j$-th iteration as defined in (10). Let Assumption 1 hold, then the iteration cost $J^j_{0\to\infty}(\cdot)$ does not increase with the iteration index $j$.
Proof: Follows from Theorem 2 in Rosolia and Borrelli (2017).
Theorem 3
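Consider system (1) in closed loop with the LMPC controller (18) and (20) with $N > 1$. Let $\mathcal{CS}^j$ be the convex safe set at the $j$-th iteration as defined in (10). Let Assumption 1 hold and assume that the closed loop system (1) and (20) converges to a steady state trajectory $\mathbf{x}^\infty$, for iteration $j \to \infty$. Then, the steady state input $\mathbf{u}^\infty = \lim_{j\to\infty} \mathbf{u}^j$ and the related steady state trajectory $\mathbf{x}^\infty = \lim_{j\to\infty} \mathbf{x}^j$ are a global optimal solution for the infinite horizon optimal control problem (5), i.e., $\mathbf{x}^\infty = \mathbf{x}^*$ and $\mathbf{u}^\infty = \mathbf{u}^*$.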
Theorem 4
Consider system (1) in closed loop with the LMPC controller (18) and (20) with . Let $\mathcal{CS}^j$ be the convex safe set at the $j$-th iteration as defined in (10). Let Assumption 1 hold and assume that the closed loop system (1) and (20) converges to a steady state trajectory $\mathbf{x}^\infty$, for iteration $j \to \infty$. Denote as the interior of the set , and recall the definition of one-step predecessor and successor sets from (Rosolia and Borrelli, 2017, Section II). If and for all , then the steady state input $\mathbf{u}^\infty$ and the related steady state trajectory $\mathbf{x}^\infty$ are a global optimal solution for the infinite horizon optimal control problem (5), i.e., $\mathbf{x}^\infty = \mathbf{x}^*$ and $\mathbf{u}^\infty = \mathbf{u}^*$.
Proof: Follows from the convexity of (18), (20) and of Problem (5) and Theorem 3 in Rosolia and Borrelli (2017).
4 Example: Constrained LQR controller
In this section, we test the proposed LMPC for linear systems on the following infinite horizon linear quadratic regulator with constraints (CLQR)
[TABLE]
In Rosolia and Borrelli (2017) we showed that the LMPC converges to the solution of the infinite horizon control problem (33), whenever we can compute a feasible trajectory $\mathbf{x}^0$. However, the LMPC in Rosolia and Borrelli (2017) is implemented using the sampled Safe Set (8) as a terminal constraint, instead of the proposed convex Safe Set (10). Therefore, also for linear systems, the LMPC presented in Rosolia and Borrelli (2017) involves the solution of a Mixed Integer Program (MIP), which is computationally expensive. In the following, we show that the proposed convex formulation of the LMPC for linear systems reduces the computational burden by several orders of magnitude and that it converges to the solution of the infinite horizon control problem (33).
The LMPC (18), (20) is implemented with the quadratic running cost $h(\cdot,\cdot)$, a horizon of $N$ steps, and the state and input constraints (33d)-(33e). The LMPC (18) and (20) is reformulated as a Quadratic Program and it is implemented in YALMIP (Lofberg (2004)) using the solver quadprog. In order to implement the terminal cost (17) we defined the time at which the iteration is completed,
[TABLE]
with .
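One way to detect when an iteration is completed is to look for the first time the state is numerically at the equilibrium. The sketch below assumes a threshold on the squared 2-norm of the state with the equilibrium at the origin; both the rule and the tolerance `tol` are assumptions for illustration, since the exact definition is not legible in this copy.

```python
def completion_time(states, tol=1e-8):
    """Index of the first state with squared 2-norm <= tol, or None."""
    for t, x in enumerate(states):
        if sum(v * v for v in x) <= tol:
            return t
    return None


states = [[-1.0, 0.5], [-0.2, 0.1], [1e-5, -2e-5], [0.0, 0.0]]
t_j = completion_time(states)
print(t_j)
```

States recorded after this index add essentially nothing to the cost-to-go, so the stored data for the terminal cost can be truncated there.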
For and , the LMPC converges to a steady state solution $\mathbf{x}^\infty$, after iterations, with an error of :
[TABLE]
Table 1 shows the evolution of the iteration cost. We notice that, in accordance with Theorem 2, the cost is non-increasing over the iterations.
Furthermore, the solution of the LMPC for linear systems is compared with the exact solution of the CLQR (33), which is computed using the algorithm in Borrelli (2003). Given the optimal solution to the infinite horizon optimal control problem (33),
[TABLE]
we define the approximation error as
[TABLE]
This error quantifies, at each time step $t$, the distance between the optimal trajectory of the CLQR (33) and the steady state trajectory $\mathbf{x}^\infty$ of the LMPC (18) and (20). The maximum approximation error is
[TABLE]
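The comparison between the two trajectories can be sketched as a per-step distance followed by a maximum. The snippet assumes the distance is the Euclidean norm of the state difference and that both trajectories are sampled at the same time steps; the function name and the numbers are made up for illustration and are not results from the paper.

```python
import math


def approximation_errors(optimal_states, steady_states):
    """Euclidean distance between matching states of two trajectories."""
    return [math.sqrt(sum((a - b) ** 2 for a, b in zip(x_opt, x_ss)))
            for x_opt, x_ss in zip(optimal_states, steady_states)]


optimal_states = [[-4.0, 0.0], [-3.0, 1.0], [0.0, 0.0]]
steady_states = [[-4.0, 0.0], [-3.003, 1.004], [0.0, 0.001]]

errors = approximation_errors(optimal_states, steady_states)
max_error = max(errors)
print(errors, max_error)
```

Note that `zip` stops at the shorter trajectory, so trajectories of different lengths should be padded or truncated deliberately before comparing them.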
Moreover, the -norm of the normalized difference between the exact optimal cost and the cost associated with the steady state trajectory is
[TABLE]
The LMPC for linear systems (18) and (20) has converged to the globally optimal solution.
We tested the LMPC (18), (20) with different initial conditions and horizon lengths to experimentally validate Theorems 1-3. Table 2 shows the maximum approximation error, , and . We underline that for all the tested scenarios, regardless of the horizon length, the proposed LMPC converged to the global optimal solution of the infinite horizon control problem. It is interesting to notice that the LMPC (18), (20) with a longer horizon has more freedom to explore the state space and therefore it converges faster to the steady state trajectory.
Finally, we compare the computational burden associated with the LMPC (18), (20) and with the LMPC in Rosolia and Borrelli (2017). The proposed LMPC (18), (20) applied to Problem (33) converged in to a steady state trajectory. On the other hand, the LMPC in Rosolia and Borrelli (2017) applied to Problem (33) took to reach convergence. Therefore, we conclude that the proposed approach significantly reduces the computational burden of the control logic while preserving the properties of the LMPC.
5 Conclusions
In this paper, an extension to the learning Model Predictive Control (LMPC) is presented. The controller is designed for linear systems and it significantly reduces the computational burden associated with the LMPC. A convex safe set and a terminal cost, learnt from previous iterations, guarantee the recursive feasibility and stability of the closed loop system. Furthermore, the LMPC is guaranteed not to worsen the performance of the closed-loop system over the iterations. We tested the proposed control logic on an infinite horizon linear quadratic regulator with constraints (CLQR) to show that the proposed control logic converges to the optimal solution of the infinite horizon optimal control problem. Finally, we compared the computation time of the proposed strategy with the computation time of the LMPC for nonlinear systems, and we showed that the proposed control logic reduces the computational burden by several orders of magnitude.
References
- Borrelli, F. (2003). Constrained optimal control of linear and hybrid systems, volume 290. Springer.
- Bristow, D.A., Tharayil, M., and Alleyne, A.G. (2006). A survey of iterative learning control. IEEE Control Systems, 26(3), 96–114.
- Garcia, C.E., Prett, D.M., and Morari, M. (1989). Model predictive control: theory and practice-a survey. Automatica, 25(3), 335–348.
- Jones, C.N. and Morari, M. (2010). Polytopic approximation of explicit model predictive controllers. IEEE Transactions on Automatic Control, 55(11), 2542–2553.
- Lee, J.H. and Lee, K.S. (2007). Iterative learning control applied to batch processes: An overview. Control Engineering Practice, 15(10), 1306–1318.
- Lee, J.H., Lee, K.S., and Kim, W.C. (2000). Model-based iterative learning control with a quadratic criterion for time-varying linear systems. Automatica, 36(5), 641–657.
- Lee, K.S. and Lee, J.H. (2000). Convergence of constrained model-based predictive control for batch processes. IEEE Transactions on Automatic Control, 45(10), 1928–1932.
- Lofberg, J. (2004). YALMIP: A toolbox for modeling and optimization in MATLAB. In Computer Aided Control Systems Design, 2004 IEEE International Symposium on, 284–289. IEEE.
