Performance-oriented model learning for data-driven MPC design

Dario Piga; Marco Forgione; Simone Formentin; Alberto Bemporad

arXiv:1904.10839·math.OC·May 6, 2019·IEEE Control. Syst. Lett.

TL;DR

This paper introduces a novel data-driven approach to optimize model learning for MPC, focusing on enhancing closed-loop performance by selecting the best prediction model through Bayesian optimization.

Contribution

It applies the 'identification for control' concept to hierarchical MPC using Bayesian optimization, a first in this context, to improve control performance.

Findings

01 Enhanced closed-loop performance with data-driven model selection

02 Successful application of Bayesian optimization in hierarchical MPC

03 Improved robustness without conservative assumptions

Abstract

Model Predictive Control (MPC) is an enabling technology in applications requiring controlling physical processes in an optimized way under constraints on inputs and outputs. However, in MPC closed-loop performance is pushed to the limits only if the plant under control is accurately modeled; otherwise, robust architectures need to be employed, at the price of reduced performance due to worst-case conservative assumptions. In this paper, instead of adapting the controller to handle uncertainty, we adapt the learning procedure so that the prediction model is selected to provide the best closed-loop performance. More specifically, we apply for the first time the above "identification for control" rationale to hierarchical MPC using data-driven methods and Bayesian optimization.

Equations

$$u_{\min}\leq u(t)\leq u_{\max},\quad \Delta u_{\min}\leq u(t)-u(t-1)\leq \Delta u_{\max},\quad y_{\min}\leq y(t)\leq y_{\max}$$

$$\min_{\mathcal{C}\in\mathbf{C}}\ J(y_{1:T},u_{1:T})$$

$$h_{1}(t)=u(t)-u_{\min},\quad h_{2}(t)=u_{\max}-u(t),$$

$$h_{3}(t)=u(t)-u(t-1)-\Delta u_{\min},\quad h_{4}(t)=\Delta u_{\max}-u(t)+u(t-1),$$

$$h_{5}(t)=y(t)-y_{\min},\quad h_{6}(t)=y_{\max}-y(t),$$

$$\min_{\mathcal{C}\in\mathbf{C}}\ \tilde{J}(y_{1:T},u_{1:T})$$

$$\tilde{J}(y_{1:T},u_{1:T})=J(y_{1:T},u_{1:T})+\sum_{t=1}^{T}\sum_{i=1}^{6}b_{t}(h_{i}(t))$$

$$K(z,\theta)=\theta_{P}+\theta_{I}T_{s}\frac{1}{z-1}+\theta_{D}\frac{N_{d}}{1+N_{d}T_{s}\frac{1}{z-1}},$$

$$\left\{\begin{array}{rcl}\xi(t+1)&=&A_{M}\xi(t)+B_{M}g(t)\\ \left[\begin{array}{c}y(t)\\ u(t)\end{array}\right]&=&C_{M}\xi(t)+D_{M}g(t),\end{array}\right.$$

$$\begin{aligned}\min_{\{g(t+k|t)\}_{k=1}^{N_{u}},\,\epsilon}\ & Q_{y}\sum_{k=1}^{N_{p}}\left(y(t+k|t)-r(t+k)\right)^{2}\\ &+Q_{u}\sum_{k=1}^{N_{p}}\left(u(t+k|t)-u_{\rm ref}(t+k)\right)^{2}\\ &+Q_{\Delta u}\sum_{k=1}^{N_{p}}\left(u(t+k|t)-u(t+k-1|t)\right)^{2}+Q_{\epsilon}\epsilon^{2}\\ \mathrm{s.t.}\ & \left[\begin{array}{c}y(t+k|t)\\ u(t+k|t)\end{array}\right]=M\left(\mu,g(t+k|t)\right),\ \ k=1,\ldots,N_{p}\\ & y_{\min}-V_{y}\epsilon\leq y(t+k|t)\leq y_{\max}+V_{y}\epsilon,\ \ k=1,\ldots,N_{p}\\ & u_{\min}-V_{u}\epsilon\leq u(t+k|t)\leq u_{\max}+V_{u}\epsilon,\ \ k=1,\ldots,N_{p}\\ & \Delta u_{\min}-V_{\Delta u}\epsilon\leq\Delta u(t+k|t),\ \ k=1,\ldots,N_{p}\\ & \Delta u(t+k|t)\leq\Delta u_{\max}+V_{\Delta u}\epsilon,\ \ k=1,\ldots,N_{p}\\ & g(t+N_{u}+j|t)=g(t+N_{u}|t),\ \ j=1,\ldots,N_{p}-N_{u}\end{aligned}$$

$$\min_{\theta,\nu}\ \tilde{J}(y_{1:T},u_{1:T};\theta,\nu)$$

$$\mathcal{D}\leftarrow\{(\theta_{1:N_{\text{in}}},\nu_{1:N_{\text{in}}}),\tilde{J}_{1:N_{\text{in}}}\};$$

$$\theta_{i+1},\nu_{i+1}\leftarrow\arg\max_{\theta,\nu}\ \alpha(\theta,\nu|\mathcal{D});$$

$$i^{\star}=\arg\min_{i}\ \tilde{J}_{i};$$

$$m_{i}(\theta^{\star},\nu^{\star})=\mathbb{k}_{i}'\left(\mathbb{K}_{i}+\sigma_{e}^{2}I\right)^{-1}\tilde{J}_{1:i}$$

$$\sigma_{i}^{2}(\theta^{\star},\nu^{\star})=\kappa((\theta^{\star},\nu^{\star}),(\theta^{\star},\nu^{\star}))-\mathbb{k}_{i}'\left(\mathbb{K}_{i}+\sigma_{e}^{2}I\right)^{-1}\mathbb{k}_{i}$$

$$\kappa((\theta,\nu),(\tilde{\theta},\tilde{\nu}))=\sigma_{0}^{2}\exp\left(-\frac{1}{2\lambda^{2}}\left[\theta'-\tilde{\theta}'\ \ \nu'-\tilde{\nu}'\right]\left[\theta'-\tilde{\theta}'\ \ \nu'-\tilde{\nu}'\right]'\right).$$

$$\log p(\tilde{J}_{1:i}|\theta_{1:i},\nu_{1:i},\sigma_{0},\lambda,\sigma_{e})\propto-\tilde{J}_{1:i}'\left(\mathbb{K}_{i}+\sigma_{e}^{2}I\right)^{-1}\tilde{J}_{1:i}-\log\det\left(\mathbb{K}_{i}+\sigma_{e}^{2}I\right)$$

$$\alpha(\theta,\nu|\mathcal{D})=\mathrm{EI}(\theta,\nu)=\mathbb{E}\left[\max\{0,\tilde{J}^{-}-\tilde{J}(\theta,\nu)\}\right],$$

$$\tilde{J}^{-}=\min_{j=1,\ldots,i}\ \tilde{J}(y_{1:T},u_{1:T};\theta_{j},\nu_{j}).$$

$$\mathrm{EI}(\theta,\nu)=\left(\tilde{J}^{-}-m_{i}(\theta,\nu)\right)\Psi(Z)+\sigma_{i}(\theta,\nu)\psi(Z)$$

$$Z=\frac{\tilde{J}^{-}-m_{i}(\theta,\nu)}{\sigma_{i}(\theta,\nu)},$$

$$\left[\begin{array}{c}u\\ y\end{array}\right]=\underbrace{\left[\begin{array}{c}K(\theta)(I-M_{y}(\mu_{y}))\\ M_{y}(\mu_{y})\end{array}\right]}_{M(\mu_{y},\theta)}g.$$

$$(M+m)\ddot{p}+mL\ddot{\phi}\cos\phi-mL\dot{\phi}^{2}\sin\phi+b\dot{p}=F$$

$$L\ddot{\phi}+\ddot{p}\cos\phi-g\sin\phi+f_{\phi}\dot{\phi}=0$$

$$u=\left[0\ \ \mathcal{K}_{PI}(z,\theta)\right](g-y),$$

Full text

Performance-oriented model learning for data-driven MPC design

Dario Piga, Marco Forgione, Simone Formentin, Alberto Bemporad

This work was partially supported by the H2020-723248 project DAEDALUS - Distributed control and simulation platform to support an ecosystem of digital automation developers and by the Lombardia region and the Cariplo foundation, under the project Learning to Control (L2C), no. 2017-1520. D. Piga and M. Forgione are with IDSIA Dalle Molle Institute for Artificial Intelligence SUPSI-USI, Manno, Switzerland. S. Formentin is with the Dipartimento di Elettronica e Informazione, Politecnico di Milano, Milano, Italy. A. Bemporad is with IMT School for Advanced Studies Lucca, Lucca, Italy.

Abstract

Model Predictive Control (MPC) is an enabling technology in applications requiring controlling physical processes in an optimized way under constraints on inputs and outputs. However, in MPC closed-loop performance is pushed to the limits only if the plant under control is accurately modeled; otherwise, robust architectures need to be employed, at the price of reduced performance due to worst-case conservative assumptions. In this paper, instead of adapting the controller to handle uncertainty, we adapt the learning procedure so that the prediction model is selected to provide the best closed-loop performance. More specifically, we apply for the first time the above “identification for control” rationale to hierarchical MPC using data-driven methods and Bayesian optimization.

Index Terms:

Predictive control for nonlinear systems, Identification for control, Machine learning.

I Introduction

Nowadays, Model Predictive Control (MPC) has become the most popular advanced control technology for several complex engineering applications [1]. Apart from computational aspects, it is widely recognized that one key practical challenge in MPC arises when dealing with uncertainty, especially when the prediction model is identified using open-loop data taken from a specific operation of the plant [2].

In case of partially known systems, traditional MPC approaches exhibit some degree of robustness, so that marginal robust performance can be guaranteed. When intrinsic robustness of deterministic MPC is not enough, robust MPC [3] and stochastic MPC [2] approaches have been developed to take into account uncertainties. However, regardless of the specific technique, increasing robustness of the MPC controller usually leads to conservative performance [4].

While there is usually a separation between model identification and control design, an alternative approach to managing uncertainty in designing control systems is to revisit the identification process as a procedure to be designed by bearing the final control application in mind. Such a rationale is known as Identification for Control (I4C) and has been widely studied for fixed-order (oftentimes, PID) control of Linear Time-Invariant (LTI) systems [5]. According to I4C, the best model for control may not be the one providing the least output prediction errors, but the one providing the best performance on the true system when in closed loop with the associated model-based controller.

As far as we are aware, the I4C modeling approach has never been applied to MPC. Learning techniques have instead been shown to be useful for iterative MPC tasks in [6] and in reinforcement learning applied to MPC [7]. Furthermore, data-driven approaches have been proposed for direct MPC optimization using open- and closed-loop data, see, e.g., [8, 9]. Although the above approaches are powerful tools for control design in the case of unknown systems, they fail to provide a mathematical (albeit control-oriented) description of the plant. Such a description is often useful for physical interpretation, performance monitoring, and diagnosis [10].

In this work, we propose an Identification for (Model-Predictive) Control approach aimed at finding the best prediction model for MPC from experimental data, by considering the control objective directly in the model learning phase. We propose a hierarchical architecture, typically employed in several industrial applications, in which the inner controller is a parametric filter (e.g., a PID controller) aimed at stabilizing the system at a fast rate, whereas the outer loop plays the role of a reference governor (RG) [11, 12] with a twofold goal: ($i$) boosting the performance of the inner loop and ($ii$) handling the signal constraints due, e.g., to actuator bounds or system limitations. Within this framework, the RG is typically an MPC law based on a model of the inner loop. According to the I4C philosophy, we propose a change of perspective and treat such a model as a design parameter instead. This parameter is iteratively optimized, together with the inner controller, using closed-loop data collected on the plant and Bayesian optimization. Finally, we show that, using the same rationale and tools, the prediction horizon, a critical parameter to tune in MPC, can also be optimized from data.

For the sake of completeness, the first use of Bayesian optimization in control-oriented identification was proposed in [13], based on a simpler control scheme. The same hierarchical architecture was instead addressed in [9] to design the controller from data, but without providing an MPC-oriented model of the plant.

The remainder of the paper is organized as follows. In Section II the control problem of interest is formally stated. The hierarchical architecture is introduced in Section III, where the parameterization of each block is also described (and motivated) in detail. The proposed strategy is described in Section IV, where a discussion on how to practically restrict the parameter space is also provided. Section V illustrates the performance of the method on a benchmark example.

II Problem formulation

Consider a multi-input multi-output (MIMO) plant $\mathcal{S}$ , with input $u\in\mathbb{R}^{n_{u}}$ and output $y\in\mathbb{R}^{n_{y}}$ signals sampled at a regular time interval $T_{s}$ . We aim at synthesizing a controller $\mathcal{C}$ for $\mathcal{S}$ such that the controlled closed-loop system achieves a desired engineering objective defined in terms of minimization of a cost $J(y_{1:T},u_{1:T})$ , where $y_{1:T}$ (resp. $u_{1:T}$ ) denotes the sequence of output (resp. input) signals measured at time steps $t=1,\ldots,T$ , and $T$ is the length (measured in number of samples) of the experiment where the closed-loop performance is measured. Besides minimizing the cost $J(y_{1:T},u_{1:T})$ , the following constraints on inputs and outputs should be satisfied:

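$$\begin{aligned}u_{\min}&\leq u(t)\leq u_{\max},&&\text{(1a)}\\ \Delta u_{\min}&\leq u(t)-u(t-1)\leq\Delta u_{\max},&&\text{(1b)}\\ y_{\min}&\leq y(t)\leq y_{\max}.&&\text{(1c)}\end{aligned}$$

Constraints (1) are generally imposed by actuator limitations or might reflect safety conditions. The control design problem is formulated as the following optimization problem:

$$\min_{\mathcal{C}\in\mathbf{C}}\ J(y_{1:T},u_{1:T})\quad\text{s.t. (1)},\qquad\text{(2)}$$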

with $\mathcal{\mathbf{C}}$ denoting the set of controller candidates.

We rewrite constraints (1) as $h_{i}(t)\geq 0$ , $i=1,\ldots,6$ , with

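$$\begin{aligned}h_{1}(t)&=u(t)-u_{\min},&h_{2}(t)&=u_{\max}-u(t),\\ h_{3}(t)&=u(t)-u(t-1)-\Delta u_{\min},&h_{4}(t)&=\Delta u_{\max}-u(t)+u(t-1),\\ h_{5}(t)&=y(t)-y_{\min},&h_{6}(t)&=y_{\max}-y(t),\end{aligned}\qquad\text{(3)}$$

and treat them with penalty functions

$$\min_{\mathcal{C}\in\mathbf{C}}\ \tilde{J}(y_{1:T},u_{1:T}),\qquad\text{(4)}$$

where

$$\tilde{J}(y_{1:T},u_{1:T})=J(y_{1:T},u_{1:T})+\sum_{t=1}^{T}\sum_{i=1}^{6}b_{t}(h_{i}(t))\qquad\text{(5)}$$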

and $b_{t}:\mathbb{R}\to\mathbb{R}$ are (possibly time-varying) barrier functions.
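As an illustration of (3)-(5), the penalized cost could be evaluated from logged closed-loop data roughly as follows. This is a minimal sketch for scalar $u$ and $y$: the function names, the `lim` dictionary, and in particular the quadratic hinge used as barrier are assumptions made for the example, since the text leaves $b_{t}$ and $J$ generic.

```python
def constraint_margins(u, y, u_prev, lim):
    """Return h_1..h_6 at one time step; all are >= 0 when (1) holds."""
    du = u - u_prev
    return [
        u - lim["u_min"], lim["u_max"] - u,
        du - lim["du_min"], lim["du_max"] - du,
        y - lim["y_min"], lim["y_max"] - y,
    ]


def barrier(h, weight=1e3):
    """Assumed barrier: zero if h >= 0, quadratic in the violation otherwise."""
    return weight * min(h, 0.0) ** 2


def penalized_cost(J, u_seq, y_seq, lim, u0=0.0):
    """J-tilde = J + sum over t and i of b(h_i(t)); u0 is the input before t = 1."""
    total, u_prev = J, u0
    for u, y in zip(u_seq, y_seq):
        total += sum(barrier(h) for h in constraint_margins(u, y, u_prev, lim))
        u_prev = u
    return total
```

With this choice a feasible trajectory leaves $J$ unchanged, while each violated margin adds a term that grows with the size of the violation.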

Assuming zero initial conditions, clearly $y_{1:T}$ , $u_{1:T}$ in (5) are only functions of the controller $\mathcal{C}$ and of the process model $\mathcal{S}$ . Rather than first fixing a model for $\mathcal{S}$ (either from first-principle physical laws or using system identification techniques), we follow a performance-driven control design paradigm and leave the model of $\mathcal{S}$ as a degree of freedom, used to minimize the closed-loop cost $\tilde{J}(y_{1:T},u_{1:T})$ .

III Control architecture

We adopt the hierarchical, multi-rate, reference-governor control architecture in Fig. 1, consisting of:

• an inner low-level controller $\mathcal{K}$, which operates at sampling time $T_{s}$ and is mainly used to handle the fast dynamics of the system. This controller introduces a degree of freedom in the control design and, in the case of an unstable plant $\mathcal{S}$, it might also stabilize the inner closed-loop system $\mathcal{M}$. Nevertheless, the latter is not a required condition in our design approach.

• an outer MPC to enhance the performance of the inner loop $\mathcal{M}$ and to enforce constraints (1c). The MPC operates at a sampling time $T_{\text{MPC}}$ that is an integer multiple of $T_{s}$, i.e., $T_{\text{MPC}}=NT_{s}$ with $N\in\mathbb{N}$. Setting $T_{\text{MPC}}$ larger than $T_{s}$ (thus, $N>1$) may be needed to solve the constrained optimization problem online, i.e., within the MPC sampling time $T_{\text{MPC}}$.

In standard RG approaches, the outer MPC requires a prediction model of the inner loop $\mathcal{M}$ . In accordance with the performance-driven approach proposed in this paper, we treat such a model as a design parameter and look for the model providing the best closed-loop performance according to the performance index $\tilde{J}(y_{1:T},u_{1:T})$ . In particular, as detailed in the following, a model of the plant $\mathcal{S}$ will be used neither to design the controllers nor to evaluate the performance index $\tilde{J}(y_{1:T},u_{1:T})$ , which will be instead measured directly from closed-loop experiments performed on the actual plant.

III-A Inner controller parameterization

The inner controller $\mathcal{K}$ is parameterized by a vector $\theta\in\mathbb{R}^{n_{\theta}}$ . For instance, $\mathcal{K}$ can be a simple discrete-time proportional-integral-derivative (PID) controller, with sampling time $T_{s}$ and discrete-time transfer function

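$$K(z,\theta)=\theta_{P}+\theta_{I}T_{s}\frac{1}{z-1}+\theta_{D}\frac{N_{d}}{1+N_{d}T_{s}\frac{1}{z-1}},\qquad\text{(6)}$$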

where $\theta=\left[\theta_{P}\ \ \theta_{I}\ \ \theta_{D}\right]^{\prime}$ is the design parameter vector and $N_{d}\gg 1$ limits the high-frequency gain of the PID controller. Although $N_{d}$ may be treated as a design parameter, its tuning is generally not critical and thus not included in $\theta$ .
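The transfer function (6) can be realized with two states, one for the integrator $T_{s}/(z-1)$ and one for the filtered derivative. The sketch below is one possible realization, not necessarily the one used by the authors; the class name `DiscretePID` and its interface are hypothetical. It uses the identity $\frac{N_{d}}{1+N_{d}T_{s}/(z-1)}=N_{d}-\frac{N_{d}^{2}T_{s}}{z-a}$ with $a=1-N_{d}T_{s}$.

```python
class DiscretePID:
    """Difference-equation realization of K(z, theta) in (6) for a scalar error."""

    def __init__(self, theta_p, theta_i, theta_d, Ts, Nd):
        self.tp, self.ti, self.td = theta_p, theta_i, theta_d
        self.Ts, self.Nd = Ts, Nd
        self.x_i = 0.0  # integrator state, i.e. the output of Ts/(z-1)
        self.x_d = 0.0  # state of the first-order filter 1/(z-a)

    def step(self, e):
        """Return the control action for the current error e and update the states."""
        a = 1.0 - self.Nd * self.Ts  # filter pole; |a| < 1 needs 0 < Nd*Ts < 2
        d_out = self.Nd * e - self.Nd ** 2 * self.Ts * self.x_d
        out = self.tp * e + self.ti * self.x_i + self.td * d_out
        # States are updated after the output has been computed.
        self.x_i += self.Ts * e
        self.x_d = a * self.x_d + e
        return out
```

For a constant error the derivative branch decays geometrically with ratio $a$, which reflects the role of $N_{d}$ in limiting the high-frequency gain.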

III-B Outer MPC parameterization

The most important component of the outer MPC is the model used to predict the output $y$ and input $u$ as a function of the MPC command $g$ . Let $M$ be the dynamical model from $g$ to $\left[\begin{smallmatrix}y\\ u\end{smallmatrix}\right]$ , described in the state-space representation

$$\left\{\begin{array}{rcl}\xi(t+1)&=&A_{M}\xi(t)+B_{M}g(t)\\ \left[\begin{array}{c}y(t)\\ u(t)\end{array}\right]&=&C_{M}\xi(t)+D_{M}g(t),\end{array}\right.\qquad\text{(7)}$$

where $\xi\in\mathbb{R}^{n_{\xi}}$ is the state of the closed-loop model. For instance, in the case of a single-input-single-output plant, the $2\times 1$ transfer matrix $M$ can be modelled as a pair of transfer functions with the same poles. Let $\mu\in\mathbb{R}^{n_{\mu}}$ be the vector obtained by stacking the entries of $A_{M},B_{M},C_{M}$ and $D_{M}$.

At each time instant $t$ that is an integer multiple of the MPC sampling time $T_{\text{MPC}}$ (i.e., $t=hT_{\text{MPC}}$ with $h\in\mathbb{N}$), the outer MPC solves the minimization problem

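$$\begin{aligned}\min_{\{g(t+k|t)\}_{k=1}^{N_{u}},\,\epsilon}\ & Q_{y}\sum_{k=1}^{N_{p}}\left(y(t+k|t)-r(t+k)\right)^{2}&&\text{(8a)}\\ &+Q_{u}\sum_{k=1}^{N_{p}}\left(u(t+k|t)-u_{\rm ref}(t+k)\right)^{2}&&\text{(8b)}\\ &+Q_{\Delta u}\sum_{k=1}^{N_{p}}\left(u(t+k|t)-u(t+k-1|t)\right)^{2}+Q_{\epsilon}\epsilon^{2}&&\text{(8c)}\\ \mathrm{s.t.}\ & \left[\begin{array}{c}y(t+k|t)\\ u(t+k|t)\end{array}\right]=M\left(\mu,g(t+k|t)\right),\ \ k=1,\ldots,N_{p}&&\text{(8d)}\\ & y_{\min}-V_{y}\epsilon\leq y(t+k|t)\leq y_{\max}+V_{y}\epsilon,\ \ k=1,\ldots,N_{p}&&\text{(8e)}\\ & u_{\min}-V_{u}\epsilon\leq u(t+k|t)\leq u_{\max}+V_{u}\epsilon,\ \ k=1,\ldots,N_{p}&&\text{(8f)}\\ & \Delta u_{\min}-V_{\Delta u}\epsilon\leq\Delta u(t+k|t),\ \ k=1,\ldots,N_{p}&&\text{(8g)}\\ & \Delta u(t+k|t)\leq\Delta u_{\max}+V_{\Delta u}\epsilon,\ \ k=1,\ldots,N_{p}&&\text{(8h)}\\ & g(t+N_{u}+j|t)=g(t+N_{u}|t),\ \ j=1,\ldots,N_{p}-N_{u}&&\text{(8i)}\end{aligned}$$

where $\Delta u(t+k|t)=u(t+k|t)-u(t+k-1|t)$, $N_{p}$ and $N_{u}$ are the prediction and control horizons, respectively, $Q_{y}$, $Q_{u}$, $Q_{\Delta u}$, $Q_{\epsilon}$ are nonnegative weights, $u_{\rm ref}$ and $r$ are the input and output references, respectively, and $V_{y}$, $V_{u}$, $V_{\Delta u}$ are positive vectors used to soften the constraints on the plant’s input and output, so that (8) always admits a solution. According to standard MPC design, in case $N_{u}<N_{p}$, the constraint (8i) enforces a constant value of $g$ from time $N_{u}$ to $N_{p}$. The reader is referred to [1] for an overview of MPC design.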

We can also treat the prediction horizon $N_{p}$ as a design parameter, and denote by $\nu=\left[\mu^{\prime}\ \ N_{p}\right]^{\prime}$ , $\nu\in\mathbb{R}^{n_{\mu}}\times\mathbb{N}$ the overall vector of tuning parameters. The control horizon $N_{u}$ determines, together with the number of constraints in (8), the computational complexity of the outer MPC controller. Therefore, it is usually fixed by the available online throughput. Alternatively, we can set $N_{u}=N_{p}$ .

The remaining MPC parameters ($N_{u}$, $Q_{y}$, $Q_{u}$, $Q_{\Delta u}$, $Q_{\epsilon}$, $V_{y}$, $V_{u}$ and $V_{\Delta u}$) are treated as a specification of the desired closed-loop performance, and are therefore not optimized. More generally, we could decouple the MPC quadratic cost in (8) from the closed-loop performance index $\tilde{J}(y_{1:T},u_{1:T})$. For instance, $\tilde{J}(y_{1:T},u_{1:T})$ can be a general, possibly non-convex function reflecting engineering or economic goals, while the cost of the MPC (8) is quadratic to facilitate online optimization. Indeed, in case the augmented model $M$ is LTI as in (7), problem (8) reduces to a quadratic programming (QP) problem whose solution can be computed either offline using multiparametric quadratic programming [14] or online using dedicated QP solvers based, e.g., on interior-point algorithms [15], fast gradient projection [16], or active set methods [17].
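The reduction to a QP rests on the fact that, for an LTI model, the predicted outputs are affine in the command sequence. The NumPy sketch below illustrates only that step: it builds the prediction map, applies a move-blocking matrix in the spirit of (8i), and solves the unconstrained tracking part by least squares. The constraints, the slack $\epsilon$ and the input weights of (8) are deliberately omitted, and the indexing convention ($k=0,\ldots,N_{p}-1$ from a known initial state) as well as all function names are assumptions of the example.

```python
import numpy as np


def prediction_matrices(A, B, C, D, Np):
    """Return (Phi, Gamma) such that the stacked outputs are z = Phi @ x0 + Gamma @ g,
    with x_{k+1} = A x_k + B g_k and z_k = C x_k + D g_k for k = 0..Np-1."""
    nx, ng, nz = A.shape[0], B.shape[1], C.shape[0]
    Phi = np.zeros((Np * nz, nx))
    Gamma = np.zeros((Np * nz, Np * ng))
    Ak = np.eye(nx)  # holds A^k at the start of iteration k
    for k in range(Np):
        Phi[k * nz:(k + 1) * nz] = C @ Ak
        Gamma[k * nz:(k + 1) * nz, k * ng:(k + 1) * ng] = D
        for j in range(k):
            # Effect of g_j on z_k is C A^(k-1-j) B.
            Gamma[k * nz:(k + 1) * nz, j * ng:(j + 1) * ng] = (
                C @ np.linalg.matrix_power(A, k - 1 - j) @ B
            )
        Ak = A @ Ak
    return Phi, Gamma


def blocking_matrix(Np, Nu, ng):
    """Map Nu free moves to Np moves, holding the last free move constant."""
    T = np.zeros((Np * ng, Nu * ng))
    for k in range(Np):
        j = min(k, Nu - 1)
        T[k * ng:(k + 1) * ng, j * ng:(j + 1) * ng] = np.eye(ng)
    return T


def unconstrained_mpc(A, B, C, D, x0, ref, Np, Nu):
    """Least-squares tracking of the stacked reference `ref` (length Np * nz)."""
    Phi, Gamma = prediction_matrices(A, B, C, D, Np)
    T = blocking_matrix(Np, Nu, B.shape[1])
    g_free, *_ = np.linalg.lstsq(Gamma @ T, ref - Phi @ x0, rcond=None)
    return g_free
```

Adding the inequality constraints of (8) to this affine map is what turns the least-squares problem into a QP, to be handled by one of the solver families cited above.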

IV Performance-driven parameter tuning

Based on the controller parametrization introduced in the previous section, the closed-loop performance cost $\tilde{J}(y_{1:T},u_{1:T})$ is a function of vectors $\theta$ and $\nu$ parametrizing the inner controller $\mathcal{K}$ and the outer MPC, respectively. Thus, under the hierarchical architecture of Fig. 1, the original control design problem (4) is equivalent to

$$\min_{\theta,\nu}\ \tilde{J}(y_{1:T},u_{1:T};\theta,\nu)\qquad\text{(9)}$$

IV-A Bayesian optimization for parameter selection

The design problem (9) is solved through the Bayesian optimization (BO) strategy [18] outlined in Algorithm 1. The algorithm is initialized by performing $N_{\text{in}}\geq 1$ closed-loop experiments for $N_{\text{in}}$ different (e.g., randomly chosen) values of the controller parameters $\theta_{i}$ and $\nu_{i}$ (with $i=1,\ldots,N_{\text{in}}$). For each pair $(\theta_{i},\nu_{i})$, a closed-loop experiment is performed and the performance index $\tilde{J}_{i}$ is measured. In this way, an initial set $\mathcal{D}=\{(\theta_{1:N_{\text{in}}},\nu_{1:N_{\text{in}}}),\tilde{J}_{1:N_{\text{in}}}\}$ of parameters and corresponding performance $\tilde{J}$ is constructed, with $\theta_{1:N_{\text{in}}}$ denoting the sequence $\theta_{i}$ for $i=1,\ldots,N_{\text{in}}$. In practice, the experiment can be interrupted and a large cost assigned to $\tilde{J}_{i}$ in case of safety constraint violations.

The algorithm is then iterated until a stopping criterion is met (e.g., maximum number of iterations reached). At each iteration $i\geq N_{\text{in}}$ , the following two steps are performed:

• Learning phase. In this step, a Gaussian Process (GP) describing our “best guess” of the cost $\tilde{J}(y_{1:T},u_{1:T};\theta,\nu)$ corresponding to the design parameters $\theta$ and $\nu$ is fitted on the available data $\mathcal{D}$.

Under the prior assumption that the cost $\tilde{J}$ is generated by a GP with zero mean and covariance function $\kappa((\theta,\nu),(\tilde{\theta},\tilde{\nu}))$ , the posterior distribution of $\tilde{J}(y_{1:T},u_{1:T};\theta^{\star},\nu^{\star})$ for generic controller parameters $(\theta^{\star},\nu^{\star})$ can be computed analytically. Specifically, $\tilde{J}(y_{1:T},u_{1:T};\theta^{\star},\nu^{\star})$ is a Gaussian variable with mean

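$$m_{i}(\theta^{\star},\nu^{\star})=\mathbb{k}_{i}'\left(\mathbb{K}_{i}+\sigma_{e}^{2}I\right)^{-1}\tilde{J}_{1:i}$$

and variance

$$\sigma_{i}^{2}(\theta^{\star},\nu^{\star})=\kappa((\theta^{\star},\nu^{\star}),(\theta^{\star},\nu^{\star}))-\mathbb{k}_{i}'\left(\mathbb{K}_{i}+\sigma_{e}^{2}I\right)^{-1}\mathbb{k}_{i},\qquad\text{(10)}$$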

where the $j$ -th element of the vector $\mathbb{k}_{i}\in\mathbb{R}^{i}$ is $\kappa((\theta^{\star}\!,\!\nu^{\star}),(\theta_{j},\!\nu_{j}))$ ; the $[j,h]$ -th entry of the Kernel matrix $\mathbb{K}_{i}\in\mathbb{R}^{i\times i}$ is $\kappa((\theta_{j},\nu_{j}),(\theta_{h},\nu_{h}))$ ; $\sigma_{e}^{2}$ represents the variance of an additive (Gaussian) noise possibly affecting the observations of the cost $\tilde{J}$ ; and $I$ denotes the identity matrix of proper dimension.

The covariance function $\kappa((\theta,\nu),(\tilde{\theta},\tilde{\nu}))$ for the GP can be defined, for instance, in terms of the so-called Squared Exponential (SE) covariance kernel, defined as

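$$\kappa((\theta,\nu),(\tilde{\theta},\tilde{\nu}))=\sigma_{0}^{2}\exp\left(-\frac{1}{2\lambda^{2}}\left[\theta'-\tilde{\theta}'\ \ \nu'-\tilde{\nu}'\right]\left[\theta'-\tilde{\theta}'\ \ \nu'-\tilde{\nu}'\right]'\right).$$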

The hyper-parameters $\sigma_{0}$ and $\lambda$ characterizing the SE kernel, as well as the noise variance $\sigma_{e}^{2}$ , can be chosen by maximizing the log marginal likelihood [19]

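$$\log p(\tilde{J}_{1:i}|\theta_{1:i},\nu_{1:i},\sigma_{0},\lambda,\sigma_{e})\propto-\tilde{J}_{1:i}'\left(\mathbb{K}_{i}+\sigma_{e}^{2}I\right)^{-1}\tilde{J}_{1:i}-\log\det\left(\mathbb{K}_{i}+\sigma_{e}^{2}I\right)$$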

• Optimization phase. In this phase, the next design parameters $\theta_{i+1}$ and $\nu_{i+1}$ to test are chosen by maximizing the so-called acquisition function $\alpha(\theta,\nu|\mathcal{D})$ (Step 2.3 of Algorithm 1). The acquisition function $\alpha(\theta,\nu|\mathcal{D})$ is constructed based on the mean and covariance (eq. (10)) of the GP estimated in the learning step. The acquisition function balances exploration (i.e., learning more about the objective $\tilde{J}$ in regions of the parameter space with high variance) and exploitation (i.e., searching over regions where the posterior mean indicates good expected performance, based on the data collected so far). Different acquisition functions have been proposed in the literature (see [20] and the references therein for an in-depth overview). In the example reported in Section V, we use the Expected Improvement (EI) acquisition function, defined as

$$\alpha(\theta,\nu|\mathcal{D})=\mathrm{EI}(\theta,\nu)=\mathbb{E}\left[\max\{0,\tilde{J}^{-}-\tilde{J}(\theta,\nu)\}\right],\qquad\text{(11)}$$

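where $\tilde{J}^{-}$ represents the best value of the objective function observed up to the $i$-th iteration, i.e.,

$$\tilde{J}^{-}=\min_{j=1,\ldots,i}\ \tilde{J}(y_{1:T},u_{1:T};\theta_{j},\nu_{j}).\qquad\text{(12)}$$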

Under the GP framework previously discussed, the EI in (11) can be evaluated analytically and it is equal to:

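$$\mathrm{EI}(\theta,\nu)=\left(\tilde{J}^{-}-m_{i}(\theta,\nu)\right)\Psi(Z)+\sigma_{i}(\theta,\nu)\psi(Z)\qquad\text{(13)}$$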

if $\sigma_{i}(\theta,\nu)>0$ , [math] otherwise. In (13), $Z$ is defined as

$$Z=\frac{\tilde{J}^{-}-m_{i}(\theta,\nu)}{\sigma_{i}(\theta,\nu)},\qquad\text{(14)}$$

and $\psi$ and $\Psi$ are the probability density function and the cumulative distribution function of the standard normal distribution, respectively.
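The learning and optimization phases described above can be sketched in a few lines of NumPy. This is an illustrative toy, not the authors' implementation (which relies on a MATLAB toolbox): the hyper-parameters are fixed rather than fitted by maximizing the marginal likelihood, the acquisition function is maximized by plain random search inside box bounds, the posterior variance is floored at a tiny positive value so that $Z$ is always defined, and a cheap function `cost` stands in for a closed-loop experiment. All function names are hypothetical.

```python
import math
import numpy as np


def se_kernel(X1, X2, sigma0, lam):
    """Squared-exponential kernel between the rows of X1 and X2."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=2)
    return sigma0 ** 2 * np.exp(-d2 / (2.0 * lam ** 2))


def gp_posterior(X, J, Xs, sigma0=1.0, lam=1.0, sigma_e=1e-2):
    """Posterior mean and standard deviation at the rows of Xs (zero-mean prior)."""
    K = se_kernel(X, X, sigma0, lam) + sigma_e ** 2 * np.eye(len(X))
    ks = se_kernel(Xs, X, sigma0, lam)  # one row per test point
    mean = ks @ np.linalg.solve(K, J)
    var = sigma0 ** 2 - np.einsum("ij,ji->i", ks, np.linalg.solve(K, ks.T))
    return mean, np.sqrt(np.maximum(var, 1e-12))  # floor is an assumption


def expected_improvement(mean, std, J_best):
    """Closed-form EI for minimization, valid for std > 0."""
    Z = (J_best - mean) / std
    cdf = 0.5 * (1.0 + np.vectorize(math.erf, otypes=[float])(Z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * Z ** 2) / math.sqrt(2.0 * math.pi)
    return (J_best - mean) * cdf + std * pdf


def bayes_opt(cost, lower, upper, n_init=5, n_iter=20, n_cand=2000, seed=0):
    """Minimize `cost` over the box [lower, upper] with a GP surrogate and EI."""
    rng = np.random.default_rng(seed)
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    X = rng.uniform(lower, upper, size=(n_init, lower.size))  # initial experiments
    J = np.array([cost(x) for x in X])
    for _ in range(n_iter):
        cand = rng.uniform(lower, upper, size=(n_cand, lower.size))
        mean, std = gp_posterior(X, J, cand)                   # learning phase
        x_next = cand[np.argmax(expected_improvement(mean, std, J.min()))]
        X = np.vstack([X, x_next])
        J = np.append(J, cost(x_next))                         # new experiment
    best = int(np.argmin(J))
    return X[best], J[best]
```

An integer parameter such as $N_{p}$ could be handled, for instance, by rounding one coordinate inside `cost`; that choice is again an assumption of this sketch.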

The advantages of using BO for tackling this design problem are twofold. First, it is a derivative-free optimization algorithm, which is useful since a closed-form expression of the performance $\tilde{J}$ as a function of the design parameters $\theta$ and $\nu$ is not available. Second, it allows us to tune the controller parameters with as few evaluations of $\tilde{J}$ as possible. The latter point is crucial, since each evaluation can be costly and time-consuming, as it requires a closed-loop experiment.

IV-B Restricting the parameter space

Bayesian optimization allows setting bounds on the search space of the parameters $\theta$ and $\nu$ . These bounds can be included in the maximization of the acquisition function at Step 2.3 of Algorithm 1. Restricting the search space generally speeds up the algorithm’s convergence, thus requiring fewer evaluations of the functional $\tilde{J}$ . Suitable bounds may be defined exploiting prior system knowledge and design choices. Some applicable restrictions of the parameter space are discussed next.

It may be reasonable to assume that the optimal solution is achieved using an inner controller $\mathcal{K}$ that stabilizes the inner loop $\mathcal{M}$ . Therefore, one may constrain $\mu$ so that the prediction model $M$ used by MPC is asymptotically stable.
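One simple way to impose such a restriction is to discard candidate parameter vectors whose state matrix is not asymptotically stable before they are tested. The helper below is a hypothetical illustration of that check; rejection of candidates is only one option, and the text does not specify how the constraint is enforced in practice.

```python
import numpy as np


def is_asymptotically_stable(A_M, discrete=True):
    """Check the eigenvalues of A_M: inside the unit circle for a discrete-time
    model, in the open left half-plane for a continuous-time one."""
    eig = np.linalg.eigvals(np.asarray(A_M, dtype=float))
    if discrete:
        return bool(np.all(np.abs(eig) < 1.0))
    return bool(np.all(eig.real < 0.0))
```

The `discrete` flag matters because the model in (7) is written in discrete time, whereas the example of Section V parameterizes a continuous-time model.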

Some basic control design rules may also be used to restrict the search space of $\theta$ defining the inner-loop controller $\mathcal{K}$. For instance, if $\mathcal{K}$ is a PID controller parametrized as in (6), its static gain should have the same sign as the static gain of the (stable) system $\mathcal{S}$.

Another significant reduction of the parameter space may be achieved under the assumption that the prediction sub-model $M_{y}(\mu_{y})$ used by the MPC accurately describes the system dynamics $\mathcal{M}$ from $g$ to $y$ . In this case, one can simply derive the augmented model $M$ providing the relationship from $g$ to the plant input $u$ and output $y$ as

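$$\left[\begin{array}{c}u\\ y\end{array}\right]=\underbrace{\left[\begin{array}{c}K(\theta)(I-M_{y}(\mu_{y}))\\ M_{y}(\mu_{y})\end{array}\right]}_{M(\mu_{y},\theta)}g.\qquad\text{(15)}$$

Note that in this case $\mu=\left[\begin{smallmatrix}\mu_{y}\\ \theta\end{smallmatrix}\right]$, that is, the prediction model and the controller share some parameters.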

Other restrictions may be introduced according to the particular problem at hand and prior knowledge available to the user, e.g., diagonal models assuming decoupled dynamics, grey-box models with known intervals for physical parameters, etc.

V Numerical Example

As a case study, we consider the control problem of the inverted pendulum on a cart depicted in Fig. 2.

V-A System description

The dynamics of the process are governed by the equations

$$\begin{aligned}(M+m)\ddot{p}+mL\ddot{\phi}\cos\phi-mL\dot{\phi}^{2}\sin\phi+b\dot{p}&=F,\\ L\ddot{\phi}+\ddot{p}\cos\phi-g\sin\phi+f_{\phi}\dot{\phi}&=0,\end{aligned}\qquad\text{(16)}$$

where $p$ is the cart position, $\phi$ is the angle of the pendulum with respect to the upright vertical position, and $F$ is an input force acting on the cart. The following values of the physical parameters are used: $M=0.5\,\text{kg}$ (cart mass), $m=0.2\,\text{kg}$ (pendulum mass), $L=0.3\,\text{m}$ (rod length), $g=9.81\,\text{m/s}^{2}$ (gravitational acceleration), $b=0.1\,\text{N/m/s}$, and $f_{\phi}=0.1\,\text{m/s}$ (friction terms). According to the approach proposed in the paper, no knowledge of the physical model of the process is used in designing the controller, and (16) are only used for data generation and performance evaluation.
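For data generation, the two equations in (16) can be solved for the accelerations and integrated numerically. The sketch below assumes that the first equation is balanced by the force $F$ and the second by zero, which is the standard cart-pendulum form and is consistent with the description of $F$ above. The forward-Euler step and the function names are choices made for the example; the integration scheme actually used by the authors is not stated in the text.

```python
import math

# Physical parameters as listed in the text.
PARAMS = dict(M=0.5, m=0.2, L=0.3, g=9.81, b=0.1, f_phi=0.1)


def accelerations(p_dot, phi, phi_dot, F, M, m, L, g, b, f_phi):
    """Solve the 2x2 linear system in (p_ddot, phi_ddot) given by (16)."""
    c, s = math.cos(phi), math.sin(phi)
    r1 = F + m * L * phi_dot ** 2 * s - b * p_dot  # right-hand side, cart equation
    r2 = g * s - f_phi * phi_dot                   # right-hand side, pendulum equation
    det = L * (M + m * s * s)                      # always positive
    p_ddot = (L * r1 - m * L * c * r2) / det
    phi_ddot = ((M + m) * r2 - c * r1) / det
    return p_ddot, phi_ddot


def euler_step(state, F, dt, **par):
    """One forward-Euler step for state = (p, p_dot, phi, phi_dot)."""
    p, p_dot, phi, phi_dot = state
    p_dd, phi_dd = accelerations(p_dot, phi, phi_dot, F, **par)
    return (p + dt * p_dot, p_dot + dt * p_dd,
            phi + dt * phi_dot, phi_dot + dt * phi_dd)
```

With $F=0$ and a small positive angle the angular acceleration is positive, i.e. the pendulum falls away from the upright position, which is why feedback is needed.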

The output signals $p$ and $\phi$ are measured every $T_{s}=5\;\text{ms}$ and measurements are corrupted by an additive zero-mean white Gaussian noise with standard deviation $0.01\,\text{m}$ and $0.01\,\text{rad}$, respectively. The input force $F$ is also perturbed by an additive zero-mean random disturbance with standard deviation $1\,\text{N}$ and bandwidth $10\;\text{rad/s}$.

In performing closed-loop experiments, the system is initialized at $[p(0)\;\dot{p}(0)\;\phi(0)\;\dot{\phi}(0)]=[0\;0\;\frac{\pi}{20}\;0]$ . The objective is to move the pendulum to the vertical position $\phi=0$ , while limiting the cart displacement. The force $F$ is constrained to belong to the interval $I_{F}=[-20\;20]$ N, while the cart position $p$ should stay within the range $I_{p}=[-1\ 1]$ m (representing, e.g., finite length of the track where the cart moves).

V-B Control design

The hierarchical controller in Fig. 1 is designed, with $y=[p\ \ \phi]$ and $u=F$ . The inner-loop controller $\mathcal{K}$ is

$$u=\left[0\ \ \mathcal{K}_{PI}(z,\theta)\right](g-y),\qquad\text{(17)}$$

where $\mathcal{K}_{PI}(z,\theta)$ is a discrete-time transfer function of a PID controller parametrized as in (6), with $\theta=[\theta_{P}\ \theta_{I}\ \theta_{D}]\in\mathbb{R}^{3}$ and $N_{d}=100$ . Note that only the angle $\phi$ is actually fed back in the inner loop, thus the task of the inner controller $\mathcal{K}$ is only to stabilize the dynamics of the angle $\phi$ .

Besides taking care of the control objectives, the outer MPC shall also enforce constraints on the cart position $p$ and on the input force $F$ . The model used by the MPC to predict the dynamics of the inner loop $\mathcal{M}$ from the MPC command $g$ to the plant output $y=[p\ \ \phi]$ (see Fig. 1) is parameterized as the continuous-time state-space model

[TABLE]

where $\xi_{M}\in\mathbb{R}^{2}$ is the state vector and $\mu\in\mathbb{R}^{6}$ contains the entries of $A_{M}\in\mathbb{R}^{2\times 2}$ and the second column of $B_{M}\in\mathbb{R}^{2\times 2}$ . Because of the structure of the inner controller $\mathcal{K}$ in (17), the position $p$ is not fed back to the inner loop. Thus, the first column of $B_{M}$ is set to zero and not included in the design parameter vector $\mu$ . The overall MPC prediction model $M$ is constructed using (15).

The MPC control law is computed solving (8) and applied in a receding-horizon fashion, using a sampled version of (18) with sampling time $T_{\text{MPC}}=10T_{s}=50\;\text{ms}$ , reference $r=[r_{p}\ r_{\phi}]=[0\ \ 0]$ and real-time constraints on $F$ and $p$ based on the admissible intervals $I_{F}$ and $I_{p}$ , respectively.
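A sampled version of a continuous-time model $\dot{\xi}=A\xi+Bg$ at period $T_{\text{MPC}}$ can be obtained, for example, by zero-order-hold discretization. Whether the authors use zero-order hold is not stated, so the method below is an assumption; the matrix exponential is computed with a basic scaling-and-squaring Taylor series to keep the sketch dependent on NumPy only, and both function names are hypothetical.

```python
import numpy as np


def expm_taylor(A, terms=20, squarings=10):
    """Matrix exponential via scaling-and-squaring with a truncated Taylor series."""
    A = np.asarray(A, dtype=float) / (2 ** squarings)
    E = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for k in range(1, terms + 1):
        term = term @ A / k  # A^k / k!
        E = E + term
    for _ in range(squarings):
        E = E @ E
    return E


def zoh_discretize(A, B, T):
    """Return (A_d, B_d) from exp([[A, B], [0, 0]] * T); A and B must be 2-D."""
    A = np.atleast_2d(A).astype(float)
    B = np.atleast_2d(B).astype(float)
    n, m = B.shape
    aug = np.zeros((n + m, n + m))
    aug[:n, :n], aug[:n, n:] = A * T, B * T
    E = expm_taylor(aug)
    return E[:n, :n], E[:n, n:]
```

For the example in this section one would call `zoh_discretize(A_M, B_M, 0.05)`, since $T_{\text{MPC}}=50$ ms; the output equation of the model is unchanged by this discretization.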

Regarding the MPC design parameters, the weight matrices are not optimized and are set to $Q_{y}=\text{diag}(0.1,0.1)$, $Q_{u}=0$, $Q_{\Delta u}=0.1$ and $Q_{\epsilon}=10^{5}$. The prediction horizon $N_{p}$ is considered as a free parameter to be adjusted in the Bayesian optimization, while $N_{u}$ is set equal to $N_{p}$. The real-valued design parameters $\theta$ and $\mu$ are constrained to belong to the interval $[-500\ \ 500]$, while the prediction horizon $N_{p}$ can take integer values between $10$ and $20$. The MPC control law is computed using the MATLAB Model Predictive Control Toolbox. All the computations are carried out on an Intel Core i5 processor at 2.60 GHz with 32 GB of RAM running MATLAB R2018a. The maximum computational time required to evaluate the MPC law over all the performed closed-loop experiments is $21$ ms, thus lower than the sampling time $T_{\text{MPC}}=50$ ms.

Overall, there are $10$ parameters to be designed, namely $\theta\in\mathbb{R}^{3}$ , $\mu\in\mathbb{R}^{6}$ , and $N_{p}\in\mathbb{N}$ . The closed-loop performance cost $\tilde{J}$ to be minimized is defined as

[TABLE]

where

[TABLE]

is the barrier function taking into account violations of the physical constraints on the cart position $p$. The cost $\tilde{J}$ is evaluated over closed-loop experiments of length $10$ s on the discrete-time samples collected at rate $T_{s}$. This objective function reflects the engineering objective of controlling the angle $\phi$ to 0, limiting the horizontal displacement and keeping the cart position in the admissible range $I_{p}$. The constraint on the force $F$ is enforced by a saturation block at the system input, and thus it is not penalized in $\tilde{J}$.

The design problem (9) is solved using the MATLAB Statistics and Machine Learning Toolbox, setting the EI in (11) as acquisition function. $N_{\text{in}}=10$ random values of the design parameters $\theta$ , $\mu$ and $N_{p}$ are generated to initialize Algorithm 1, which is then executed for $310$ iterations. The complete test code of this paper is available for download at http://www.marcoforgione.it/data/code/CSL2019_perf.zip.

V-C Simulation results

The performance cost $\tilde{J}$ vs the iteration index $i$ of Algorithm 1 is shown in Fig. 3. For each iteration $i$ , the performance of the current test point (black asterisk) and of the current best point up to iteration $i$ (red line) are shown. From Fig. 3, it can be noticed that the optimal controller parameters are found at iteration $123$ (green square). Furthermore, as the iteration index $i$ increases, more and more test points are concentrated in an area of low cost $\tilde{J}$ .

A closed-loop experiment is repeated over a longer period of $20$ s using the designed controller. The time trajectories of the cart’s position $p$ and the pendulum’s angle $\phi$ are plotted in Fig. 4, which shows that the designed controller is able to stabilize the pendulum’s angle in the upright vertical position while respecting the constraints on the cart’s position $p$ .

For the sake of comparison, the following two non-hierarchical model-based controllers are designed based on the physical model of the system (Eq. (16)) linearized around $[p(0)\;\dot{p}(0)\;\phi(0)\;\dot{\phi}(0)]=[0\;0\;0\;0]$ :

• an MPC, with the same sampling rate $T_{\text{MPC}}=50$ ms considered before, which reflects real-time constraints. At this sampling rate, the MPC is not able to reject the disturbance and thus fails to stabilize the pendulum around the upright vertical position. This shows the advantage of the hierarchical multi-rate controller structure.

• a Linear-Quadratic-Gaussian (LQG) controller, with sampling rate $T_{s}=5$ ms. This controller stabilizes the pendulum. However, besides requiring knowledge of the plant, it achieves a performance cost $\tilde{J}$ (eq. (19)) equal to $-2.41$ , which is worse than the cost $\tilde{J}=-3.66$ obtained using the proposed performance-oriented approach.
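The two sampling times in this comparison differ by a factor of ten. The arithmetic can be sketched as below, assuming (as the multi-rate structure suggests) that a fast loop runs every $T_{s}$ and the MPC is refreshed every $T_{\text{MPC}}$ , with its first update at time zero. The variable names are illustrative, and times are kept as integer milliseconds to avoid floating-point rounding.

```python
T_S_MS = 5             # fast sampling time T_s from the text
T_MPC_MS = 50          # MPC sampling time T_MPC from the text
EXPERIMENT_MS = 10_000 # length of one closed-loop experiment (10 s)

ratio = T_MPC_MS // T_S_MS             # fast samples per MPC update
fast_steps = EXPERIMENT_MS // T_S_MS   # fast samples in one experiment

# Assumed schedule: the MPC is recomputed on every `ratio`-th fast sample.
mpc_updates = sum(1 for k in range(fast_steps) if k % ratio == 0)
```

Under these assumptions a controller running only at $T_{\text{MPC}}$ acts once per ten fast samples, which is consistent with the single-rate MPC failing to reject the disturbance while the faster LQG controller succeeds.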

VI Conclusions and follow-up

In this work, we described a method to learn MPC-oriented models for hierarchical control schemes via iterative closed-loop experiments. We showed that such experiments can be suitably designed using Bayesian optimization. In the proposed learning framework, the model does not necessarily provide the highest input/output data fit, which is the typical objective of system identification, but is the one yielding the model-based controller corresponding to the best closed-loop performance. We also argued that the prediction horizon can be optimized using the same tools and experiments. Numerical simulations on a benchmark example showed that data can lead to satisfactory controllers with no knowledge of the system dynamics and no constraints on modeling accuracy. Future research will be devoted to the theoretical analysis of the proposed learning strategy as well as to its experimental validation on a real-world setup.

Bibliography

[1] F. Borrelli, A. Bemporad, and M. Morari, Predictive control for linear and hybrid systems. Cambridge University Press, 2017.
[2] A. Mesbah, “Stochastic model predictive control: An overview and perspectives for future research,” IEEE Control Systems Magazine, vol. 36, no. 6, pp. 30–44, 2016.
[3] P. Falugi and D. Q. Mayne, “Getting robustness against unstructured uncertainty: a tube-based MPC approach,” IEEE Transactions on Automatic Control, vol. 59, no. 5, pp. 1290–1295, 2014.
[4] A. Bemporad and M. Morari, “Robust model predictive control: A survey,” in Robustness in identification and control. Springer, 1999, pp. 207–226.
[5] M. Gevers, “Identification for control: From the early achievements to the revival of experiment design,” European Journal of Control, vol. 11, no. 4-5, pp. 335–352, 2005.
[6] U. Rosolia and F. Borrelli, “Learning model predictive control for iterative tasks. A data-driven control framework,” IEEE Transactions on Automatic Control, vol. 63, no. 7, pp. 1883–1896, 2018.
[7] M. Zanon, S. Gros, and A. Bemporad, “Practical reinforcement learning of stabilizing economic MPC,” in European Control Conference, 2019, to appear.
[8] R. Kadali, B. Huang, and A. Rossiter, “A data driven subspace approach to predictive controller design,” Control Engineering Practice, vol. 11, no. 3, pp. 261–278, 2003.