The Generalized Cross Validation Filter

Giulio Bottegal; Gianluigi Pillonetto

arXiv:1706.02495·stat.ML·June 9, 2017

The Generalized Cross Validation Filter

Giulio Bottegal, Gianluigi Pillonetto

PDF

TL;DR

This paper introduces a novel GCV filter that extends Kalman filtering to enable efficient, real-time updating of the GCV score, facilitating online parameter estimation in inverse problems and regularization.

Contribution

The paper develops a new GCV filter that reduces update complexity to O(1), allowing for online application of GCV in state estimation and system identification.

Findings

01

GCV update cost is reduced to O(1) per time step.

02

The GCV filter extends Kalman filter equations for online use.

03

Applications demonstrated in state estimation and system identification.

Abstract

Generalized cross validation (GCV) is one of the most important approaches used to estimate parameters in the context of inverse problems and regularization techniques. A notable example is the determination of the smoothness parameter in splines. When the data are generated by a state space model, like in the spline case, efficient algorithms are available to evaluate the GCV score with complexity that scales linearly in the data set size. However, these methods are not amenable to on-line applications since they rely on forward and backward recursions. Hence, if the objective has been evaluated at time $t - 1$ and new data arrive at time t, then O(t) operations are needed to update the GCV score. In this paper we instead show that the update cost is $O (1)$ , thus paving the way to the on-line use of GCV. This result is obtained by deriving the novel GCV filter which extends the classical…

Equations105

x_{k + 1}

x_{k + 1}

y_{k}

\hat{Y_{t}} = H_{t} Y_{t} .

\hat{Y_{t}} = H_{t} Y_{t} .

GCV_{t} = \frac{S _{t}}{t ( 1 - δ _{t} / t ) ^{2}},

GCV_{t} = \frac{S _{t}}{t ( 1 - δ _{t} / t ) ^{2}},

S_{t} = ∥ \hat{Y_{t}} - Y_{t} ∥^{2},

S_{t} = ∥ \hat{Y_{t}} - Y_{t} ∥^{2},

δ_{t} = Tr (H_{t}) .

δ_{t} = Tr (H_{t}) .

x_{k + 1}

x_{k + 1}

y_{k}

x_{1}

ω_{k}

e_{k}

A_{k}

A_{k}

[Q_{k}]_{ij}

\overset{x}{^}_{1}

\overset{x}{^}_{1}

P_{1}

δ_{1}

S_{1}

GCV_{1}

K_{k}

K_{k}

G_{k}

\overset{x}{^}_{k + 1}

\hat{ζ}_{k + 1}

P_{k + 1}

Σ_{k + 1}

δ_{k + 1}

S_{k + 1}

+ 2 γ^{2} C_{k + 1} \hat{ζ}_{k + 1} \frac{y _{k + 1} - C _{k + 1} x ^ _{k + 1}}{C _{k + 1} P _{k + 1} C _{k + 1}^{T} + γ}

GCV_{k + 1}

k \to \infty lim P_{k} = \overset{ˉ}{P} \mbox an d k \to \infty lim Σ_{k} = \overset{ˉ}{Σ}

k \to \infty lim P_{k} = \overset{ˉ}{P} \mbox an d k \to \infty lim Σ_{k} = \overset{ˉ}{Σ}

\overset{ˉ}{P} = A \overset{ˉ}{P} A^{T} + Q - A \overset{ˉ}{P} C^{T} (C \overset{ˉ}{P} C^{T} + γ)^{- 1} C \overset{ˉ}{P} A^{T}

\overset{ˉ}{P} = A \overset{ˉ}{P} A^{T} + Q - A \overset{ˉ}{P} C^{T} (C \overset{ˉ}{P} C^{T} + γ)^{- 1} C \overset{ˉ}{P} A^{T}

\overset{ˉ}{Σ} = (A - \overset{ˉ}{K} C) \overset{ˉ}{Σ} (A - \overset{ˉ}{K} C)^{T} + \overset{ˉ}{K} \overset{ˉ}{K}^{T}

\overset{ˉ}{Σ} = (A - \overset{ˉ}{K} C) \overset{ˉ}{Σ} (A - \overset{ˉ}{K} C)^{T} + \overset{ˉ}{K} \overset{ˉ}{K}^{T}

\overset{ˉ}{K} = A \overset{ˉ}{P} C^{T} (C \overset{ˉ}{P} C^{T} + γ)^{- 1}, \overset{ˉ}{G} = \frac{A Σ ˉ A ^{T} - K ˉ ( C Σ ˉ C ^{T} + 1 )}{C P ˉ C ^{T} + γ} .

\overset{ˉ}{K} = A \overset{ˉ}{P} C^{T} (C \overset{ˉ}{P} C^{T} + γ)^{- 1}, \overset{ˉ}{G} = \frac{A Σ ˉ A ^{T} - K ˉ ( C Σ ˉ C ^{T} + 1 )}{C P ˉ C ^{T} + γ} .

k \to \infty lim \frac{δ _{k}}{k},

k \to \infty lim \frac{δ _{k}}{k},

k \to \infty lim \frac{δ _{k}}{k} = 1 - γ \frac{C Σ ˉ C ^{T} + 1}{C P ˉ C ^{T} + γ}

k \to \infty lim \frac{δ _{k}}{k} = 1 - γ \frac{C Σ ˉ C ^{T} + 1}{C P ˉ C ^{T} + γ}

F_{t} = 100% (1 - \frac{∥ Z _{t} - Z ^ _{t} ∥}{∥ Z _{t} - 1 1 Z ˉ _{t} ∥}),

F_{t} = 100% (1 - \frac{∥ Z _{t} - Z ^ _{t} ∥}{∥ Z _{t} - 1 1 Z ˉ _{t} ∥}),

x_{k + 1}

x_{k + 1}

y_{k}

Q=\left(\begin{array}[]{c}11.81\\ 0.625\end{array}\right)\left(\begin{array}[]{c}11.81\ \ 0.625\end{array}\right),\quad\gamma=30.

Q=\left(\begin{array}[]{c}11.81\\ 0.625\end{array}\right)\left(\begin{array}[]{c}11.81\ \ 0.625\end{array}\right),\quad\gamma=30.

\tilde{Q}=Q+\left(\begin{array}[]{cc}0&0\\ 0&100\end{array}\right),

\tilde{Q}=Q+\left(\begin{array}[]{cc}0&0\\ 0&100\end{array}\right),

Y_{t} = Φ_{t} g + E_{t},

Y_{t} = Φ_{t} g + E_{t},

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Full text

††thanks: This research has been partially supported by the MIUR FIRB project RBFR12M3AC-Learning meets time: a new computational approach to learning in dynamic systems and by the Progetto di Ateneo CPDA147754/14-New statistical learning approach for multi-agents adaptive estimation and coverage control. This paper was not presented at any IFAC meeting. Corresponding author Gianluigi Pillonetto Ph. +390498277607.

and

The Generalized Cross Validation Filter

Giulio Bottegal

Gianluigi Pillonetto

Department of Electrical Engineering, TU Eindhoven, Eindhoven, The Netherlands (e-mail: [email protected])

Department of Information Engineering, University of Padova, Padova, Italy (e-mail: [email protected])

Abstract

Generalized cross validation (GCV) is one of the most important approaches used to estimate parameters in the context of inverse problems and regularization techniques. A notable example is the determination of the smoothness parameter in splines. When the data are generated by a state space model, like in the spline case, efficient algorithms are available to evaluate the GCV score with complexity that scales linearly in the data set size. However, these methods are not amenable to on-line applications since they rely on forward and backward recursions. Hence, if the objective has been evaluated at time $t-1$ and new data arrive at time $t$ , then $O(t)$ operations are needed to update the GCV score. In this paper we instead show that the update cost is $O(1)$ , thus paving the way to the on-line use of GCV. This result is obtained by deriving the novel GCV filter which extends the classical Kalman filter equations to efficiently propagate the GCV score over time. We also illustrate applications of the new filter in the context of state estimation and on-line regularized linear system identification.

keywords:

Kalman filtering; generalized cross-validation; on-line system identification; inverse problems; regularization; smoothness parameter; splines

1 Introduction

Linear state space models assume the form

[TABLE]

where $x_{k}$ is the state at instant $k$ , $y_{k}$ is the output, while $\omega_{k}$ and $e_{k}$ are random noises. The matrices $A_{k}$ and $C_{k}$ regulate the state transition and the observation model at instant $k$ . This kind of models plays a central role in the analysis and design of discrete-time systems [17]. Applications abound and include tracking, navigation and biomedicine.

In on-line state estimation, the problem is the reconstruction of the values of $x_{k}$ from measurements of $y_{k}$ collected over time. When the matrices $A_{k}$ and $C_{k}$ and the noises covariances are known, the optimal linear estimates are efficiently returned by the classical Kalman filter [1]. However, in many circumstances there can be unknown model parameters that also need to be inferred from data in an on-line manner, e.g. variance components or entries of the transition/observation matrices. One can interpret such parameters as additional states. Then, the extended Kalman filter [16] or more sophysticated stochastic techniques, such as particle filters and Markov chain Monte Carlo [10, 22, 3, 9], can be used to track the filtered posterior. Another technique consists of propagating the marginal likelihood of the unknown parameters via a bank of filters [1, Ch. 10]. In this paper, we will show that another viable alternative is the use of an approach known in the literature as generalized cross validation (GCV) [12].

In the literature of statistics and inverse problems, GCV is widely used in off-line contexts to estimate unknown parameters entering regularized estimators [5, 37, 40]. This approach was initially used to tune the smoothness parameter in ridge regression and smoothing splines [14, 12, 33]. GCV is now also popular in machine learning, used to improve the generalization capability of regularized kernel-based approaches [34, 8], such as regularization networks, which contain spline regression as special case [31, 11].

To introduce GCV in our state space context, we first recall that smoothing splines are closely linked to state space models of $m$ -fold integrated Wiener processes [25]; then it appears natural to extend GCV to general state space models. To this end, assume that measurements $y_{k}$ have been collected up to instant $t$ and stacked in the vector $Y_{t}$ . Denote with $\hat{Y_{t}}$ the vector containing the optimal linear output estimate111The components of $\hat{Y_{t}}$ are thus given by $C\hat{x}_{k|t}$ , where the smoothed state $\hat{x}_{k|t}$ can be obtained for any $t$ with $O(t)$ operations by a fixed-interval Kalman smoothing filter [32, 19]. and use $H_{t}$ to denote the influence matrix satisfying

[TABLE]

Then, the parameter estimates achieved by GCV minimize

[TABLE]

where $S_{t}$ is the sum of squared residuals, i.e.

[TABLE]

and $\delta_{t}$ are the degrees of freedom given by the trace of $H_{t}$ , i.e.

[TABLE]

In the objective (2), the term $S_{t}$ accounts for the goodness of fit while $\delta_{t}$ assumes values on $[0,t]$ and measures model complexity. In fact, in nonparametric regularized estimation, the degrees of freedom $\delta_{t}$ can be seen as the counterpart of the number of parameters entering a parametric model [20, 13, 26].

GCV is supported by important asymptotic results. Also, for finite data set size it turns often out a good approximation of the output mean squared error [7]. It is worth stressing that such properties have been derived without postulating the correctness of the prior models describing the output data [38, 39]. In control, this means that GCV can compensate for possible modeling mismatch affecting the state space description.

Despite these nice features, the use of GCV within the control community appears limited, in particular in on-line contexts. One important reason is the following one. For state space models, there exist efficient algorithms which, for a given parameter vector, return its GCV score with $O(t)$ operations [18, 4], see also [15, 35, 21] for procedures dedicated to smoothing splines. But all of these techniques are not suited to on-line computations since they involve forward and backward recursions. Hence, if $\mathrm{GCV}_{t-1}$ is available and new data arrive at time $t$ , other $O(t)$ operations are needed to achieve $\mathrm{GCV}_{t}$ . In this paper, we will instead show that the update cost is $O(1)$ , thus paving the way to a more pervasive on-line use of GCV. This result is obtained by deriving the novel GCV filter which consists of an extension of the classical Kalman equations. Thanks to it, one can run a bank of filters (possibly in parallel) to efficiently propagate GCV over a grid of parameter values. This makes the proposed GCV filter particularly suitable for applications where a measurement model admits a state space description with dynamics depending on few parameters, see e.g. the next section for an application in numerical differentiation. In this framework, an implementation of the GCV filter via a bank of parallel filters turns out computationally attractive.

The paper is organized as follows. In Section 2, first some additional notation is introduced. Then, the GCV filter is presented. Its asymptotic properties are then discussed in Section 3. In Section 4 we illustrate some applications, including also smoothing splines and on-line regularized linear system identification with the stable spline kernel used as stochastic model for the impulse response [28, 29]. Conclusions end the paper while the correctness of the GCV filter is shown in Appendix.

2 The GCV filter

2.1 State space model

First, we provide full details about our measurements model. We use $x\sim(a,b)$ to denote a random vector $x$ with mean $a$ and covariance matrix $b$ . Then, our state space model is defined by

[TABLE]

where the initial condition $x_{1}$ and all the nosies $\{\omega_{k},e_{k}\}_{k=1,2,\ldots}$ are mutually uncorrelated. We do not specify any particular distribution for these variables, since the GCV score does not depend on the particular noise distribution222Of course, GCV may result not effective if the noises are highly non-Gaussian. Different approaches, like particle filters, should instead be used if linear estimators perform poorly due e.g. to multimodal noise distributions.. If $x_{1},\,\omega_{k},\,e_{k}$ are Gaussian, then the Kalman filter provides the optimal state estimate in the mean-square sense. In the other cases, the Kalman filter corresponds to the best linear state estimator [1]. In addition, just to simplify notation the measurements $y_{k}$ are assumed scalar, so that $\gamma$ represents the noise variance.

We assume that some of the parameters in (3) may be unknown, or could enter $A_{k},B_{k},Q_{k}$ and $P_{0}$ ; however, we do not stress this possible dependence to make the formulas more readable. The matrix $P_{0}$ is assumed to be independent of $\gamma$ . Such parameter is typically unknown, being connected to the ratio between the measurement noise variance and the variance of the driving noise. It corresponds to the regularization parameter in the smoothing-splines context described in the example below.

Example 1 (Smoothing splines [30]).

Function estimation and numerical differentiation are often required in various applications. These include also input reconstruction in nonlinear dynamic systems as described e.g. in [30]. Assume that one is interested in determining the first $m$ derivatives of a continuous-time signal measured with non-uniform sampling periods $T_{k}$ . Modeling the signal as an ${\it m}$ -th fold integrated Wiener process one obtains the stochastic interpretation of the $m$ -th order smoothing splines [40]. In particular, one can use (3) to represent the signal dynamics as follows

[TABLE]

Such model depends on the measurement noise variance $\gamma$ , making this application particularly suited for the GCV filter.

2.2 The GCV filter

The GCV filter equations are now reported. Below, $\hat{x}_{k}$ denotes the optimal linear one-step ahead state prediction having covariance $P_{k}$ . Its dynamics are regulated by the classical Kalman filter via (5a), (5c) and the Riccati equation (5e).

GCV filter

Initialization

[TABLE]

Update

[TABLE]

It is apparent that the difference w.r.t the classical Kalman filter is the presence of the additional state $\hat{\zeta}_{k}$ of the same dimension of $\hat{x}_{k}$ . Comparing (5c) and (5d), one can see that $A_{k}$ is replaced by $A_{k}-K_{k}C_{k}$ . In addition, the dynamics of $\hat{\zeta}_{k}$ are still driven by the innovation $y_{k}-C_{k}\hat{x}_{k}$ , but the Kalman gain $K_{k}$ given by (5a) is substituted by the $G_{k}$ defined by (5b). In turn, such gain depends on $\Sigma_{k}$ which is propagated over time through a modified version of the Riccati equation given by (5f). The GCV filter is graphically depicted in Fig. 1.

3 Asymptotic behavior and the smoothing ratio

3.1 Asymptotic behavior of the GCV filter

We first consider the case where the state-space model (3) is time-invariant, i.e. the matrices $A_{k}$ , $C_{k}$ , and $Q_{k}$ are constant in $k$ . The structure of the equations governing the GCV filter permits to easily understand its asymptotic behaviour. In particular, exploiting well known properties of the Kalman filter [1], the following result is obtained (see the Appendix for a proof).

Proposition 1.

Assume that the system (3) is time-invariant, stabilizable and detectable. Then, for any $P_{0}$ we have

[TABLE]

where $\bar{P}$ and $\bar{\Sigma}$ are the unique symmetric and semidefinite positive matrices solving, respectively, the algebraic Riccati equation

[TABLE]

and the Lyapunov equation

[TABLE]

where $\bar{K}=A\bar{P}C^{T}(C\bar{P}C^{T}+\gamma)^{-1}$ . In addition, all the roots of the matrix $A-\bar{K}C$ are inside the unit circle so that the (asymptotic) GCV filter is asymptotically stable.

Properties of the GCV filter can be also characterized in the time-varying case. In particular, following Section 2 of [2], one can first replace stabilizability and detectability with the assumptions of uniform stabilizability and detectability. Then, following the same reasonings contained in the proof of Proposition 1, Theorem 5.3 in [2] ensures the uniform exponential stability of the GCV filter.

3.2 Fast regularization parameter tuning and the smoothing ratio

Proposition 1 leads also to a new computationally appealing approach to tune the regularization parameter $\gamma$ e.g. in smoothing splines. In particular, consider the scenario described in [21] where an unknown function has to be reconstructed by spline regression from equally spaced noisy samples. When assumptions in Proposition 1 hold true, it is possible to compute off-line the gains

[TABLE]

Then, one can exploit the asymptotic (suboptimal) GCV filter, with the guarantee that the objective values will converge to the exact GCV scores as $k$ increases. Moreover, in off-line contexts this approach appears computationally appealing even when compared to the many GCV-based spline algorithms developed in the last decades [41, 40, 18, 15, 35].

Furthermore, [21] defined the asymptotic smoothing ratio as

[TABLE]

also providing an interesting closed-form expression for the cubic splines case useful to further speed up the tuning of $\gamma$ . For the general case, we notice that Proposition 1 gives also a numerical procedure to compute the asymptotic smoothing ratio (for different values of $\gamma$ ). In fact, if (3) is stabilizable and detectable, combining (5g) and Proposition 1 we obtain

[TABLE]

with $\bar{\Sigma}$ and $\bar{P}$ defined, respectively, in (6) and (7).

4 Numerical Examples

4.1 Spline example

We consider the reconstruction of the function $\exp(\sin 8t)$ taken from [29] from samples collected at 400 instants $t_{i}$ randomly generated from a uniform distribution on $[0,1]$ . The measurement noise is Gaussian with standard deviation equal to 0.3. We model $f$ as the two-fold integral of white noise setting $m=2$ in the time-varying state space model reported in Example 1. This corresponds to reconstructing $f$ using cubic smoothing splines [40].

We use $Z_{t}$ to denote the vector containing the noiseless outputs (corresponding to the second entries of $\{x_{k}\}_{k=1}^{t}$ ). We denote the average of $Z_{t}$ by the scalar quantity $\bar{Z}_{t}$ . Then, the performance measure is the percentage fit

[TABLE]

where $\hat{Z}_{t}$ is the estimate of $Z_{t}$ obtained through the Kalman smoother [1], and $1\!\!\!1$ a column vector with all entries equal to 1. The following two different estimators $\hat{Z}_{t}$ are tested:

•

GCV: this approach estimates $\gamma$ exploiting the GCV filter. More specifically, the GCV score is propagated over a grid containing 100 values of $\gamma$ logarithmically spaced on the interval $[10^{-2},10^{4}]$ . Then, at any $t$ the estimate $\hat{Z}_{t}$ is computed by a Kalman smoothing filter which exploits the $\gamma_{t}$ that minimizes $GCV_{t}$ .

•

Oracle: the same as GCV except that $\gamma_{t}$ maximizes the fit $\mathcal{F}_{t}$ . Note that this approach is not implementable in practice since it uses an oracle that knows the noiseless (unavailable) output $Z_{t}$ .

The left panel of Fig. 2 displays the noiseless output (solid line), the measurements ( $\circ$ ) and the function estimate returned by GCV (dashed line) which turns out close to $f$ . The right panel also shows that the GCV filter is able to track well and in an on-line manner the time-course of $\gamma$ returned by Oracle.

4.2 GCV capability to compensate for model mismatch

We consider the following discrete-time model (see also [23, Section 6]):

[TABLE]

with zero-mean Gaussian noises of covariances

[TABLE]

We will use data generated by this model to test the capability of the GCV filter to compensate for mismatches between the true system and the model used to track the data by tuning $\gamma$ in an on-line manner. As in the previous example $Z_{t}$ is the vector containing the first $t$ noiseless outputs (which are the second entries of $\{x_{k}\}_{k=1}^{t}$ ) and the performance measure is (8). The following three different estimators $\hat{Z}_{t}$ are tested:

•

GCV: this approach uses a wrong transition covariance given by

[TABLE]

and then estimates $\gamma$ exploiting the GCV filter over a grid with 100 values logarithmically spaced on $[10^{-2},10^{4}]$ . Then, at any $t$ the estimate $\hat{Z}_{t}$ is computed by a Kalman smoothing filter which exploits the $\gamma_{t}$ minimizing $GCV_{t}$ .

•

Oracle: the same as GCV except that $\gamma_{t}$ maximizes the fit $\mathcal{F}_{t}$ in (8).

•

Nominal: the estimate $\hat{Z}_{t}$ is returned by a Kalman smoothing filter defined by the nominal wrong covariance $\tilde{Q}$ and $\gamma=30$ . Thus, this approach does not try to compensate for model mismatch since it does not tune $\gamma$ from data.

The left panel of Fig. 3 displays the noiseless output (solid line), the measurements ( $\circ$ ) and the smoothed output obtained by GCV (dashed line) which appears close to truth. In the right panel, one can also see the trajectory in time of the $\gamma_{t}$ returned by Oracle and by GCV. One can appreciate the capability of the GCV filter to compensate the modelling mismatch by tracking a regularization parameter leading to a high fit $\mathcal{F}_{t}$ .

To further support these findings, we have also performed a Monte Carlo study of 100 runs. During each run, 200 output measurements are generated using (9) and $Z_{t}$ is reconstructed by GCV, Oracle and Nominal. From the MATLAB boxplots of the 100 fits (8) reported in Fig. 4, the robustness of GCV emerges clearly. Its performance is in fact very close to that of the oracle-based procedure.

4.3 On-line regularized linear system identification

Now, we consider a linear system identification problem where the aim is to estimate an unknown impulse response from input-output measurements. Assuming a high order FIR, the model describing the outputs collected up to instant $t$ , and stacked in the (column) vector $Y_{t}$ , is

[TABLE]

where $g$ denotes the $m$ -dimensional vector whose components are the impulse response coefficients, the regression matrix $\Phi_{t}$ is defined by the input samples and $E_{t}$ is the measurement noise vector, which we assume white and Gaussian.

To solve this problem, we use the kernel-based approach originally proposed in [28, 27, 6]. The impulse response estimate is given by

[TABLE]

It makes use of the regularization matrix $P_{0}$ induced by the so called first-order spline kernel, i.e. its $(i,j)$ entry is

[TABLE]

where $\alpha$ is an hyperparameter which regulates the rate of decay to zero of the components of $g$ . We refer the reader also to [29] for further details on advantages of (11) over classical parametric approaches.

In real applications, both $\gamma$ and $\alpha$ are unknown. Since we consider a situation where $g$ has to be estimated on-line, we will estimate these two hyperparameters by the GCV filter. To do that, we first notice that (11) corresponds to the maximum a posteriori (MAP) estimator of $g$ and, under the stated Gaussian assumptions, also to its minimum mean-square estimator (MMSE). The estimate of $g$ can then be computed using the Kalman filter. In fact the state space model is

[TABLE]

where the state vector is the stochastic model for $g$ (with $x_{k}=g$ for any $k$ ), $y_{k}$ and $e_{k}$ are the ${\it k}$ -th entries of $Y_{t}$ and $E_{t}$ , respectively, and $C_{k}$ is the ${\it k}$ -th row of $\Phi_{t}$ .

We define a grid in the plane $(\gamma,\,\alpha)$ taking the points such that $\alpha$ is in the set $\{0.5,\,0.6,\,\ldots,\,0.9,\,0.95,\,0.99\}$ , while $\gamma$ assumes values in a logarithmically spaced grid of 20 point between $10^{-2}$ and $10^{3}$ . In this way, the grid consists of 140 points. We run 140 GCV filters in parallel, each corresponding to one of the points; when a new measure $y_{k}$ (and $C_{k}$ ) is available, we update the GCV score of each pair $(\gamma,\,\alpha)$ , selecting the one giving the minimum score.

We test the obtained GCV filter for on-line regularized system identification on a set of 100 Monte Carlo runs. At any run, a random impulse response of length $m=200$ is generated using the same mechanism described in [24, Section 7.4]. The generated system is fed with a with noise sequence of unit variance. Note that this type of input is persistently exciting and guarantees the observability of the system (4.3) (see e.g. [2]), avoiding the covariance windup phenomenon [36]. The standard deviation of the measurement noise is that of the 200 noiseless outputs divided by 10. We assume the system is at rest (the input is equal to zero) prior to the data collection. The performance of the estimator is evaluated by means of the fit (as a function of time)

[TABLE]

where $g^{i}$ is the impulse response generated at the ${\it i}$ -th Monte Carlo run, $\bar{g}^{i}$ its mean, and $\hat{g}_{t}^{i}$ its estimate (the impulse response estimate is function of the time instant $t$ ).

An example of one of the Monte Carlo runs is given in Fig. 5, which shows the evolution in time of the impulse response estimate and its fit. It suffices 50 measurements to the GCV filter to achieve an appreciable fit. The overall results of the Monte Carlo experiment are summarized in Fig. 6, which depicts the average fit of the impulse responses as a function of time. It can be seen that, after a short transient phase, the fit increases monotonically and achieves a high average value.

5 Conclusions

The novel filter here presented allows to propagate efficiently the GCV score over time. Hence, unknown parameters entering a state space model can now be estimated in an on-line manner resorting to one of the most important techniques used for parameter estimation. The asymptotic properties of the GCV filter provide also a new very efficient way to estimate the regularization parameter e.g. in smoothing splines. Applications of the new filter have been illustrated using artificial data regarding a function estimation and on-line regularized linear system identification.

A Matlab implementation of the GCV filter is available at the web page http://www.dei.unipd.it/ giapi/.

6 Appendix

6.1 Derivation of the GCV filter

Without loss of generality, we set the initial system condition to zero, i.e. $\mu=0$ . We also use $X_{t},Y_{t}$ and $E_{t}$ to denote the column vectors containing the states, the outputs and the measurements noises up to instant $t$ , i.e.

[TABLE]

Then, it holds that

[TABLE]

where $O_{t}=\mbox{diag}\{C_{1},\,\ldots,\,C_{t}\}$ is the regression matrix built using the measurement matrices $C_{k}$ , $k=1,\ldots,t$ . We also use $W_{t}$ and $V_{t}$ to denote the state and output covariance matrix, i.e.

[TABLE]

where $I_{t}$ is the $t\times t$ identity matrix. Note that, using the above notation, the smoothed estimate of $Y_{t}$ , already encountered in Section 1, is

[TABLE]

so that the degrees of freedom at instant $t$ turn out

[TABLE]

The following simple lemma is useful for our future developments.

Lemma 2.

One has

[TABLE]

Proof: In view of (15), we start noticing that

[TABLE]

Then, (18) is obtained from the following equalities

[TABLE]

where the last two passages exploit (20) and (17), respectively.

Eq. 19 is instead proved as follows

[TABLE]

where the second and third equality exploit (20) and (16), respectively.

$\blacksquare$

The dynamics of the matrix $P_{k}$ in the GCV filter are regulated by the discrete-time algebraic Riccati equation (DARE), which can be also rewritten as

[TABLE]

It is now easy to see that the matrix $\Sigma_{k}$ entering the GCV filter is the partial derivative of $P_{k}$ w.r.t. $\gamma$ . In fact, differentiating the DRE, and adopting the notation $\Sigma_{k}:=\frac{\partial P_{k}}{\partial\gamma}$ , one has

[TABLE]

Exploiting the definition of $K_{k}$ and rearraging the terms, the recursive formula (5f) is obtained.

Now, consider the dynamics of the predicted state

[TABLE]

We now show that $\hat{\zeta}_{k}$ is the partial derivative of $\hat{x}_{k}$ w.r.t. $\gamma$ . In fact, differentating the above equation using the correspondence $\hat{\zeta}_{k}:=\frac{\partial\hat{x}_{k}}{\partial\gamma}$ , one obtains

[TABLE]

This, combined with the definition of $K_{k}$ , leads to the recursive formula (5d).

Now, exploiting well known properties of the innovations sequence $\{y_{k}-C_{k}\hat{x}_{k}\}_{k=1}^{t}$ , whose variances are $\{C_{k}P_{k}C_{k}^{T}+\gamma\}_{k=1}^{t}$ , and recalling that $\Sigma_{k}:=\frac{\partial P_{k}}{\partial\gamma}$ , we have

[TABLE]

Then, the recursive formula (5g) for the degrees of freedom $\delta_{k}$ is obtained combining the above equation and (18).

Still using properties of the innovations sequence, and recalling that $\hat{\zeta}_{k}:=\frac{\partial\hat{x}_{k}}{\partial\gamma}$ , one also has

[TABLE]

This equation, in combination with (19), proves the correctness of the update rule (5h) for the sum of squared residuals $S_{k}$ and completes the derivation.

6.2 Proof of Proposition 1

If the system (3) is stabilizable and detectable, then standard properties of the algebraic Riccati equation (6) ensure that $\bar{P}$ is symmetric and positive semidefinite and that the Kalman filter, corresponding to (5a), (5c) and (5e), is asymptotically stable (see [1, p. 77]). Because the Kalman filter is asymptotically stable, the matrix $(A-\bar{K}C)$ has all the eigenvalues inside the unit circle, ensuring that (7) admits a unique positive semidefinite solution [1, p. 67]. The filter state transition matrix which regulates the dynamics of $\hat{x}_{k}$ and $\hat{\zeta}_{k}$ is

[TABLE]

and so it also has all eigenvalues inside the unit circle at least for sufficiently large $k$ .

Bibliography41

The reference list from the paper itself. Each links out to its DOI / PubMed record.

1[1] B. D. O. Anderson and J. B. Moore. Optimal Filtering . Prentice-Hall, Englewood Cliffs, N.J., USA, 1979.
2[2] B.D.O. Anderson and J.B. Moore. Detectability and stabilizability of time-varying discrete-time linear systems. SIAM Journal on Control and Optimization , 19(1):20–32, 1981.
3[3] C. Andrieu, A. Doucet, and R. Holenstein. Particle markov chain monte carlo methods. Journal of the Royal Statistical Society: Series B (Statistical Methodology) , 72(3):269–342, 2010.
4[4] C.F. Ansley and R. Kohn. Efficient generalized cross-validation for state space models. Biometrika , 74(1):139–148, 1987.
5[5] M. Bertero. Linear inverse and ill-posed problems. Advances in Electronics and Electron Physics , 75:1–120, 1989.
6[6] T. Chen, H. Ohlsson, and L. Ljung. On the estimation of transfer functions, regularizations and Gaussian processes - revisited. Automatica , 48(8):1525–1535, 2012.
7[7] P. Craven and G. Wahba. Smoothing noisy data with spline functions. Numerische Mathematik , 31:377–403, 1979.
8[8] T. Evgeniou, M. Pontil, and T. Poggio. Regularization networks and support vector machines. Advances in Computational Mathematics , 13:1–50, 2000.

TL;DR

Contribution

Findings

Abstract

Peer Reviews

Videos

The Generalized Cross Validation Filter

Abstract

keywords:

1 Introduction

2 The GCV filter

2.1 State space model

Example 1** (Smoothing splines [30]).**

2.2 The GCV filter

3 Asymptotic behavior and the smoothing ratio

3.1 Asymptotic behavior of the GCV filter

Proposition 1**.**

3.2 Fast regularization parameter tuning and the smoothing ratio

4 Numerical Examples

4.1 Spline example

4.2 GCV capability to compensate for model mismatch

4.3 On-line regularized linear system identification

5 Conclusions

6 Appendix

6.1 Derivation of the GCV filter

Lemma 2**.**

6.2 Proof of Proposition 1

Example 1 (Smoothing splines [30]).

Proposition 1.

Lemma 2.