On the Coordinate Change to the First-Order Spline Kernel for Regularized Impulse Response Estimation
Yusuke Fujimoto, Tianshi Chen

TL;DR
This paper explores new kernels derived from coordinate changes of the first-order spline kernel for regularized impulse response estimation, revealing properties like maximum entropy and sparse inverse Gram matrices, with spectral analysis and numerical validation.
Contribution
It introduces novel kernels based on alternative coordinate changes, extending the properties of the first-order spline kernel for improved impulse response estimation.
Findings
New kernels inherit maximum entropy property
Inverse Gram matrices are sparse
Spectral analysis confirms kernel properties
Abstract
The so-called tuned-correlated kernel (sometimes also called the first-order stable spline kernel) is one of the most widely used kernels for regularized impulse response estimation. This kernel can be derived by applying an exponential decay function as a coordinate change to the first-order spline kernel. This paper focuses on this coordinate change and derives new kernels by investigating other coordinate changes induced by stable and strictly proper transfer functions. It is shown that the corresponding kernels inherit properties from these coordinate changes and the first-order spline kernel. In particular, they have the maximum entropy property and, moreover, the inverses of their Gram matrices have a sparse structure. In addition, a spectral analysis of some special kernels is provided. Finally, a numerical example is given to show the efficacy of the proposed kernel.
Yusuke Fujimoto, Faculty of Environmental Engineering, The University of Kitakyushu, Wakamatsu-ku, Kitakyushu, 808-0135, Japan
Tianshi Chen, School of Science and Engineering and Shenzhen Research Institute of Big Data, The Chinese University of Hong Kong, Shenzhen, 518172, China
keywords:
Identification methods, kernel-based regularization methods, impulse response estimation, kernels.
This paper was not presented at any IFAC meeting. Corresponding author Y. Fujimoto. Tel. +81-93-695-3545.
1 INTRODUCTION
One of the main difficulties in system identification is to balance the data fit and the model complexity [1]. Recently, a new method to handle this issue was proposed by Pillonetto and De Nicolao, especially for the impulse response estimation of linear time-invariant systems [2]. Their main idea comes from regression over a Reproducing Kernel Hilbert Space (RKHS) [3, 4] in the machine learning field. These spaces are associated with bivariate functions called kernels, and this class of methods is often referred to as kernel-based regularization methods. In contrast with the classical Prediction Error Methods (PEMs), a property of such methods is that it is possible to design, through the kernel, a model structure that contains a wide class of impulse responses. More specifically, recall that the classical PEMs first determine the model structure and then tune its parameters according to the observed data. In this case, the set of all possible impulse responses is a finite-dimensional manifold. On the other hand, the kernel-based regularization method, with a carefully designed kernel, searches for the impulse response within a possibly infinite-dimensional RKHS and thus has the potential to model complex systems.
One of the main issues for the kernel-based regularization method is how to design a suitable kernel. While various kernels have been proposed (e.g., [5, 6, 7]), the three most widely used kernels are the so-called Stable Spline (SS) kernel [2], the Tuned-Correlated (TC) kernel (sometimes also called the first-order stable spline kernel) [8], and the Diagonal-Correlated (DC) kernel [8]. These three kernels have simple structures and favorable properties, and their effectiveness has been shown in various works, e.g., [8, 9, 10, 11].
Interestingly, these three kernels share some common properties [7, 12]. For example, they can be derived by applying an exponential decay function as a coordinate change to different kinds of spline kernels [13] (cf. Section 2.2 for details). Moreover, they also inherit some properties from the corresponding spline kernel [14], such as the maximum entropy (MaxEnt) property and the spectral analysis.
Based on the above observations, the following questions then arise naturally:
- Instead of the exponential decay coordinate change, can we design kernels with other types of coordinate changes suitable for system identification?
- What is the corresponding a priori knowledge embedded in such kernels?
In this paper, we aim to address the above questions. In particular, we will focus on the kernels derived by applying the impulse response of a stable and strictly proper transfer function as the coordinate change function to the first-order spline kernel, where $s$ is the complex frequency for the Laplace transform. Clearly, the exponential decay function is a special case of the proposed kernels with . Besides, in our preliminary work [15], we considered the case where the coordinate change is given by , which corresponds to . Here, we will consider more general cases and, moreover, we will show that such a coordinate change embeds a priori knowledge from on the regularized impulse response, or equivalently, the corresponding RKHS inherits properties from . For instance, the proposed kernels are always stable, and the estimated impulse response has the same convergence rate as the coordinate change function. The relative degree of the impulse response is determined by the coordinate change function. Moreover, we also show that the proposed kernels have the Maximum Entropy property and give the spectral analysis for some special cases based on the corresponding ones for the first-order spline kernel.
The remaining part of this paper is organized as follows. Sec. 2 recaps the kernel-based regularization methods, and states the problem considered in this paper. Sec. 3 first shows the positive definiteness and stability of the proposed kernel. Then Sec. 4 shows properties of the proposed kernels related to zero-crossing. Sec. 5 discusses the Maximum Entropy property of the proposed kernel, and Sec. 6 gives spectral analysis for some special cases. Sec. 7 shows a numerical example to demonstrate the effectiveness of coordinate changes. Finally Sec. 8 concludes this paper.
[Notations] Sets of nonnegative real numbers and natural numbers are denoted by and , respectively. The -dimensional identity matrix is denoted by . The inverse and the transposition of a matrix are denoted by and , respectively. The determinant of a square matrix is denoted by . denotes the Frobenius norm of matrix . The element of a matrix is denoted by . When is a vector, denotes the th element of . The Lebesgue integral of over is denoted by , and the integral with the measure is denoted by . In particular, the Lebesgue integral of over is denoted by . shows the set of absolutely integrable functions over , i.e., The set is denoted by The expected value and variance of random variables are denoted by and , respectively. The limit denotes the right-sided limit at zero. Throughout the paper, $s$ denotes the complex frequency for the Laplace transform, and $e$ denotes Napier's constant.
2 PROBLEM SETTING
2.1 Kernel-based regularization methods
We first recap the kernel-based regularization method for continuous-time systems. This paper focuses on single-input single-output, bounded-input bounded-output stable, linear time-invariant, and causal systems described by
[TABLE]
where is the time index, and are the input, the measured output, and the measurement noise at time , respectively, is the impulse response of the system, and is the convolution of the input and the impulse response, is independently and identically Gaussian distributed with mean 0 and variance . The identification problem in this paper is to estimate from the measured output and the input over the interval , where are the sampling time instants.
To this end, we use the kernel-based regularization method where the estimated impulse response is given by
[TABLE]
Here, is a Hilbert space of functions , and is the norm endowed to , and is a regularization parameter. Clearly, a good estimate of the impulse response depends on a good choice of . In the sequel, we assume that is a Reproducing Kernel Hilbert Space (RKHS).
The definitions of RKHS and the reproducing kernel are as follows. Let be a nonempty set, and consider the Hilbert space of functions denoted by . Then is an RKHS if
[TABLE]
Further let be the inner product endowed to . Then a symmetric bivariate function is the reproducing kernel of if it satisfies
[TABLE]
where indicates the single-variable function defined by setting the first argument of to . Reproducing kernels are also called kernels for short. It is well known that the kernel exists if the Hilbert space is an RKHS.
With the above definitions, the optimal solution of (2) has an explicit expression. Let be the kernel of in (2). Also let be a matrix defined as
[TABLE]
Let and be
[TABLE]
Then, the optimal solution of (2) is given by
[TABLE]
where is a function of defined by
[TABLE]
See e.g., [9] for more details.
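To make the closed-form solution concrete, the following is a minimal sketch under a simplifying assumption that is not part of the general setting above: the input is impulsive, so each measured output is a noisy sample of the impulse response and the matrix built from the kernel reduces to its Gram matrix at the sampling instants. The function names, the TC kernel hyperparameter values, the test signal, and the choice of the regularization parameter as the noise variance are all hypothetical and chosen only for illustration.

```python
import numpy as np

def tc_kernel(t, s, lam=1.0, beta=0.5):
    """TC kernel in closed form (hyperparameter values are illustrative)."""
    return lam * np.exp(-beta * np.maximum(t, s))

def fit_impulse_response(t_obs, y, kernel, gamma):
    """Regularized estimate for an impulsive input (assumption).

    Solves (K + gamma * I) c = y and returns g_hat(t) = sum_i c_i K(t, t_i).
    """
    K = kernel(t_obs[:, None], t_obs[None, :])
    c = np.linalg.solve(K + gamma * np.eye(t_obs.size), y)

    def g_hat(t):
        t = np.atleast_1d(np.asarray(t, dtype=float))
        return kernel(t[:, None], t_obs[None, :]) @ c

    return g_hat, c

# Hypothetical data: a decaying exponential observed in Gaussian noise.
rng = np.random.default_rng(0)
t_obs = 0.1 * np.arange(1, 51)
g_true = np.exp(-t_obs)
sigma = 0.05
y = g_true + sigma * rng.standard_normal(t_obs.size)

gamma = sigma**2  # assumed choice of the regularization parameter
g_hat, c = fit_impulse_response(t_obs, y, tc_kernel, gamma)
residual = y - g_hat(t_obs)
```

For a general input, the Gram matrix in this sketch would be replaced by the matrix of double convolutions of the input with the kernel, which requires numerical quadrature and is omitted here.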
2.2 Problem statement
The Stable Spline (SS) kernel, the Tuned-Correlated (TC) kernel (sometimes also called the first-order stable spline kernel), and the Diagonal-Correlated (DC) kernel can all be derived by applying an exponential decay function as a coordinate change to different kinds of spline kernels. To make this point clear, we let be a kernel function and be a coordinate change function. Then the aforementioned three kernels can all be put into the following form
[TABLE]
Moreover, the coordinate change functions are all for these three kernels, while the kernel is the second order spline kernel for the SS kernel, the first order spline kernel for the TC kernel, and a generalized first order spline kernel for the DC kernel, cf. [16]. In particular, the TC kernel,
[TABLE]
can be derived by applying as the coordinate change to the first-order spline kernel
[TABLE]
where and are hyperparameters of the kernel. We consider more general coordinate changes in this paper.
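The relation between the TC kernel and the first-order spline kernel can be checked numerically. The sketch below assumes the standard forms of the two kernels, namely a scaled minimum for the spline kernel and a scaled exponential of the larger time argument for the TC kernel; the function names and the hyperparameter values `lam` and `beta` are illustrative, not taken from the paper.

```python
import numpy as np

def spline_kernel(x, y, lam=1.0):
    """First-order spline kernel lam * min(x, y) for x, y >= 0."""
    return lam * np.minimum(x, y)

def coordinate_changed_kernel(t, s, g, lam=1.0):
    """Spline kernel evaluated at the transformed coordinates g(t), g(s)."""
    return spline_kernel(g(t), g(s), lam)

def tc_kernel(t, s, lam=1.0, beta=0.5):
    """TC kernel in its familiar closed form lam * exp(-beta * max(t, s))."""
    return lam * np.exp(-beta * np.maximum(t, s))

lam, beta = 2.0, 0.5
t = np.linspace(0.0, 10.0, 21)
T, S = np.meshgrid(t, t, indexing="ij")

# Exponential decay as the coordinate change.
K_cc = coordinate_changed_kernel(T, S, lambda x: np.exp(-beta * x), lam)
K_tc = tc_kernel(T, S, lam, beta)
print(np.allclose(K_cc, K_tc))
```

Passing a different callable as `g` gives the more general kernels considered in the rest of the paper, as long as `g` is nonnegative on the time axis.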
Problem 1.
Let be a stable and strictly proper transfer function and be the impulse response of . Hereafter, we consider properties of the kernel given by the first-order spline kernel with as the coordinate change function, i.e.,
[TABLE]
or equivalently the properties of the RKHS associated with that is denoted by below.
3 POSITIVE DEFINITENESS AND STABILITY
We first recall some definitions.
A kernel is said to be positive definite if the Gram matrix of defined as
[TABLE]
is positive semidefinite for any and for any . The Moore-Aronszajn theorem states that if is positive definite, then there exists a unique RKHS whose reproducing kernel is [3].
A kernel is said to be stable if , the RKHS associated with , satisfies .
Then we have the following result. (Footnote 1: All proofs of propositions are deferred to the Appendix.)
Theorem 2.
The kernel (12) is positive definite and moreover, stable, i.e., the corresponding RKHS is a subspace of .
Theorem 2 shows that the kernel (12) is a positive semidefinite kernel and moreover, for any , .
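As a numerical sanity check of positive semidefiniteness, the sketch below builds a Gram matrix for one concrete coordinate change and inspects its smallest eigenvalue. The coordinate change `t * exp(-beta * t)` is an assumed example (it is the impulse response of the stable, strictly proper transfer function `1 / (s + beta)^2`); the grid, the scaling, and the names are hypothetical.

```python
import numpy as np

def coordinate_change(t, beta=1.0):
    """Assumed example: impulse response of 1 / (s + beta)^2."""
    return t * np.exp(-beta * t)

def gram_matrix(t, g, lam=1.0):
    """Gram matrix of the spline kernel after the coordinate change g."""
    gt = g(t)
    return lam * np.minimum(gt[:, None], gt[None, :])

t = np.linspace(0.0, 20.0, 201)
K = gram_matrix(t, coordinate_change)

# For a positive semidefinite kernel, all eigenvalues are >= 0
# up to floating-point round-off.
min_eig = np.linalg.eigvalsh(K).min()
print(min_eig)
```

Because this coordinate change is not monotone, different time instants can map to nearly the same value, so the Gram matrix may be close to singular even though it remains positive semidefinite.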
4 ZERO-CROSSING RELATED PROPERTIES
The following proposition shows that if has a zero-crossing, then any inherits this zero-crossing.
Proposition 3.
Assume that satisfies for some . Then, for any .
This proposition suggests that, if one knows that the true impulse response is zero at some time instant , then one should design such that . A typical case is , i.e., the relative degree of the system is known to be higher than or equal to two. For this case, the result can be further strengthened and is shown in the following theorem.
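The inheritance of a zero of the coordinate change can be seen in a small experiment. The sketch below assumes an impulsive input, so that the estimate is a linear combination of kernel sections, and uses the illustrative coordinate change `t * exp(-beta * t)`, which vanishes at time zero. The data are arbitrary random numbers on purpose: the estimate at time zero is exactly zero whatever the data are. All names and numerical values are hypothetical.

```python
import numpy as np

def g(t, beta=1.0):
    """Assumed coordinate change with g(0) = 0."""
    return t * np.exp(-beta * t)

def kernel(t, s):
    """First-order spline kernel after the coordinate change g."""
    return np.minimum(g(t), g(s))

t_obs = 0.1 * np.arange(1, 31)
rng = np.random.default_rng(1)
y = rng.standard_normal(t_obs.size)  # arbitrary data, not a real response

# Regularized estimate for an impulsive input (assumption):
# g_hat(t) = sum_i c_i K(t, t_i) with (K + gamma * I) c = y.
gamma = 0.1
K = kernel(t_obs[:, None], t_obs[None, :])
c = np.linalg.solve(K + gamma * np.eye(t_obs.size), y)

# Every kernel section K(0, t_i) = min(g(0), g(t_i)) is zero,
# so the estimate at t = 0 vanishes regardless of c.
g_hat_at_zero = kernel(np.array([0.0])[:, None], t_obs[None, :]) @ c
print(g_hat_at_zero)
```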
Theorem 4.
Assume that the identification input is times differentiable, and satisfies
[TABLE]
If satisfies for , then for .
Moreover, for any , how fast converges to 0 also depends on , which is stated in the following theorem.
Theorem 5.
Assume that the input is bounded, and let be a vector whose -th element is . When is stable and , converges to when , where is defined as (6).
In summary, inherits some properties of , i.e., how crosses or converges to zero. This is because the linear spline kernel in (12) is employed. More specifically, let be the RKHS associated with the first order spline kernel (11). Noting that for any from the reproducing property, the properties of given in this section can be derived accordingly.
5 MAXIMUM ENTROPY PROPERTY
Interestingly, the kernel (12) also inherits the maximum entropy property of the linear spline kernel (11).
Theorem 6.
For a given with be a sequence from , let be the permutation of such that
[TABLE]
and consider the stochastic process defined by
[TABLE]
where is a white Gaussian noise with unit variance. Then, is a Gaussian process with zero mean and as its covariance function. In addition, let be any stochastic process defined over with . Then, the Gaussian process is the solution of the MaxEnt problem
[TABLE]
where denotes the differential entropy of . (Footnote 2: The differential entropy of random variable is defined by , where the integral is taken over the support of .)
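The covariance claim in Theorem 6 can be illustrated with a short computation. Since the defining equation of the process is not reproduced here, the sketch assumes the standard discrete-time Wiener construction: after sorting the time instants by the value of the coordinate change, the process is a running sum of independent Gaussian terms whose variances are the successive increments of the sorted values. The coordinate change, the time instants, and the names are hypothetical.

```python
import numpy as np

def g(t):
    """Assumed coordinate change (impulse response of 1 / (s + 1)^2)."""
    return t * np.exp(-t)

# Unsorted time instants; t = 0 is included so that g takes the value zero.
t = np.array([0.0, 0.3, 2.5, 1.0, 4.0, 0.7])

# Permutation that sorts the instants by increasing g.
order = np.argsort(g(t))
g_sorted = g(t)[order]

# Variances of the independent increments (first increment starts from 0).
increments = np.diff(g_sorted, prepend=0.0)

# Row k of A forms h_k = sum_{j <= k} sqrt(increment_j) * w_j.
A = np.tril(np.ones((t.size, t.size))) * np.sqrt(increments)[None, :]

# Covariance of h = A w with w white and unit variance.
cov_sorted = A @ A.T
cov_expected = np.minimum(g_sorted[:, None], g_sorted[None, :])
print(np.allclose(cov_sorted, cov_expected))

# One sample path of the process at the sorted instants.
rng = np.random.default_rng(0)
h_sample = A @ rng.standard_normal(t.size)
```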
This Maximum Entropy interpretation also suggests the special structure of the inverse of the Gram matrix of . For a given with be a sequence from , let be a permutation of , which satisfies (15). Also let be a Gram matrix of defined as
[TABLE]
Since is assumed to be zero, has a row and a column whose elements are all zero. We define as a matrix constructed by removing such a row and column from . Note that is also a Gram matrix of .
Before showing the structure of , we first give the following result.
Theorem 7.
The determinant of is given as
[TABLE]
Theorem 7 gives the condition under which the inverse of exists: for all and for all .
Theorem 8.
Let , and also let be a vector obtained by removing the element corresponding to from . Let be a row-permutation matrix such that
[TABLE]
Then the inverse matrix of is given as
[TABLE]
where is the inverse matrix of the Gram matrix of the first-order spline kernel [14],
[TABLE]
Theorem 8 gives the explicit form of the inverse matrix of . Note that is a tri-diagonal matrix. This theorem indicates that has a sparse structure, i.e., it has at most three elements in each row (or column).
Example 5.1.
For illustration, we consider the case , or equivalently, , and show that the corresponding has a sparse structure. We set , and computed according to Theorem 8.
Figs. 1 and 1 show the sparsity patterns of and , respectively, obtained with the MATLAB command spy. The horizontal and vertical axes show the column and row of each matrix, respectively, and the dots show the non-zero elements. We can see that is tri-diagonal, and has at most three non-zero elements in each row or column. The sparsity pattern may not be visible in a numerically computed , e.g., the one computed with the MATLAB command inv. For instance, spy(inv()) shows that all elements in inv() are non-zero. To illustrate the effectiveness of Theorem 8, we compute , where denotes a numerically computed inverse of obtained with Theorem 8 or inv. Then we have with Theorem 8 and with inv, respectively.
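The sparse inverse can be reproduced with a few lines of NumPy. The sketch below relies on the well-known closed form of the inverse of a matrix whose entries are pairwise minima of distinct positive numbers: after sorting, the inverse is tridiagonal with entries built from reciprocal increments, and the determinant is the product of the increments. This is offered as an illustration in the spirit of Theorems 7 and 8, not as a transcription of their formulas; the coordinate change, the time grid, and the function names are hypothetical.

```python
import numpy as np

def min_gram_inverse(x):
    """Inverse and determinant of [min(x_i, x_j)] for distinct positive x."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x)                # sorting permutation
    xs = x[order]
    d = np.diff(xs, prepend=0.0)         # d[i] = xs[i] - xs[i-1], xs[-1] := 0
    n = xs.size
    inv_sorted = np.zeros((n, n))
    for i in range(n):
        inv_sorted[i, i] = 1.0 / d[i]
        if i + 1 < n:
            inv_sorted[i, i] += 1.0 / d[i + 1]
            inv_sorted[i, i + 1] = inv_sorted[i + 1, i] = -1.0 / d[i + 1]
    inv = np.zeros((n, n))
    inv[np.ix_(order, order)] = inv_sorted   # undo the sorting permutation
    return inv, np.prod(d)

# Assumed coordinate change and sampling instants.
t = 0.25 * np.arange(1, 21)
g = t * np.exp(-t)
K = np.minimum(g[:, None], g[None, :])

K_inv, det_K = min_gram_inverse(g)
max_nonzeros_per_row = np.count_nonzero(K_inv, axis=1).max()
print(max_nonzeros_per_row, np.allclose(K @ K_inv, np.eye(t.size), atol=1e-8))
```

Because the entries are assigned directly rather than obtained from a general-purpose inversion routine, the zeros in the result are exact, which is why the sparsity pattern is visible here but can be lost when a dense inverse is computed numerically.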
6 SPECTRAL ANALYSIS OF MULTIPLE POLE SPLINE KERNEL
It is well-known from Mercer’s Theorem that under suitable assumptions on the kernel any function in the RKHS can be represented by an orthonormal series. We show such an orthonormal basis for , which can yield a reasonable finite dimensional approximation of and can make some computations easy and fast. In this section, we focus on (12) where with , and show the spectral analysis of (12). This kernel is proposed in [15] and called the Multiple pole Spline kernel.
6.1 Preliminary
We first introduce some definitions for a positive semidefinite kernel with a compact set .
Let be a nondegenerate Borel measure on . Also let denote the space of functions of such that . For a given kernel and , we define an integral operator on :
[TABLE]
If for some ,
[TABLE]
has a solution other than , then and the solution are called an eigenvalue and an eigenfunction of , respectively. Two distinct eigenfunctions and are orthogonal, i.e., . Then, the kernel has a series expansion
[TABLE]
which converges uniformly and absolutely on .
Consider the first-order spline kernel with being the Lebesgue measure. In this case, the eigenvalues and eigenfunctions are given by
[TABLE]
With these and , the spline kernel has the series expansion (26).
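The series expansion of the first-order spline kernel is easy to verify numerically. The sketch below assumes that (27) takes the standard form known from the Karhunen-Loève expansion of the Wiener process on the unit interval, with eigenvalues `1 / ((i - 1/2)^2 * pi^2)` and eigenfunctions `sqrt(2) * sin((i - 1/2) * pi * t)`; the function names and the truncation length are illustrative.

```python
import numpy as np

def spline_eigenpair(i, t):
    """Assumed i-th eigenvalue and eigenfunction of min(t, s) on [0, 1]."""
    omega = (i - 0.5) * np.pi
    return 1.0 / omega**2, np.sqrt(2.0) * np.sin(omega * t)

def truncated_expansion(T, S, n_terms):
    """Partial sum of lambda_i * phi_i(t) * phi_i(s) over the first n_terms."""
    total = np.zeros_like(T, dtype=float)
    for i in range(1, n_terms + 1):
        lam_i, phi_T = spline_eigenpair(i, T)
        _, phi_S = spline_eigenpair(i, S)
        total += lam_i * phi_T * phi_S
    return total

t = np.linspace(0.0, 1.0, 11)
T, S = np.meshgrid(t, t, indexing="ij")

# Largest deviation between the truncated series and min(t, s) on the grid.
max_err = np.abs(truncated_expansion(T, S, 2000) - np.minimum(T, S)).max()
print(max_err)
```

The eigenvalues decay quadratically, so the truncation error shrinks only in inverse proportion to the number of retained terms.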
6.2 Main result
We consider the case , , i.e.,
[TABLE]
For simplicity of notation and discussion, we take in the following. The extension to other is straightforward. In the rest of this section, and denote the values and functions defined in (27).
Before showing the main result, we first show a lemma.
Lemma 9.
Let , and . Then, and defined by (27) satisfy
[TABLE]
In addition,
[TABLE]
Lemma 9 gives the eigenvalues and eigenfunctions of over for . In particular, are orthonormal eigenfunctions.
The main result of this section is stated as follows.
Theorem 10.
Let be a function defined by
[TABLE]
and consider the measure induced by ; with the Lebesgue measure . Also let and be
[TABLE]
Then, we have
[TABLE]
with
[TABLE]
Theorem 10 suggests that and are the eigenvalues and eigenfunctions of with the measure induced by , respectively.
Based on Theorem 10, we have the following theorem.
Theorem 11.
Let .
1. The series expansion
[TABLE]
converges uniformly and absolutely on .
2. forms an orthonormal basis of , and has an equivalent representation:
[TABLE]
Moreover, the norm of is given by
[TABLE]
Example 6.1.
For illustration, we show the case with and .
Figs. 2 and 3 show defined by (32) and , respectively. The horizontal axes show , and the vertical axes show and , respectively. In this case, and at .
Fig. 4 shows for . The horizontal axes show , and the vertical axes show . The top, middle, and bottom figures show , , and . These eigenfunctions satisfy and as we expected.
With the same and , we also compute where elements of and are given as and , respectively, with . Fig. 5 illustrates how converges to zero with increasing . The horizontal and vertical axes show and , respectively.
7 ILLUSTRATIVE EXAMPLE
In this section, we give a numerical example to illustrate the effectiveness of the proposed kernel. The target system is given by
[TABLE]
hence the relative degree of the target is two. For , we employ
[TABLE]
with as the hyperparameters of the kernel. The impulse response of is
[TABLE]
thus for any . As shown in Sec. 4, this makes the estimated impulse response . This means that we exploit the a priori knowledge that the relative degree of the system is higher than or equal to two.
We consider the case where the input is the impulsive input, and the noise variance . The sampling period is set to 0.1 [s], and we collect .
Fig. 6 shows an example of such observed data . The horizontal axis shows time, and the vertical axis shows the observed output. Each dot shows the observed data . In the following, we identify the impulse response from such data 300 times with independent noise realizations.
We employ the Empirical Bayes method to tune the hyperparameters, i.e., is tuned so as to maximize
[TABLE]
where is set to . Note that depends on the hyperparameter . This is based on the Gaussian process interpretation of the kernel-based regularization methods. In this interpretation, the kernel is regarded as the covariance function of a zero-mean Gaussian process, and (42) is the logarithm of the marginal likelihood (with some constants ignored). Such tuning is called the Empirical Bayes method [9].
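The Empirical Bayes tuning step can be sketched as follows. The code assumes an impulsive input, so that the marginal covariance of the output is the Gram matrix plus the noise variance times the identity, and it minimizes the negative log marginal likelihood up to constants, which is equivalent to maximizing the log marginal likelihood. The exact scaling of (42) is not reproduced here, and the two-parameter kernel, the coarse grid search, the test signal, and all names are hypothetical simplifications rather than the setup of this section.

```python
import numpy as np

def neg_log_marginal_likelihood(y, K, sigma2):
    """y^T S^{-1} y + log det S with S = K + sigma2 * I (constants dropped)."""
    S = K + sigma2 * np.eye(len(y))
    L = np.linalg.cholesky(S)            # S = L L^T
    alpha = np.linalg.solve(L, y)        # alpha^T alpha = y^T S^{-1} y
    return alpha @ alpha + 2.0 * np.sum(np.log(np.diag(L)))

def gram(t, lam, beta):
    """Assumed kernel: spline kernel with coordinate change t * exp(-beta t)."""
    g = t * np.exp(-beta * t)
    return lam * np.minimum(g[:, None], g[None, :])

# Hypothetical data: a response with relative degree two in Gaussian noise.
rng = np.random.default_rng(0)
t = 0.1 * np.arange(1, 101)
sigma2 = 1e-4
y = t * np.exp(-t) + np.sqrt(sigma2) * rng.standard_normal(t.size)

# Coarse grid search over the hyperparameters (a stand-in for an optimizer).
grid = [(lam, beta) for lam in (0.1, 1.0, 10.0)
        for beta in (0.25, 0.5, 1.0, 2.0, 4.0)]
best = min(grid,
           key=lambda p: neg_log_marginal_likelihood(y, gram(t, *p), sigma2))
print(best)
```

In practice a gradient-based or derivative-free optimizer over continuous hyperparameters would replace the grid; the Cholesky factorization is used so that the quadratic form and the log-determinant come from a single decomposition.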
Using defined by (40) and the Empirical Bayes method, we perform the identification with for 300 times with independent noise realizations. Fig. 7 shows the estimated and true impulse response of the target system. The horizontal axis shows time, and the vertical axis shows the impulse response. The gray lines are 300 estimated impulse responses, and the red line shows the true impulse response. Apparently, the behavior of the original impulse response is well approximated with .
For comparison, we also show the result with the TC kernel and the Empirical Bayes. Recall that the TC kernel is defined as (10). Fig. 8 shows the 300 estimated impulse responses with the TC kernel and the Empirical Bayes. The estimated impulse responses converge to zero slowly, and show overfitting behavior.
For comparison, we also show the results with oracle hyperparameters, i.e., hyperparameters tuned with the true impulse response. Let and . Noting that we consider the case with impulsive input, we have
[TABLE]
where is a Gram matrix of the kernel with and . Then,
[TABLE]
and the mean square error on the sampled instants becomes
[TABLE]
In the following, we show the results with hyperparameters which minimize (45).
Figs. 9 and 10 show the 300 estimated impulse responses with such hyperparameters. Figs. 9 and 10 employ the proposed and TC kernel, respectively. In this case, the estimated impulse response with the TC kernel converges to zero smoothly.
Fig. 11 shows the boxplots of the square errors on the sampled instants, i.e., , with 300 independent noise realizations. The left two boxes show the results with the Empirical Bayes, and the right two boxes show the results with the hyperparameter tuned according to the mean square error on the sampled instants. The proposed kernel with the Empirical Bayes shows almost the same performance as the TC with the oracle hyperparameter, and the proposed kernel with the oracle hyperparameter outperforms the others. These results show that the proposed kernel is more appropriate for than the TC kernel.
As a statistical analysis, we perform Wilcoxon rank sum tests for two cases. In the first case, we focus on the proposed kernel with the Empirical Bayes and the TC kernel with the oracle hyperparameter. The null hypothesis is that the two medians of the square errors on the sampled instants are the same (two-sided rank sum test). The p-value is 0.37, thus this null hypothesis cannot be rejected. This is consistent with the proposed method with the Empirical Bayes performing comparably to the TC kernel with the optimal hyperparameter, although failing to reject the null hypothesis does not by itself establish equivalence. In the second case, we focus on the proposed and the TC kernels with the oracle hyperparameters. The null hypothesis is that the median of the square errors becomes smaller with the TC kernel (one-sided rank sum test). The p-value is , thus the null hypothesis is rejected with high significance. This suggests that the proposed kernel has the potential to achieve better estimates than the TC kernel.
From the above results, it is confirmed that the proposed kernel (12) can be useful for regularized impulse response estimation, provided that the coordinate change is designed by taking into account the a priori knowledge on the system to be identified.
8 CONCLUSION
This paper focuses on kernels derived by applying coordinate changes induced by stable and strictly proper transfer functions to the first-order spline kernel. They are generalizations of the tuned-correlated kernel, which is one of the most widely used kernels in regularized impulse response estimation. It is shown that the proposed kernels inherit properties from the coordinate changes such as the relative degree and the convergence rate. They also inherit the Maximum Entropy property from the first-order spline kernel. Spectral analysis is given for the case where the coordinate change is chosen as . A numerical example is given to demonstrate the effectiveness of the proposed kernel and shows that a suitable coordinate change could give better performance than the tuned-correlated kernel.
Extensions to the second-order spline kernel and the generalized spline kernel are future tasks. Another future task is to find the optimal coordinate change in some sense for given a priori knowledge on the system to be identified.
Appendix A Proofs
A.1 Proof of Theorem 2
is interpreted as the first-order spline kernel with and the coordinate change . This suggests that is positive definite, hence there exists an RKHS associated with .
We recall the following proposition for the proof about the stability; if the kernel is a nonnegative valued function, i.e., , then is stable if and only if
[TABLE]
See Proposition 15 in [9] for more detail about the stability of the kernel.
The proof about the stability is based on the following Lemma.
Lemma 12.
For any stable and strictly proper rational transfer function , there exist and which satisfy
[TABLE]
The proof of Lemma 12 is given in Appendix A.2. Based on Lemma 12,
[TABLE]
Since is a nonnegative valued kernel and satisfies (46), the statement is proven.
A.2 Proof of Lemma 12
From the assumption that is stable and a strictly proper rational function of , is divided into four parts; derived from single-real poles, single-complex poles, repeated real poles, and repeated complex poles. In summary, we have
[TABLE]
where , and denote the number of distinct real poles, the number of distinct complex poles, the largest multiplicity of the real poles, and the largest multiplicity of the complex poles, respectively. and and show the distinct real poles and complex poles, respectively. Note that and from the stability assumption. In the following, we show that each term of (49) is bounded by an exponential.
For ease of notation, we employ instead of for a while. We show that is bounded by , where denotes the factorial of , i.e., . For ,
[TABLE]
holds. The second equality is derived from the Taylor expansion of the exponential function, and the last inequality is derived from and . From this inequality, we have
[TABLE]
where . Let . Then, for and we have
[TABLE]
with
[TABLE]
By noting
[TABLE]
the same proof can be applied for the second term of (49), and
[TABLE]
with
[TABLE]
From the above discussions, we have
[TABLE]
where
[TABLE]
and this completes the proof.
A.3 Proof of Proposition 3
From the reproducing property of ,
[TABLE]
Here we use .
A.4 Proof of Theorem 4
We first prove the case where . Consider defined by (8). From the assumption that when , is rewritten as
[TABLE]
for sufficiently small . By noting , we have from
[TABLE]
This holds for all , and we conclude .
Next, we consider the case . From (60), we have
[TABLE]
Again by noting that is bounded and from the assumption, we have and .
Finally, we prove the case where . Let . When , we have
[TABLE]
From the assumption that and its derivatives are bounded, the derivatives of are also bounded for . Thus, if for all , we have and the proof has been completed.
A.5 Proof of Theorem 5
Consider defined by (8). Let and be sets defined by and . This indicates that when and when . Hence, we have
[TABLE]
Note that when . Since the integrand of the second term is bounded and the Lebesgue measure of goes to zero when (because ),
[TABLE]
and this indicates
[TABLE]
A.6 Proof of Theorem 6
The former half of the theorem is easily confirmed by the direct calculation;
[TABLE]
and by noting , is the covariance function of .
The latter half of the theorem is based on Lemma 1 of [14], which is stated as follows.
Lemma 13 (Chen et al.).
Let be any stochastic process with for . For any and , the discrete-time Wiener process is the solution of the MaxEnt problem
[TABLE]
where the discrete-time Wiener process is given by
[TABLE]
Let be a function which maps to for , i.e., . Also let and be and , respectively. With these notations, the original MaxEnt problem becomes
[TABLE]
From Lemma 1 of [14], the optimal solution of this MaxEnt problem is given by (70), and this completes the proof.
A.7 Proof of Theorems 7 and 8
We use the result in [14].
Proposition 14 (Chen et al.).
Consider the discrete-time Wiener kernel
[TABLE]
Under the assumption that , the Gram matrix
[TABLE]
satisfies
[TABLE]
and
[TABLE]
By noting that is equivalent to and , we have the results.
A.8 Proof of Lemma 9
With the transformation , we have
[TABLE]
A.9 Proof of Theorems 10 and 11
Divide the interval into and . Note that is monotonic on each interval from
[TABLE]
and has the inverse function on each interval. In particular, the inverse function on is given by where denotes the principal branch of the Lambert W function (see Appendix B for a brief introduction of the Lambert W function). This is confirmed from the direct calculation;
[TABLE]
where denotes . Similarly, the inverse function of on the interval is given by where denotes the minor branch of the Lambert W function. Note that and satisfy and , respectively. This indicates .
With these inverse relations, we change the integration variable from to .
[TABLE]
Here we use Lemma 9. The orthonormality of is shown with the same integration variable change.
[TABLE]
The last equality is based on the orthonormality of over .
Theorem 11 is a direct consequence of Theorem 4 on page 37 of [17].
Appendix B The Lambert W function
This appendix gives a brief introduction to the Lambert W function. See e.g., [18] for more details.
The Lambert W function is a set of functions which satisfy
[TABLE]
for any . If we restrict our attention to the case , the Lambert W function is divided into two branches; the principal branch and the minor branch.
Fig. 12 illustrates the Lambert W function on the real axis. The Lambert W function is double-valued on , and divided into two branches; and . The former one is called the principal branch, and the latter one is called the minor branch. We use notations and to denote the principal and the minor branch, respectively.
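The role of the two branches in Appendix A.9 can be illustrated without a special-function library. The sketch below assumes the coordinate change `t * exp(-t)` and inverts it on each monotone interval by plain bisection, which is a stand-in for evaluating the Lambert W function directly. It then checks the defining relation: if `t` solves `t * exp(-t) = x`, then `w = -t` satisfies `w * exp(w) = -x`, with `w` in the principal branch for the solution in [0, 1] and in the minor branch for the solution in [1, infinity). The function names and the test value are hypothetical.

```python
import math

def g(t):
    """Assumed coordinate change t * exp(-t), maximal at t = 1."""
    return t * math.exp(-t)

def invert_g(x, branch, tol=1e-13):
    """Solve g(t) = x for 0 < x < 1/e on one monotone branch by bisection."""
    if branch == "principal":        # t in [0, 1], where g is increasing
        lo, hi, increasing = 0.0, 1.0, True
    else:                            # t in [1, inf), where g is decreasing
        lo, hi, increasing = 1.0, 2.0, False
        while g(hi) > x:             # expand until the root is bracketed
            hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if (g(mid) < x) == increasing:
            lo = mid                 # root lies to the right of mid
        else:
            hi = mid                 # root lies to the left of mid
    return 0.5 * (lo + hi)

x = 0.2
t_principal = invert_g(x, "principal")   # corresponds to -W_0(-x)
t_minor = invert_g(x, "minor")           # corresponds to -W_{-1}(-x)

# Defining relation of the Lambert W function with w = -t and argument -x.
w0, w1 = -t_principal, -t_minor
print(w0 * math.exp(w0) + x, w1 * math.exp(w1) + x)
```

Libraries such as SciPy expose both branches directly through `scipy.special.lambertw` with the branch index `k=0` or `k=-1`; bisection is used here only to keep the example dependency-free.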
References
- [1] L. Ljung. System Identification: Theory for the User. Prentice Hall, Upper Saddle River, NJ, 2nd edition, 1999.
- [2] G. Pillonetto and G. De Nicolao. A new kernel-based approach for linear system identification. Automatica, 46(1):81–93, 2010.
- [3] N. Aronszajn. Theory of Reproducing Kernels. Transactions of the American Mathematical Society, 68(3):337–404, 1950.
- [4] B. Schölkopf and A. J. Smola. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press, 2001.
- [5] G. Prando and A. Chiuso. Model reduction for linear Bayesian System Identification. In Proceedings of IEEE 54th Conference on Decision and Control, pages 2121–2126, 2015.
- [6] T. Chen and L. Ljung. Regularized system identification using orthonormal basis functions. In Proceedings of 2015 European Control Conference, pages 1291–1296. IEEE, 2015.
- [7] T. Chen. On kernel design for regularized LTI system identification. Automatica, 90:109–122, 2018.
- [8] T. Chen, H. Ohlsson, and L. Ljung. On the estimation of transfer functions, regularizations and Gaussian processes–Revisited. Automatica, 48(8):1525–1535, 2012.
