On the Coordinate Change to the First-Order Spline Kernel for Regularized Impulse Response Estimation

Yusuke Fujimoto; Tianshi Chen

arXiv:1901.10835·eess.SY·December 20, 2024

Open Access

TL;DR

This paper explores new kernels derived from coordinate changes of the first-order spline kernel for regularized impulse response estimation, revealing properties like maximum entropy and sparse inverse Gram matrices, with spectral analysis and numerical validation.

Contribution

It introduces novel kernels based on alternative coordinate changes, extending the properties of the first-order spline kernel for improved impulse response estimation.

Findings

01. New kernels inherit the maximum entropy property

02. Inverse Gram matrices are sparse

03. Spectral analysis confirms kernel properties

Abstract

The so-called tuned-correlated kernel (sometimes also called the first-order stable spline kernel) is one of the most widely used kernels for regularized impulse response estimation. This kernel can be derived by applying an exponential decay function as a coordinate change to the first-order spline kernel. This paper focuses on this coordinate change and derives new kernels by investigating other coordinate changes induced by stable and strictly proper transfer functions. It is shown that the corresponding kernels inherit properties from these coordinate changes and the first-order spline kernel. In particular, they have the maximum entropy property and, moreover, the inverse of their Gram matrices has a sparse structure. In addition, the spectral analysis of some special kernels is provided. Finally, a numerical example is given to show the efficacy of the proposed kernel.

Equations

$$y(t)=\int_{0}^{t}u(t-\tau)g(\tau)d\tau+w(t), \tag{1}$$

$$\hat{g}=\operatorname*{arg\,min}_{g\in\mathcal{H}}\sum_{k=1}^{N}\left(y(t_{k})-\int_{0}^{t_{k}}u(t_{k}-\tau)g(\tau)d\tau\right)^{2}+\gamma\|g\|_{\mathcal{H}}^{2}. \tag{2}$$

$$\forall x\in\mathcal{X},\ \exists C_{x}:\ |g(x)|\leq C_{x}\|g\|_{\mathcal{H}},\quad\forall g\in\mathcal{H}. \tag{3}$$

$$f(x)=\langle f,K(x,\cdot)\rangle,\quad\forall f\in\mathcal{H}, \tag{4}$$

$$\{O\}_{i,j}=\int_{0}^{t_{i}}u(t_{i}-\tau_{1})\int_{0}^{t_{j}}u(t_{j}-\tau_{2})K(\tau_{1},\tau_{2})d\tau_{1}d\tau_{2}. \tag{5}$$

$$c=(O+\gamma I_{N})^{-1}y. \tag{6}$$

$$\hat{g}(t)=\sum_{i=1}^{N}\{c\}_{i}K_{i}^{u}(t), \tag{7}$$

$$K_{i}^{u}(t)=\int_{0}^{t_{i}}u(t_{i}-\tau)K(\tau,t)d\tau. \tag{8}$$

$$K(t_{1},t_{2})=W(X(t_{1}),X(t_{2})),\quad t_{1},t_{2}\in\mathbb{R}_{0+}. \tag{9}$$

$$K_{\rm TC}(t_{1},t_{2})=\beta\min(e^{-\alpha t_{1}},e^{-\alpha t_{2}}),\quad t_{1},t_{2}\in\mathbb{R}_{0+} \tag{10}$$

$$K_{S}(\tau_{1},\tau_{2})=\beta\min(\tau_{1},\tau_{2}),\quad\tau_{1},\tau_{2}\in[0,1], \tag{11}$$

$$K_{G_{0}}(\tau_{1},\tau_{2})=\min(|g_{0}(\tau_{1})|,|g_{0}(\tau_{2})|). \tag{12}$$

$$\begin{bmatrix}K(x_{1},x_{1})&K(x_{1},x_{2})&\dots&K(x_{1},x_{m})\\K(x_{2},x_{1})&\ddots&&K(x_{2},x_{m})\\\vdots&&\ddots&\vdots\\K(x_{m},x_{1})&K(x_{m},x_{2})&\dots&K(x_{m},x_{m})\end{bmatrix}\in\mathbb{R}^{m\times m}, \tag{13}$$

$$\frac{d^{i}u}{dt^{i}}<\infty,\quad i=0,1,\ldots,k-1. \tag{14}$$

$$0=|g_{0}(T_{0})|<|g_{0}(T_{1})|<\dots<|g_{0}(T_{n})|, \tag{15}$$

$$h^{o}(T_{k})=\sum_{j=1}^{k}w(j)\sqrt{|g_{0}(T_{j})|-|g_{0}(T_{j-1})|},\quad k=1,\ldots,n,\qquad h^{o}(T_{0})=0,$$

$$\max_{h(\cdot)}\ H(h(T_{0}),\ldots,h(T_{n}))\quad\text{subject to}\quad\mathbb{V}(h(T_{i})-h(T_{i-1}))=|g_{0}(T_{i})|-|g_{0}(T_{i-1})|.$$

$$\bar{\bm{K}}=\begin{bmatrix}K_{G_{0}}(t_{0},t_{0})&\dots&K_{G_{0}}(t_{0},t_{n})\\\vdots&\ddots&\vdots\\K_{G_{0}}(t_{n},t_{0})&\dots&K_{G_{0}}(t_{n},t_{n})\end{bmatrix}.$$

$$\det(\bm{K})=|g_{0}(T_{1})|\prod_{i=1}^{n-1}\left(|g_{0}(T_{i+1})|-|g_{0}(T_{i})|\right).$$

$$Rg=\begin{bmatrix}|g_{0}(T_{1})|\\\vdots\\|g_{0}(T_{n})|\end{bmatrix}\triangleq\begin{bmatrix}g_{1}\\\vdots\\g_{n}\end{bmatrix}.$$

$$\bm{K}^{-1}=R^{\top}PR,$$

$$\{P\}_{i,j}=\begin{cases}\dfrac{g_{2}}{g_{1}(g_{2}-g_{1})}&i=j=1,\\[2mm]\dfrac{g_{i+1}-g_{i-1}}{(g_{i+1}-g_{i})(g_{i}-g_{i-1})}&i=j=2,\ldots,n-1,\\[2mm]\dfrac{1}{g_{n}-g_{n-1}}&i=j=n,\\[2mm]0&|i-j|>1,\\[2mm]-\dfrac{1}{\max(g_{i},g_{j})-\min(g_{i},g_{j})}&\text{otherwise},\end{cases}$$

$$L_{K}\phi(x)=\int_{\mathcal{X}}K(x,x^{\prime})\phi(x^{\prime})d\mu(x^{\prime}),\quad x\in\mathcal{X}.$$

$$L_{K}\phi(x)=\lambda\phi(x),\quad x\in\mathcal{X},$$

$$K(x,x^{\prime})=\sum_{i=1}^{\infty}\lambda_{i}\phi_{i}(x)\phi_{i}(x^{\prime}), \tag{26}$$

$$\lambda_{i}=\frac{1}{(i-\frac{1}{2})^{2}\pi^{2}},\quad\phi_{i}(x)=\sqrt{2}\sin\left(\left(i-\tfrac{1}{2}\right)\pi x\right), \tag{27}$$

$$\int_{0}^{1}\min(x,x^{\prime})\phi_{i}(x^{\prime})dx^{\prime}=\lambda_{i}\phi_{i}(x)\quad(i=1,2,\ldots).$$

$$K_{G_{0}}(\tau_{1},\tau_{2})=\min(\tau_{1}^{n}e^{-\alpha\tau_{1}},\tau_{2}^{n}e^{-\alpha\tau_{2}}).$$

$$\int_{0}^{T}\min(x,x^{\prime})\phi_{i}(x^{\prime}/T)dx^{\prime}=T^{2}\lambda_{i}\phi_{i}(x/T).$$

$$\int_{0}^{T}\phi_{i}(x^{\prime}/T)\phi_{j}(x^{\prime}/T)dx^{\prime}=\begin{cases}0&i\neq j,\\T&i=j.\end{cases}$$

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsControl Systems and Identification · Structural Health Monitoring Techniques · Probabilistic and Robust Engineering Design

Full text

Yusuke Fujimoto, Faculty of Environmental Engineering, The University of Kitakyushu, Wakamatsu-ku, Kitakyushu, 808-0135, Japan

Tianshi Chen, School of Science and Engineering and Shenzhen Research Institute of Big Data, The Chinese University of Hong Kong, Shenzhen, 518172, China

Keywords: Identification methods, kernel-based regularization methods, impulse response estimation, kernels.

Note: This paper was not presented at any IFAC meeting. Corresponding author: Y. Fujimoto. Tel. +81-93-695-3545.

1 INTRODUCTION

One of the main difficulties in system identification is to balance the data fit and the model complexity [1]. Recently, a new method to handle this issue was proposed by Pillonetto and De Nicolao, especially for the impulse response estimation of linear time-invariant systems [2]. Their main idea comes from regression over a Reproducing Kernel Hilbert Space (RKHS) [3, 4] in the machine learning field. These spaces are associated with bivariate functions called kernels, and this class of methods is often referred to as kernel-based regularization methods. In contrast with the classical Prediction Error Methods (PEMs), such methods make it possible to design, through the kernel, a model structure that contains a wide class of impulse responses. More specifically, recall that the classical PEMs first determine the model structure and then tune its parameters according to the observed data. In this case, the set of all possible impulse responses is a finite-dimensional manifold. On the other hand, the kernel-based regularization method, with a carefully designed kernel, searches for the impulse response within a possibly infinite-dimensional RKHS and thus has the potential to model complex systems.

One of the main issues for the kernel-based regularization method is how to design a suitable kernel. While various kernels have been proposed (e.g., [5, 6, 7]), the three most widely used kernels are the so-called Stable Spline (SS) kernel [2], the Tuned-Correlated (TC) kernel (sometimes also called the first-order stable spline kernel) [8], and the Diagonal-Correlated (DC) kernel [8]. These three kernels have simple structures and favorable properties, and their effectiveness has been shown in various works, e.g., [8, 9, 10, 11].

Interestingly, these three kernels share some common properties [7, 12]. For example, they can be derived by applying an exponential decay function as a coordinate change to different kinds of spline kernels [13] (cf. Section 2.2 for details). Moreover, they also inherit some properties from the corresponding spline kernel [14], such as the maximum entropy (MaxEnt) property and the spectral analysis.

Based on the above observations, the following questions then arise naturally:

• Instead of the exponential decay coordinate change, can we design kernels with other types of coordinate changes suitable for system identification?

• What is the corresponding a priori knowledge embedded in such kernels?

In this paper, we aim to address the above questions. In particular, we focus on the kernels derived by applying the impulse response of a stable and strictly proper transfer function $G(s)$ as the coordinate change function to the first-order spline kernel, where $s$ is the complex frequency for the Laplace transform. The exponential decay function $e^{-\alpha t}$ is then the special case $G(s)=\frac{1}{s+\alpha}$. Besides, in our preliminary work [15], we considered the case where the coordinate change is given by $t^{n}e^{-\alpha t}$, which corresponds, up to a constant factor, to $G(s)=\frac{1}{(s+\alpha)^{n+1}}$. Here, we consider more general cases and, moreover, show that such a coordinate change embeds a priori knowledge from $G(s)$ in the regularized impulse response, or equivalently, that the corresponding RKHS inherits properties from $G(s)$. For instance, the proposed kernels are always stable, the estimated impulse response has the same convergence rate as the coordinate change function, and the relative degree of the impulse response is determined by the coordinate change function. Moreover, we also show that the proposed kernels have the maximum entropy property and give the spectral analysis for some special cases based on the corresponding results for the first-order spline kernel.

The remaining part of this paper is organized as follows. Sec. 2 recaps the kernel-based regularization methods, and states the problem considered in this paper. Sec. 3 first shows the positive definiteness and stability of the proposed kernel. Then Sec. 4 shows properties of the proposed kernels related to zero-crossing. Sec. 5 discusses the Maximum Entropy property of the proposed kernel, and Sec. 6 gives spectral analysis for some special cases. Sec. 7 shows a numerical example to demonstrate the effectiveness of coordinate changes. Finally Sec. 8 concludes this paper.

[Notations] The sets of nonnegative real numbers and natural numbers are denoted by $\mathbb{R}_{0+}$ and $\mathbb{N}$, respectively. The $n$-dimensional identity matrix is denoted by $I_{n}$. The inverse and the transpose of a matrix $A$ are denoted by $A^{-1}$ and $A^{\top}$, respectively. The determinant of a square matrix $A$ is denoted by $\det(A)$, and $\|A\|_{\rm FRO}$ denotes the Frobenius norm of $A$. The $(i,j)$ element of a matrix $A$ is denoted by $\{A\}_{i,j}$. When $a$ is a vector, $\{a\}_{i}$ denotes the $i$th element of $a$. The Lebesgue integral of $f(x)$ over $\mathcal{X}$ is denoted by $\int_{\mathcal{X}}f(x)dx$, and the integral with respect to the measure $\mu$ is denoted by $\int_{\mathcal{X}}f(x)d\mu(x)$. In particular, the Lebesgue integral of $f(x)$ over $[a,b)$ is denoted by $\int_{a}^{b}f(x)dx$. $\mathcal{L}_{1}[0,\infty]$ denotes the set of absolutely integrable functions over $[0,\infty)$, i.e., $\mathcal{L}_{1}[0,\infty]=\{f\mid\int_{0}^{\infty}|f(x)|dx<\infty\}$. The set $\{a_{1},\ldots,a_{n}\}$ is denoted by $\{a_{k}\}_{k=1}^{n}$. The expected value and variance of random variables are denoted by $\mathbb{E}$ and $\mathbb{V}$, respectively. The limit $\lim_{t\to+0}f(t)$ denotes the right-sided limit at zero. Throughout the paper, $s$ denotes the complex frequency for the Laplace transform, and $e$ denotes Napier's constant.

2 PROBLEM SETTING

2.1 Kernel-based regularization methods

We first recap the kernel-based regularization method for continuous-time systems. This paper focuses on single-input single-output, bounded-input bounded-output stable, linear time-invariant, and causal systems described by

$$y(t)=\int_{0}^{t}u(t-\tau)g(\tau)d\tau+w(t), \tag{1}$$

where $t\in\mathbb{R}_{0+}$ is the time index; $u(t)\in\mathbb{R}$, $y(t)\in\mathbb{R}$, and $w(t)\in\mathbb{R}$ are the input, the measured output, and the measurement noise at time $t$, respectively; $g(t):\mathbb{R}_{0+}\to\mathbb{R}$ is the impulse response of the system; and $\int_{0}^{t}u(t-\tau)g(\tau)d\tau$ is the convolution of the input and the impulse response. The noise $w(t)$ is independent and identically distributed Gaussian with mean 0 and variance $\sigma^{2}$. The identification problem in this paper is to estimate $g(t)$ from the measured output $\left\{y(t_{k})\right\}_{k=1}^{N}$ and the input $u(t)$ over the interval $[0,t_{N}]$, where $t_{1},t_{2},\ldots,t_{N}$ are the sampling time instants.

To this end, we use the kernel-based regularization method where the estimated impulse response $\hat{g}(t)$ is given by

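$$\hat{g}=\operatorname*{arg\,min}_{g\in\mathcal{H}}\sum_{k=1}^{N}\left(y(t_{k})-\int_{0}^{t_{k}}u(t_{k}-\tau)g(\tau)d\tau\right)^{2}+\gamma\|g\|_{\mathcal{H}}^{2}. \tag{2}$$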

Here, $\mathcal{H}$ is a Hilbert space of functions $g:\mathbb{R}_{0+}\to\mathbb{R}$ , and $\|\cdot\|_{\mathcal{H}}$ is the norm endowed to $\mathcal{H}$ , and $\gamma>0$ is a regularization parameter. Clearly, a good estimate of the impulse response depends on a good choice of $\mathcal{H}$ . In the sequel, we assume that $\mathcal{H}$ is a Reproducing Kernel Hilbert Space (RKHS).

The definitions of RKHS and the reproducing kernel are as follows. Let $\mathcal{X}$ be a nonempty set, and consider the Hilbert space of functions $f:\mathcal{X}\to\mathbb{R}$ denoted by $\mathcal{H}$. Then $\mathcal{H}$ is an RKHS if

$$\forall x\in\mathcal{X},\ \exists C_{x}:\ |g(x)|\leq C_{x}\|g\|_{\mathcal{H}},\quad\forall g\in\mathcal{H}. \tag{3}$$

Further let $\left\langle{\cdot},{\cdot}\right\rangle$ be the inner product endowed to $\mathcal{H}$ . Then a symmetric bivariate function $K:\mathcal{X}\times\mathcal{X}\to\mathbb{R}$ is the reproducing kernel of $\mathcal{H}$ if it satisfies

$$f(x)=\langle f,K(x,\cdot)\rangle,\quad\forall f\in\mathcal{H}, \tag{4}$$

where $K(x,\cdot)$ denotes the single-variable function obtained by fixing the first argument of $K$ to $x$. Reproducing kernels are also called kernels for short. It is well known that the kernel $K$ exists if the Hilbert space $\mathcal{H}$ is an RKHS.

With the above definitions, the optimal solution of (2) has an explicit expression. Let $K:\mathbb{R}_{0+}\times\mathbb{R}_{0+}\to\mathbb{R}$ be the kernel of $\mathcal{H}$ in (2). Also let $O\in\mathbb{R}^{N\times N}$ be the matrix defined as

$$\{O\}_{i,j}=\int_{0}^{t_{i}}u(t_{i}-\tau_{1})\int_{0}^{t_{j}}u(t_{j}-\tau_{2})K(\tau_{1},\tau_{2})d\tau_{1}d\tau_{2}. \tag{5}$$

Let $y=[y(t_{1}),\ldots,y(t_{N})]^{\top}\in\mathbb{R}^{N}$ and $c\in\mathbb{R}^{N}$ be

$$c=(O+\gamma I_{N})^{-1}y. \tag{6}$$

Then, the optimal solution of (2) is given by

$$\hat{g}(t)=\sum_{i=1}^{N}\{c\}_{i}K_{i}^{u}(t), \tag{7}$$

where $K_{i}^{u}(t)$ is a function of $t$ defined by

$$K_{i}^{u}(t)=\int_{0}^{t_{i}}u(t_{i}-\tau)K(\tau,t)d\tau. \tag{8}$$

See e.g., [9] for more details.
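As a minimal sketch of how (5)-(8) are evaluated, assume the input is an ideal impulse, so that $K_{i}^{u}(t)=K(t_{i},t)$ and $O$ reduces to the Gram matrix of $K$ at the sampling instants. The kernel choice (the TC kernel $\beta\min(e^{-\alpha t_{1}},e^{-\alpha t_{2}})$), the hyperparameter values, the synthetic data, and all function names below are illustrative assumptions and not part of the paper.

```python
import numpy as np

def tc_kernel(t1, t2, alpha=1.0, beta=1.0):
    """TC kernel beta * min(exp(-alpha*t1), exp(-alpha*t2)), broadcast over arrays."""
    return beta * np.minimum(np.exp(-alpha * t1), np.exp(-alpha * t2))

def fit_coefficients(t_samples, y, kernel, gamma):
    """Return c = (O + gamma I)^{-1} y; O is the Gram matrix (impulsive input assumed)."""
    O = kernel(t_samples[:, None], t_samples[None, :])
    return np.linalg.solve(O + gamma * np.eye(t_samples.size), y)

def predict(t_query, t_samples, c, kernel):
    """Evaluate g_hat(t) = sum_i c_i K(t_i, t) at the query times."""
    return kernel(t_query[:, None], t_samples[None, :]) @ c

rng = np.random.default_rng(0)
t_samples = 0.1 * np.arange(1, 51)                 # hypothetical sampling instants
g_true = np.exp(-t_samples)                        # hypothetical true impulse response
y = g_true + 0.01 * rng.standard_normal(t_samples.size)

gamma = 1e-4
c = fit_coefficients(t_samples, y, tc_kernel, gamma)
g_hat = predict(t_samples, t_samples, c, tc_kernel)
```

At the sampling instants the estimate satisfies $\hat{g}+\gamma c=y$, which is a convenient sanity check on the linear solve. For a general input, the Gram matrix would have to be replaced by the double integral in (5), for instance by numerical quadrature.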

2.2 Problem statement

The Stable Spline (SS) kernel, the Tuned-Correlated kernel (sometimes also called the first-order stable spline kernel), and the DC kernel can all be derived by applying an exponential decay function as a coordinate change to different kinds of spline kernels. To make this point clear, we let $W(\cdot,\cdot)$ be a kernel function and $X(t)$ be a coordinate change function. Then the aforementioned three kernels can all be put into the following form

$$K(t_{1},t_{2})=W(X(t_{1}),X(t_{2})),\quad t_{1},t_{2}\in\mathbb{R}_{0+}. \tag{9}$$

Moreover, the coordinate change functions $X(t)$ are all $e^{-\alpha t}:\mathbb{R}_{0+}\to[0,1]$ for these three kernels, while the kernel $W$ is the second order spline kernel for the SS kernel, the first order spline kernel for the TC kernel, and a generalized first order spline kernel for the DC kernel, cf. [16]. In particular, the TC kernel,

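$$K_{\rm TC}(t_{1},t_{2})=\beta\min(e^{-\alpha t_{1}},e^{-\alpha t_{2}}),\quad t_{1},t_{2}\in\mathbb{R}_{0+}, \tag{10}$$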

can be derived by applying $e^{-\alpha t}:\mathbb{R}_{0+}\to[0,1]$ as the coordinate change to the first-order spline kernel

$$K_{S}(\tau_{1},\tau_{2})=\beta\min(\tau_{1},\tau_{2}),\quad\tau_{1},\tau_{2}\in[0,1], \tag{11}$$

where $\beta>0$ and $\alpha>0$ are hyperparameters of the kernel. We consider more general coordinate changes in this paper.
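The coordinate-change construction can be sketched in a few lines: compose the first-order spline kernel with $X(t)=e^{-\alpha t}$ and compare the result with the closed form $\beta e^{-\alpha\max(t_{1},t_{2})}$, which follows because $e^{-\alpha t}$ is decreasing. The helper names and the values of $\alpha$ and $\beta$ are assumptions made for illustration.

```python
import numpy as np

def spline_kernel(x1, x2, beta=1.0):
    """First-order spline kernel beta * min(x1, x2) on [0, 1]."""
    return beta * np.minimum(x1, x2)

def coordinate_change_kernel(W, X):
    """Return K(t1, t2) = W(X(t1), X(t2)) for a base kernel W and a map X."""
    return lambda t1, t2: W(X(t1), X(t2))

alpha, beta = 0.5, 2.0                              # hypothetical hyperparameters
tc_via_change = coordinate_change_kernel(
    lambda a, b: spline_kernel(a, b, beta),
    lambda t: np.exp(-alpha * t),
)

t = np.linspace(0.0, 10.0, 21)
K_change = tc_via_change(t[:, None], t[None, :])
# Closed form of the TC kernel: min of decaying exponentials = exponential of the max.
K_direct = beta * np.exp(-alpha * np.maximum(t[:, None], t[None, :]))
```

Passing a different map `X`, such as the absolute value of another impulse response, yields the more general kernels studied in the rest of the paper.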

Problem 1.

Let $G_{0}(s)$ be a stable and strictly proper transfer function and $g_{0}(t):\mathbb{R}_{0+}\to\mathbb{R}$ be the impulse response of $G_{0}(s)$ . Hereafter, we consider properties of the kernel given by the first-order spline kernel with $|g_{0}(t)|$ as the coordinate change function, i.e.,

$$K_{G_{0}}(\tau_{1},\tau_{2})=\min(|g_{0}(\tau_{1})|,|g_{0}(\tau_{2})|), \tag{12}$$

or equivalently the properties of the RKHS associated with $K_{G_{0}}$ that is denoted by $\mathcal{H}_{G_{0}}$ below.

3 POSITIVE DEFINITENESS AND STABILITY

We first recall some definitions.

A kernel $K:\mathcal{X}\times\mathcal{X}\to\mathbb{R}$ is said to be positive definite if the Gram matrix of $K$ defined as

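$$\begin{bmatrix}K(x_{1},x_{1})&K(x_{1},x_{2})&\dots&K(x_{1},x_{m})\\K(x_{2},x_{1})&\ddots&&K(x_{2},x_{m})\\\vdots&&\ddots&\vdots\\K(x_{m},x_{1})&K(x_{m},x_{2})&\dots&K(x_{m},x_{m})\end{bmatrix}\in\mathbb{R}^{m\times m}, \tag{13}$$

is positive semidefinite for any $[x_{1},\ldots,x_{m}]^{\top}\in\mathcal{X}^{m}$ and for any $m\in\mathbb{N}$. The Moore-Aronszajn theorem states that if $K$ is positive definite, then there exists a unique RKHS whose reproducing kernel is $K$ [3].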

A kernel $K:\mathbb{R}_{0+}\times\mathbb{R}_{0+}\to\mathbb{R}$ is said to be stable if $\mathcal{H}$ , the RKHS associated with $K$ , satisfies $\mathcal{H}\subset\mathcal{L}_{1}[0,\infty]$ .

Then we have the following result. (Footnote 1: All proofs of propositions are deferred to the Appendix.)

Theorem 2.

The kernel (12) is positive definite and moreover, stable, i.e., the corresponding RKHS $\mathcal{H}_{G_{0}}$ is a subspace of $\mathcal{L}_{1}[0,\infty]$ .

Theorem 2 shows that the kernel (12) is a positive semidefinite kernel and moreover, for any $g\in\mathcal{H}_{G_{0}}$ , $g\in\mathcal{L}_{1}[0,\infty]$ .
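The positive semidefiniteness claimed by Theorem 2 can be spot-checked numerically. The sketch below is not a proof: it assumes the particular choice $G_{0}(s)=\frac{1}{(s+1)^{2}+1}$, whose impulse response is $e^{-t}\sin t$, and an arbitrary random set of evaluation points.

```python
import numpy as np

def g0(t):
    """Impulse response of the assumed G0(s) = 1 / ((s + 1)^2 + 1)."""
    return np.exp(-t) * np.sin(t)

def k_g0(t1, t2):
    """K_G0(t1, t2) = min(|g0(t1)|, |g0(t2)|), as in (12)."""
    return np.minimum(np.abs(g0(t1)), np.abs(g0(t2)))

rng = np.random.default_rng(1)
t = rng.uniform(0.0, 10.0, size=60)                 # arbitrary evaluation points
gram = k_g0(t[:, None], t[None, :])

# Smallest eigenvalue of the symmetric Gram matrix; nonnegative up to rounding.
min_eig = np.linalg.eigvalsh(gram).min()
```

A slightly negative `min_eig` at the level of floating-point rounding is expected and does not contradict the theorem.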

4 ZERO-CROSSING RELATED PROPERTIES

The following proposition shows that if $g_{0}(t)$ has a zero-crossing, then any $g\in\mathcal{H}_{G_{0}}$ inherits this zero-crossing.

Proposition 3.

Assume that $g_{0}(t)$ satisfies $g_{0}(\tau)=0$ for some $\tau\in\mathbb{R}_{0+}$ . Then, $g(\tau)=0$ for any $g\in\mathcal{H}_{G_{0}}$ .

This proposition suggests that, if one knows that the true impulse response is zero at some time instant $\tau$ , then one should design $G_{0}(s)$ such that $g_{0}(\tau)=0$ . A typical case is $\tau=0$ , i.e., the relative degree of the system is known to be higher than or equal to two. For this case, the result can be further strengthened and is shown in the following theorem.

Theorem 4.

Assume that the identification input $u(t)$ is $k-1$ times differentiable, and satisfies

$$\frac{d^{i}u}{dt^{i}}<\infty,\quad i=0,1,\ldots,k-1. \tag{14}$$

If $g_{0}(t)$ satisfies $\lim_{t\to+0}\frac{d^{j}}{dt^{j}}g_{0}(t)=0$ for $j=0,1,\ldots,k$ , then $\lim_{t\to+0}\frac{d^{j}}{dt^{j}}{\hat{g}}(t)=0$ for $j=0,1,\ldots,k$ .

Moreover, for any $g\in\mathcal{H}_{G_{0}}$ , how fast $g(t)$ converges to 0 also depends on $g_{0}(t)$ , which is stated in the following theorem.

Theorem 5.

Assume that the input $u(t)$ is bounded, and let $U\in\mathbb{R}^{N}$ be a vector whose $i$ -th element is $\int_{0}^{t_{i}}u(t_{i}-\tau)d\tau$ . When $G_{0}(s)$ is stable and $G_{0}(s)\neq 0$ , $\frac{\hat{g}(t)}{|g_{0}(t)|}$ converges to $U^{\top}c$ when $t\to\infty$ , where $c$ is defined as (6).

In summary, $\mathcal{H}_{G_{0}}$ inherits some properties of $g_{0}$, namely how $g_{0}$ crosses or converges to zero. This is because the linear spline kernel is employed in (12). More specifically, let $\mathcal{H}_{S}$ be the RKHS associated with the first-order spline kernel (11). Since $f(0)=0$ for any $f\in\mathcal{H}_{S}$ by the reproducing property, the properties of $\mathcal{H}_{G_{0}}$ given in this section follow accordingly.
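Both inherited properties can be illustrated in the special case of an impulsive input, where $\hat{g}(t)=\sum_{i}\{c\}_{i}\min(|g_{0}(t_{i})|,|g_{0}(t)|)$ and every entry of $U$ equals one. The sketch assumes $g_{0}(t)=te^{-t}$, arbitrary positive coefficients $c$, and hypothetical sampling instants; it only illustrates Proposition 3 and Theorem 5 and does not replace their proofs.

```python
import numpy as np

def g0(t):
    """Assumed coordinate change g0(t) = t * exp(-t), i.e. G0(s) = 1 / (s + 1)^2."""
    return t * np.exp(-t)

def g_hat(t, t_samples, c):
    """g_hat(t) = sum_i c_i * min(|g0(t_i)|, |g0(t)|) for an impulsive input."""
    return np.minimum(np.abs(g0(t_samples)), np.abs(g0(t))) @ c

rng = np.random.default_rng(2)
t_samples = 0.1 * np.arange(1, 41)
c = rng.uniform(0.5, 1.5, size=t_samples.size)      # arbitrary coefficients

# Zero inheritance: g0(0) = 0 forces g_hat(0) = 0 whatever c is.
value_at_zero = g_hat(0.0, t_samples, c)

# Decay inheritance: once |g0(t)| is below every |g0(t_i)|, g_hat(t) / |g0(t)| = sum(c).
t_late = 30.0
ratio = g_hat(t_late, t_samples, c) / abs(g0(t_late))
```

With an impulsive input, $U^{\top}c$ is simply the sum of the coefficients, which is the value `ratio` reproduces.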

5 MAXIMUM ENTROPY PROPERTY

Interestingly, the kernel (12) also inherits the maximum entropy property of the linear spline kernel (11).

Theorem 6.

For a given $g_{0}(t)$, let $t_{0},\ldots,t_{n}$ be a sequence from $\mathbb{R}_{0+}\cup\{\infty\}$, and let $T_{0},\ldots,T_{n}$ be the permutation of $\{t_{0},\ldots,t_{n}\}$ such that

$$0=|g_{0}(T_{0})|<|g_{0}(T_{1})|<\dots<|g_{0}(T_{n})|, \tag{15}$$

and consider the stochastic process defined by

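$$h^{o}(T_{k})=\sum_{j=1}^{k}w(j)\sqrt{|g_{0}(T_{j})|-|g_{0}(T_{j-1})|},\quad k=1,\ldots,n,\qquad h^{o}(T_{0})=0,$$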

where $w(k)$ is a white Gaussian noise with unit variance. Then, $h^{o}(T_{k})$ is a Gaussian process with zero mean and $K_{G_{0}}(T_{i},T_{j})$ as its covariance function. In addition, let $h(T)$ be any stochastic process defined over $\{T_{0},\ldots,T_{n}\}$ with $h(T_{0})=0$ . Then, the Gaussian process $h^{o}$ is the solution of the MaxEnt problem

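$$\max_{h(\cdot)}\ H(h(T_{0}),\ldots,h(T_{n}))\quad\text{subject to}\quad\mathbb{V}(h(T_{i})-h(T_{i-1}))=|g_{0}(T_{i})|-|g_{0}(T_{i-1})|,$$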

where $H(h(T_{0}),\ldots,h(T_{n}))$ denotes the differential entropy of $[h(T_{0}),\ldots,h(T_{n})]^{\top}$. (Footnote 2: The differential entropy of a random variable $x$ is defined by $-\int p(x)\log p(x)dx$, where the integral is taken over the support of $p(x)$.)
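The covariance claim in Theorem 6 can be checked without sampling. Writing the process as a cumulative sum of independent increments, each scaled so that its variance equals $|g_{0}(T_{j})|-|g_{0}(T_{j-1})|$, gives $h=Aw$ with a lower-triangular $A$, and the covariance is $AA^{\top}$. The sorted values below are hypothetical placeholders for $|g_{0}(T_{0})|,\ldots,|g_{0}(T_{n})|$.

```python
import numpy as np

# Hypothetical sorted values 0 = |g0(T_0)| < |g0(T_1)| < ... < |g0(T_n)|.
g_sorted = np.array([0.0, 0.10, 0.25, 0.30, 0.37])
increments = np.diff(g_sorted)                      # |g0(T_j)| - |g0(T_{j-1})|
n = increments.size

# h(T_k) = sum_{j<=k} w(j) * sqrt(increment_j), i.e. h = A w with unit-variance w.
A = np.tril(np.ones((n, n))) * np.sqrt(increments)[None, :]
cov_h = A @ A.T                                     # covariance of [h(T_1), ..., h(T_n)]

# Gram matrix of K_G0 at T_1, ..., T_n.
K_sorted = np.minimum(g_sorted[1:, None], g_sorted[None, 1:])
```

Entry $(k,l)$ of `cov_h` is the sum of the first $\min(k,l)$ increments, which telescopes to $\min(|g_{0}(T_{k})|,|g_{0}(T_{l})|)$.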

This maximum entropy interpretation also suggests a special structure of the inverse of the Gram matrix of $K_{G_{0}}$. For a given $g_{0}(t)$, let $t_{0},\ldots,t_{n}$ be a sequence from $\mathbb{R}_{0+}\cup\{\infty\}$, and let $T_{0},\ldots,T_{n}$ be a permutation of $t_{0},\ldots,t_{n}$ that satisfies (15). Also let ${\bar{{\bm{K}}}\in\mathbb{R}^{(n+1)\times(n+1)}}$ be the Gram matrix of $K_{G_{0}}$ defined as

$$\bar{\bm{K}}=\begin{bmatrix}K_{G_{0}}(t_{0},t_{0})&\dots&K_{G_{0}}(t_{0},t_{n})\\\vdots&\ddots&\vdots\\K_{G_{0}}(t_{n},t_{0})&\dots&K_{G_{0}}(t_{n},t_{n})\end{bmatrix}.$$

Since $|g_{0}(T_{0})|$ is assumed to be zero, $\bar{\bm{K}}$ has a row and a column whose elements are all zero. We define $\bm{K}\in\mathbb{R}^{n\times n}$ as the matrix obtained by removing this row and column from $\bar{\bm{K}}$. Note that $\bm{K}$ is also a Gram matrix of $K_{G_{0}}$.

Before showing the structure of $\bm{K}^{-1}$ , we first give the following result.

Theorem 7.

The determinant of $\bm{K}$ is given as

$$\det(\bm{K})=|g_{0}(T_{1})|\prod_{i=1}^{n-1}\left(|g_{0}(T_{i+1})|-|g_{0}(T_{i})|\right).$$

Theorem 7 gives the condition under which the inverse of $\bm{K}$ exists: $g_{0}(t_{i})\neq 0$ for all $i$ and $|g_{0}(t_{i})|\neq|g_{0}(t_{j})|$ for all $i\neq j$.
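A quick numerical check of the determinant formula, under the assumption $g_{0}(t)=te^{-t}$ and a small hypothetical set of time instants chosen so that the values $|g_{0}(t_{i})|$ are nonzero and distinct but not sorted. The Gram matrix is built in the original order, since a simultaneous row and column permutation leaves the determinant unchanged.

```python
import numpy as np

t = np.array([0.3, 0.6, 0.9, 1.2, 1.5, 1.8])        # hypothetical time instants
g = t * np.exp(-t)                                  # |g0(t_i)| for g0(t) = t exp(-t)

K = np.minimum(g[:, None], g[None, :])              # Gram matrix in the original order
det_numeric = np.linalg.det(K)

g_sorted = np.sort(g)                               # |g0(T_1)| < ... < |g0(T_n)|
det_formula = g_sorted[0] * np.prod(np.diff(g_sorted))
```

If two of the values coincide or one of them is zero, a factor of the product vanishes and the Gram matrix is singular, which is the invertibility condition stated above.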

Theorem 8.

Let $\bar{g}=[|g_{0}(t_{0})|,\ldots,|g_{0}(t_{n})|]^{\top}\in\mathbb{R}^{n+1}$, and let $g\in\mathbb{R}^{n}$ be the vector obtained by removing the element corresponding to $|g_{0}(T_{0})|$ from $\bar{g}$. Let $R\in\mathbb{R}^{n\times n}$ be a row-permutation matrix such that

$$Rg=\begin{bmatrix}|g_{0}(T_{1})|\\\vdots\\|g_{0}(T_{n})|\end{bmatrix}\triangleq\begin{bmatrix}g_{1}\\\vdots\\g_{n}\end{bmatrix}.$$

Then the inverse matrix of $\bm{K}$ is given as

$$\bm{K}^{-1}=R^{\top}PR,$$

where $P\in\mathbb{R}^{n\times n}$ is the inverse matrix of the Gram matrix of the first-order spline kernel [14],

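$$\{P\}_{i,j}=\begin{cases}\dfrac{g_{2}}{g_{1}(g_{2}-g_{1})}&i=j=1,\\[2mm]\dfrac{g_{i+1}-g_{i-1}}{(g_{i+1}-g_{i})(g_{i}-g_{i-1})}&i=j=2,\ldots,n-1,\\[2mm]\dfrac{1}{g_{n}-g_{n-1}}&i=j=n,\\[2mm]0&|i-j|>1,\\[2mm]-\dfrac{1}{\max(g_{i},g_{j})-\min(g_{i},g_{j})}&\text{otherwise}.\end{cases}$$

Theorem 8 gives the explicit form of the inverse matrix of $\bm{K}$. Note that $P$ is a tri-diagonal matrix. This theorem indicates that $\bm{K}^{-1}$ has a sparse structure, i.e., it has at most three nonzero elements in each row (or column).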

Example 5.1.

For illustration, we consider the case $g_{0}(t)=te^{-t}$ , or equivalently, $G_{0}(s)=\frac{1}{(s+1)^{2}}$ , and show that the corresponding $\bm{K}^{-1}$ has a sparse structure. We set $t_{i}=0.1\times i\ (i=1,\ldots,40)$ , and computed $\bm{K}^{-1}$ according to Theorem 8.

Fig. 1 shows the sparsity patterns of $\bm{K}^{-1}$ and $P$, obtained with the MATLAB command spy. The horizontal and vertical axes show the column and row of each matrix, respectively, and the dots show the nonzero elements. We can see that $P$ is tri-diagonal, and $\bm{K}^{-1}$ has at most three nonzero elements in each row or column. The sparsity pattern may not be visible in a numerically computed $\bm{K}^{-1}$, e.g., the one computed with the MATLAB command inv. For instance, spy(inv( ${\bm{K}}$ )) shows that all elements of inv( ${\bm{K}}$ ) are nonzero. To illustrate the effectiveness of Theorem 8, we compute $\left\|2\times I_{100}-\bm{K}\left(\bm{K}^{-1}\right)^{\prime}-\left(\bm{K}^{-1}\right)^{\prime}\bm{K}\right\|_{\rm FRO}$, where $\left(\bm{K}^{-1}\right)^{\prime}$ denotes an inverse of $\bm{K}$ computed numerically with either Theorem 8 or inv. The result is $1.4\times 10^{-12}$ with Theorem 8 and $1.6\times 10^{-12}$ with inv.
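The construction of Theorem 8 can be sketched in Python for the setting of Example 5.1 ($g_{0}(t)=te^{-t}$, $t_{i}=0.1\times i$, $i=1,\ldots,40$). The function name and the use of `argsort` to build the permutation matrix $R$ are assumptions of this sketch; the residual it produces is not meant to reproduce the figures reported above.

```python
import numpy as np

def spline_gram_inverse(g):
    """Tri-diagonal inverse P of the matrix min(g_i, g_j) for 0 < g_1 < ... < g_n."""
    n = g.size
    gap = np.diff(g)                                 # gap[i] = g[i+1] - g[i]
    P = np.zeros((n, n))
    P[0, 0] = g[1] / (g[0] * gap[0])
    for i in range(1, n - 1):
        P[i, i] = (g[i + 1] - g[i - 1]) / (gap[i] * gap[i - 1])
    P[n - 1, n - 1] = 1.0 / gap[n - 2]
    P[np.arange(n - 1), np.arange(1, n)] = -1.0 / gap   # superdiagonal
    P[np.arange(1, n), np.arange(n - 1)] = -1.0 / gap   # subdiagonal
    return P

t = 0.1 * np.arange(1, 41)
g = t * np.exp(-t)                                   # |g0(t_i)| for g0(t) = t exp(-t)
K = np.minimum(g[:, None], g[None, :])

order = np.argsort(g)
R = np.eye(g.size)[order]                            # row permutation: R @ g is sorted
P = spline_gram_inverse(R @ g)
K_inv = R.T @ P @ R

residual = np.linalg.norm(K @ K_inv - np.eye(g.size))    # Frobenius norm
nonzeros_per_row = np.count_nonzero(K_inv, axis=1)
```

Because $R$ only reorders rows and columns, `K_inv` keeps the exact zeros of `P`, so the sparsity pattern survives in floating point, unlike the output of a generic dense inversion routine.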

6 SPECTRAL ANALYSIS OF MULTIPLE POLE SPLINE KERNEL

It is well known from Mercer's theorem that, under suitable assumptions on the kernel, any function in the RKHS can be represented by an orthonormal series. We show such an orthonormal basis for $\mathcal{H}_{G_{0}}$, which yields a reasonable finite-dimensional approximation of $\mathcal{H}_{G_{0}}$ and can make some computations easy and fast. In this section, we focus on (12) with $G_{0}(s)=\frac{1}{(s+\alpha)^{n+1}}$, $n=1,2,\ldots$, and give its spectral analysis. This kernel was proposed in [15] and is called the Multiple pole Spline kernel.

6.1 Preliminary

We first introduce some definitions for a positive semidefinite kernel $K:\mathcal{X}\times\mathcal{X}\to\mathbb{R}$ with a compact set $\mathcal{X}$ .

Let $\mu$ be a nondegenerate Borel measure on $\mathcal{X}$ . Also let $L_{2}(\mathcal{X},\mu)$ denote the space of functions of $f:\mathcal{X}\to\mathbb{R}$ such that $\int_{\mathcal{X}}|f(x)|^{2}d\mu(x)<+\infty$ . For a given kernel $K$ and $\phi\in L_{2}(\mathcal{X},\mu)$ , we define an integral operator on $L_{2}(\mathcal{X},\mu)$ :

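$$L_{K}\phi(x)=\int_{\mathcal{X}}K(x,x^{\prime})\phi(x^{\prime})d\mu(x^{\prime}),\quad x\in\mathcal{X}.$$

If, for some $\lambda$,

$$L_{K}\phi(x)=\lambda\phi(x),\quad x\in\mathcal{X},$$

has a solution other than $\phi(x)=0$, then $\lambda$ and the solution are called an eigenvalue and an eigenfunction of $L_{K}$, respectively. Two distinct eigenfunctions $\phi(x)$ and $\psi(x)$ are orthogonal, i.e., $\langle\phi,\psi\rangle_{L_{2}(\mathcal{X},\mu)}=0$. Then, the kernel $K$ has a series expansion

$$K(x,x^{\prime})=\sum_{i=1}^{\infty}\lambda_{i}\phi_{i}(x)\phi_{i}(x^{\prime}), \tag{26}$$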

which converges uniformly and absolutely on $\mathcal{X}\times\mathcal{X}$ .

Consider the first-order spline kernel $K_{S}(x,x^{\prime})=\min(x,x^{\prime}):[0,1]\times[0,1]\to\mathbb{R}$ with $\mu$ being the Lebesgue measure. In this case, the eigenvalues and eigenfunctions are given by

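$$\lambda_{i}=\frac{1}{(i-\frac{1}{2})^{2}\pi^{2}},\quad\phi_{i}(x)=\sqrt{2}\sin\left(\left(i-\tfrac{1}{2}\right)\pi x\right). \tag{27}$$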

With these $\lambda_{i}$ and $\phi_{i}$ , the spline kernel has the series expansion (26).
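The expansion of the first-order spline kernel can be checked numerically by truncating the series. The sketch below uses the orthonormal eigenfunctions $\phi_{i}(x)=\sqrt{2}\sin((i-\frac{1}{2})\pi x)$ with eigenvalues $\lambda_{i}=1/((i-\frac{1}{2})^{2}\pi^{2})$; the grid and the truncation order `M` are arbitrary choices made for illustration.

```python
import numpy as np

x = np.linspace(0.0, 1.0, 11)                        # evaluation grid on [0, 1]
M = 2000                                             # truncation order (arbitrary)

i = np.arange(1, M + 1)
lam = 1.0 / (((i - 0.5) * np.pi) ** 2)               # eigenvalues lambda_i
# Phi[k, i-1] = phi_i(x_k) = sqrt(2) * sin((i - 1/2) * pi * x_k)
Phi = np.sqrt(2.0) * np.sin(np.outer(x, (i - 0.5) * np.pi))

K_M = (Phi * lam) @ Phi.T                            # truncated Mercer series
K_exact = np.minimum(x[:, None], x[None, :])
max_error = np.abs(K_exact - K_M).max()
```

Since $\lambda_{i}$ decays like $1/i^{2}$, the truncation error shrinks roughly in proportion to $1/M$, which is why a fairly large `M` is used here.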

6.2 Main result

We consider the case $G_{0}(s)=\frac{\kappa}{(s+\alpha)^{n+1}}$ , $n=1,2,\ldots$ , i.e.,

$$K_{G_{0}}(\tau_{1},\tau_{2})=\min(\tau_{1}^{n}e^{-\alpha\tau_{1}},\tau_{2}^{n}e^{-\alpha\tau_{2}}).$$

For the simplicity of notations and discussions, we take $\kappa=1$ in the following. The extension to other $\kappa\in\mathbb{R}$ is straightforward. In the rest of this section, $\lambda_{i}$ and $\phi_{i}$ denote the values and functions defined in (27).

Before showing the main result, we first show a lemma.

Lemma 9.

Let $T>0$ , and $x\in[0,T]$ . Then, $\lambda_{i}$ and $\phi_{i}$ defined by (27) satisfy

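$$\int_{0}^{T}\min(x,x^{\prime})\phi_{i}(x^{\prime}/T)dx^{\prime}=T^{2}\lambda_{i}\phi_{i}(x/T).$$

In addition,

$$\int_{0}^{T}\phi_{i}(x^{\prime}/T)\phi_{j}(x^{\prime}/T)dx^{\prime}=\begin{cases}0&i\neq j,\\T&i=j.\end{cases}$$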

Lemma 9 gives the eigenvalues and eigenfunctions of $\min(x,x^{\prime})$ over $(x,x^{\prime})\in[0,T]\times[0,T]$ for $T>0$ . In particular, $\frac{1}{\sqrt{T}}\phi_{i}(x^{\prime}/T)$ are orthonormal eigenfunctions.
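Both identities of Lemma 9 follow from the corresponding facts on $[0,1]$ by the substitution $x^{\prime}=Ts$. The derivation below is a sketch of that step; it uses only the eigen-equation $\int_{0}^{1}\min(x,x^{\prime})\phi_{i}(x^{\prime})dx^{\prime}=\lambda_{i}\phi_{i}(x)$, the orthonormality of $\{\phi_{i}\}$ on $[0,1]$, and the fact that $x/T\in[0,1]$.

```latex
\begin{align*}
\int_{0}^{T}\min(x,x')\,\phi_{i}(x'/T)\,dx'
  &= T\int_{0}^{1}\min(x,Ts)\,\phi_{i}(s)\,ds \\
  &= T^{2}\int_{0}^{1}\min(x/T,s)\,\phi_{i}(s)\,ds \\
  &= T^{2}\lambda_{i}\,\phi_{i}(x/T), \\[2mm]
\int_{0}^{T}\phi_{i}(x'/T)\,\phi_{j}(x'/T)\,dx'
  &= T\int_{0}^{1}\phi_{i}(s)\,\phi_{j}(s)\,ds
   = \begin{cases} 0 & i\neq j,\\ T & i=j. \end{cases}
\end{align*}
```

The factor $T$ in the second identity is what makes $\frac{1}{\sqrt{T}}\phi_{i}(x^{\prime}/T)$ the normalized eigenfunctions on $[0,T]$.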

The main result of this section is stated as follows.

Theorem 10.

Let $m:\mathbb{R}_{0+}\to\mathbb{R}_{0+}$ be a function defined by

[TABLE]

and consider the measure induced by $m$ ; $dm=\frac{dm}{d\tau}d\tau$ with the Lebesgue measure $d\tau$ . Also let $\lambda_{n,i}$ and $\phi_{n,i}$ be

[TABLE]

Then, we have

[TABLE]

with

[TABLE]

Theorem 10 suggests that $\lambda_{n,i}$ and $\phi_{n,i}$ are the eigenvalues and eigenfunctions of $K_{G_{0}}$ with the measure induced by $dm$ , respectively.

Based on Theorem 10, we have the following theorem.

Theorem 11.

Let $G_{0}(s)=\frac{1}{(s+\alpha)^{n+1}}$. Then the following hold.

1. The series expansion

[TABLE]

converges uniformly and absolutely on $\mathbb{R}_{0+}\times\mathbb{R}_{0+}$.

2. $\left\{\sqrt{\lambda_{n,i}}\phi_{n,i}\right\}_{i=1}^{\infty}$ forms an orthonormal basis of $\mathcal{H}_{G_{0}}$, and $\mathcal{H}_{G_{0}}$ has an equivalent representation:

[TABLE]

Moreover, the norm of $f$ is given by

[TABLE]

Example 6.1.

For illustration, we show the case with $n=1$ and $\alpha=1$ .

Figs. 2 and 3 show $m(\tau)$ defined by (32) and $\frac{dm}{d\tau}$, respectively. The horizontal axes show $\tau$, and the vertical axes show $m(\tau)$ and $\frac{dm}{d\tau}$, respectively. In this case, $\frac{n}{\alpha}=1$ and $\frac{dm}{d\tau}=0$ at $\tau=1$.

Fig. 4 shows $\phi_{1,i}(\tau)$ for $i=1,2,3$. The horizontal axes show $\tau$, and the vertical axes show $\phi_{1,i}(\tau)$. The top, middle, and bottom figures show $\phi_{1,1}(\tau)$, $\phi_{1,2}(\tau)$, and $\phi_{1,3}(\tau)$, respectively. These eigenfunctions satisfy $\phi_{n,i}(0)=0$ and $\lim_{\tau\to\infty}\phi_{n,i}(\tau)=0$, as expected.

With the same $n$ and $\alpha$ , we also compute $\|\bm{K}-\bm{K}_{M}\|_{\rm FRO}$ where $(i,j)$ elements of $\bm{K}$ and $\bm{K}_{M}$ are given as $K_{G_{0}}(t_{i},t_{j})$ and $\sum_{\ell=1}^{M}\lambda_{n,\ell}\phi_{n,\ell}(t_{i})\phi_{n,\ell}(t_{j})$ , respectively, with $t_{i}=0.1\times i\ (i=1,\ldots,40)$ . Fig. 5 illustrates how $\|\bm{K}-\bm{K}_{M}\|_{\rm FRO}$ converges to zero with increasing $M$ . The horizontal and vertical axes show $M$ and $\|\bm{K}-\bm{K}_{M}\|_{\rm FRO}$ , respectively.

7 ILLUSTRATIVE EXAMPLE

In this section, we give a numerical example to illustrate the effectiveness of the proposed kernel. The target system is given by

[TABLE]

hence the relative degree of the target is two. For $G_{0}(s)$ , we employ

[TABLE]

with $\theta=[\theta_{1},\theta_{2},\theta_{3}]^{\top}\in\mathbb{R}^{3}$ as the hyperparameters of the kernel. The impulse response of $G_{0}(s)$ is

[TABLE]

thus $g_{0}(0)=0$ for any $\theta$. As shown in Sec. 4, this forces the estimated impulse response to satisfy $\hat{g}(0)=0$. In other words, we exploit the a priori knowledge that the relative degree of the system is higher than or equal to two.

We consider the case where the input is the impulsive input, and the noise variance $\sigma^{2}=10^{-4}$ . The sampling period $T_{s}$ is set to 0.1 [s], and we collect $\{y(T_{s}),y(2T_{s}),\ldots,y(100T_{s})\}=\{y(kT_{s})\}_{k=1}^{100}$ .

Fig. 6 shows an example of such observed data $\{y(kT_{s})\}_{k=1}^{100}$. The horizontal axis shows time, and the vertical axis shows the observed output. Each dot shows the observed data $(iT_{s},y(iT_{s}))$. In the following, we identify the impulse response from such data 300 times with independent noise realizations.

We employ the Empirical Bayes method to tune the hyperparameters, i.e., $\theta$ is tuned so as to maximize

[TABLE]

where $\gamma$ is set to $\sigma^{2}$. Note that $O$ depends on the hyperparameter $\theta$. This tuning rule is based on the Gaussian process interpretation of kernel-based regularization methods: the kernel is regarded as the covariance function of a zero-mean Gaussian process, and (42) is the logarithm of the marginal likelihood (with some constants ignored). Such tuning is called the Empirical Bayes method [9].
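The tuning rule can be sketched in a few lines for the impulsive-input case, where the output covariance reduces to the Gram matrix of the kernel plus $\gamma I$. This is a minimal illustration under stated assumptions, not the setup of the paper: the coordinate change `g0(t) = c * t * exp(-a * t)`, its two hyperparameters `(c, a)`, the placeholder true response, and the coarse grid search are all hypothetical. The kernel is taken as $K_{G_{0}}(t,s)=\min(|g_{0}(t)|,|g_{0}(s)|)$, and maximizing the log marginal likelihood is written as minimizing $y^{\top}(\bm{K}+\gamma I)^{-1}y+\log\det(\bm{K}+\gamma I)$.

```python
import math
import random

def g0(t, c, a):
    """Hypothetical coordinate change g0(t) = c * t * exp(-a * t), so g0(0) = 0."""
    return c * t * math.exp(-a * t)

def gram(ts, c, a):
    """Gram matrix of K(t, s) = min(|g0(t)|, |g0(s)|) on the instants ts."""
    v = [abs(g0(t, c, a)) for t in ts]
    return [[min(vi, vj) for vj in v] for vi in v]

def cholesky(A):
    """Lower-triangular L with A = L L^T for a symmetric positive definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def neg_log_marginal(y, ts, c, a, gamma):
    """y^T (K + gamma I)^{-1} y + log det(K + gamma I), constants dropped."""
    n = len(ts)
    K = gram(ts, c, a)
    A = [[K[i][j] + (gamma if i == j else 0.0) for j in range(n)] for i in range(n)]
    L = cholesky(A)
    z = []  # forward substitution for L z = y, so that y^T A^{-1} y = z^T z
    for i in range(n):
        z.append((y[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    return sum(zi * zi for zi in z) + 2.0 * sum(math.log(L[i][i]) for i in range(n))

# Synthetic data (assumption): impulsive input, so y(t_k) = g(t_k) + w(t_k).
random.seed(0)
sigma2 = 1e-4
ts = [0.1 * k for k in range(1, 41)]
y = [t * math.exp(-t) + random.gauss(0.0, math.sqrt(sigma2)) for t in ts]  # placeholder system

# Coarse grid search over the hypothetical hyperparameters (c, a).
grid = [(c, a) for c in (0.5, 1.0, 2.0) for a in (0.5, 1.0, 2.0)]
best = min(grid, key=lambda p: neg_log_marginal(y, ts, p[0], p[1], sigma2))
print("selected (c, a):", best)
```

The Cholesky factor gives both the quadratic form and the log-determinant in one pass. In practice a numerical library and a gradient-based optimizer would replace the pure-Python factorization and the grid.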

Using $G_{0}(s)$ defined by (40) and the Empirical Bayes method, we perform the identification with $K_{G_{0}}$ 300 times with independent noise realizations. Fig. 7 shows the estimated and true impulse responses of the target system. The horizontal axis shows time, and the vertical axis shows the impulse response. The gray lines are the 300 estimated impulse responses, and the red line shows the true impulse response. The estimates obtained with $K_{G_{0}}$ closely follow the behavior of the true impulse response.

For comparison, we also show the result with the TC kernel tuned by the Empirical Bayes method. Recall that the TC kernel is defined in (10). Fig. 8 shows the 100 estimated impulse responses with the TC kernel and the Empirical Bayes. The estimated impulse responses converge to zero slowly and show overfitting behavior.

We also show the results with oracle hyperparameters, i.e., hyperparameters tuned with knowledge of the true impulse response. Let $\hat{g}=[\hat{g}(T_{s}),\hat{g}(2T_{s}),\ldots,\hat{g}(100T_{s})]^{\top}\in\mathbb{R}^{100}$ and $g^{*}=[g^{*}(T_{s}),g^{*}(2T_{s}),\ldots,g^{*}(100T_{s})]^{\top}\in\mathbb{R}^{100}$. Noting that we consider the case with impulsive input, we have

[TABLE]

where $\bm{K}\in\mathbb{R}^{100\times 100}$ is a Gram matrix of the kernel with $t_{i}=iT_{s}$ and $w=[w(T_{s}),w(2T_{s}),\ldots,w(100T_{s})]^{\top}\in\mathbb{R}^{100}$ . Then,

[TABLE]

and the mean square error on the sampled instants $t_{i}=iT_{s}\ (i=1,\ldots,100)$ becomes

[TABLE]

In the following, we show the results with hyperparameters which minimize (45).
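As a reading aid, the quantity being minimized can be sketched in the standard form for regularized least squares. The expressions below are a reconstruction under the assumptions that $y=g^{*}+w$ (impulsive input) and that $w$ is white with variance $\sigma^{2}$. The shorthand $S$ is introduced here only for illustration, and the normalization may differ from that of (45).

```latex
\hat{g} = S y, \qquad S = \bm{K}(\bm{K}+\gamma I)^{-1},
\qquad
\hat{g}-g^{*} = (S-I)g^{*} + S w,
\qquad
\mathbb{E}\,\|\hat{g}-g^{*}\|^{2}
  = \|(S-I)g^{*}\|^{2} + \sigma^{2}\,\mathrm{tr}\!\left(S S^{\top}\right).
```

The first term is the squared bias and the second is the variance. Both depend on the hyperparameters only through $\bm{K}$, which is why the minimizer requires the true $g^{*}$ and is called an oracle.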

Figs. 9 and 10 show the 300 estimated impulse responses with such hyperparameters. Figs. 9 and 10 employ the proposed and TC kernel, respectively. In this case, the estimated impulse response with the TC kernel converges to zero smoothly.

Fig. 11 shows the boxplots of the square errors on the sampled instants, i.e., $(\hat{g}-g^{*})^{\top}(\hat{g}-g^{*})$, over 300 independent noise realizations. The left two boxes show the results with the Empirical Bayes, and the right two boxes show the results with the hyperparameters tuned according to the mean square error on the sampled instants. The proposed kernel with the Empirical Bayes shows almost the same performance as the TC kernel with the oracle hyperparameter, and the proposed kernel with the oracle hyperparameter outperforms the others. These results show that the proposed kernel is more appropriate for $G^{*}(s)$ than the TC kernel.

As a statistical analysis, we perform Wilcoxon rank sum tests for two cases. In the first case, we compare the proposed kernel with the Empirical Bayes against the TC kernel with the oracle hyperparameter. The null hypothesis is that the two medians of the square errors on the sampled instants are the same (two-sided rank sum test). The $p$-value is 0.37, thus this null hypothesis cannot be rejected. This is consistent with the proposed method with the Empirical Bayes performing as well as the TC kernel with the optimal hyperparameter. In the second case, we compare the proposed and the TC kernels, both with the oracle hyperparameters. The null hypothesis is that the median of the square errors is smaller with the TC kernel (one-sided rank sum test). The $p$-value is $2.0\times 10^{-4}$, thus the null hypothesis is rejected in favor of the alternative. This suggests that the proposed kernel has the potential to achieve a better estimate than the TC kernel.
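For readers who want to reproduce this kind of comparison, the rank sum test can be sketched with the usual normal approximation. This is an illustrative implementation, not the routine used for the reported $p$-values: the function names are hypothetical, no tie correction is applied to the variance, and the two error samples below are synthetic placeholders rather than the errors from the experiment.

```python
import math
import random

def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks

def rank_sum_test(x, y):
    """Return (z, two-sided p, one-sided p for 'x tends to be smaller than y')."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    w = sum(ranks[:n1])                              # rank sum of the first sample
    mean = n1 * (n1 + n2 + 1) / 2.0                  # mean of w under the null
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)   # std of w under the null (no ties)
    z = (w - mean) / sd
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    p_x_smaller = 0.5 * math.erfc(-z / math.sqrt(2.0))
    return z, p_two_sided, p_x_smaller

# Placeholder samples (assumption): synthetic square errors, not results from the paper.
random.seed(1)
errors_a = [random.lognormvariate(-6.0, 0.5) for _ in range(300)]
errors_b = [random.lognormvariate(-5.8, 0.5) for _ in range(300)]
z, p2, p1 = rank_sum_test(errors_a, errors_b)
print(f"z = {z:.3f}, two-sided p = {p2:.3g}, one-sided p = {p1:.3g}")
```

The two-sided value corresponds to the first case in the text, and the one-sided value to the second, with the first argument playing the role of the method hypothesized to give smaller errors.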

From the above results, it is confirmed that the proposed kernel (12) can be useful for regularized impulse response estimation, provided that the coordinate change is designed by taking into account the a priori knowledge on the system to be identified.

8 CONCLUSION

This paper focuses on kernels derived by applying coordinate changes induced by stable and strictly proper transfer functions to the first-order spline kernel. They are generalizations of the tuned-correlated kernel, which is one of the most widely used kernels in regularized impulse response estimation. It is shown that the proposed kernels inherit properties from the coordinate changes, such as the relative degree and the convergence rate. They also inherit the Maximum Entropy property from the first-order spline kernel. Spectral analysis is given for the case where the coordinate change is chosen as $t^{n}e^{-\alpha t}$. A numerical example is given to demonstrate the effectiveness of the proposed kernel, and it shows that a suitable coordinate change can give better performance than the tuned-correlated kernel.

Extensions to the second-order spline kernel and the generalized spline kernel are future tasks. Another future task is to find the coordinate change that is optimal in some sense for given a priori knowledge on the system to be identified.

Appendix A Proofs

A.1 Proof of Theorem 2

$K_{G_{0}}$ is interpreted as the first-order spline kernel with $\beta=\max_{t}|g_{0}(t)|$ and the coordinate change $\frac{|g_{0}(t)|}{\max_{t}|g_{0}(t)|}:\mathbb{R}_{0+}\to[0,1]$ . This suggests that $K_{G_{0}}$ is positive definite, hence there exists an RKHS associated with $K_{G_{0}}$ .

For the proof of stability, we recall the following proposition: if the kernel $K$ is a nonnegative valued function, i.e., $K:\mathbb{R}_{0+}\times\mathbb{R}_{0+}\to\mathbb{R}_{0+}$, then $K$ is stable if and only if

[TABLE]

See Proposition 15 in [9] for more detail about the stability of the kernel.

The proof about the stability is based on the following Lemma.

Lemma 12.

For any stable and strictly proper rational transfer function $G_{0}(s)$, there exist $\beta_{*}>0$ and $\alpha_{*}>0$ which satisfy

[TABLE]

The proof of Lemma 12 is given in Appendix A.2. Based on Lemma 12,

[TABLE]

Since $K_{G_{0}}$ is a nonnegative valued kernel and satisfies (46), the statement is proven.

A.2 Proof of Lemma 12

From the assumption that $G_{0}(s)$ is a stable and strictly proper rational function of $s$, $g_{0}(t)$ is divided into four parts: those derived from simple real poles, simple complex poles, repeated real poles, and repeated complex poles. In summary, we have

[TABLE]

where $N_{\rm real},N_{\rm comp},M_{\rm real}$, and $M_{\rm comp}$ denote the number of distinct real poles, the number of distinct complex poles, the largest multiplicity of the real poles, and the largest multiplicity of the complex poles, respectively. $-\alpha^{\rm real}_{i}\in\mathbb{R}\ (i=1,\ldots,N_{\rm real})$ and $-\alpha^{\rm comp}_{i}\pm\omega_{i}\mathrm{i}\ (i=1,\ldots,N_{\rm comp})$ denote the distinct real poles and complex poles, respectively. Note that $\alpha_{i}^{\rm real}>0$ and $\alpha_{i}^{\rm comp}>0$ from the stability assumption. In the following, we show that each term of (49) is bounded by an exponential.

For ease of notation, we write $\alpha$ instead of $\alpha^{\rm real}_{i}$ for the moment. We show that $t^{j}e^{-\alpha t}\ (j\geq 1)$ is bounded by $j!\left(\frac{2}{\alpha}\right)^{j}e^{-\frac{\alpha}{2}t}$, where $j!$ denotes the factorial of $j$, i.e., $j!=j\times(j-1)\times(j-2)\times\cdots\times 2\times 1$. For all $t\in\mathbb{R}_{0+}$,

[TABLE]

holds. The second equality is derived from the Taylor expansion of the exponential function, and the last inequality is derived from $\alpha>0,e^{-\alpha t}>0$ and $t\geq 0$ . From this inequality, we have

[TABLE]

where $c_{*}^{i,j}=j!\left(\frac{2}{\alpha_{i}^{\rm real}}\right)^{j}$ . Let $\alpha_{*}^{\rm real}=\min_{i}(\frac{1}{2}\alpha_{i}^{\rm real})$ . Then, $e^{-\frac{1}{2}\alpha_{i}^{\rm real}t}\leq e^{-\alpha_{*}^{\rm real}t}$ for $t\in\mathbb{R}_{0+}$ and we have

[TABLE]

with

[TABLE]

By noting

[TABLE]

the same proof can be applied for the second term of (49), and

[TABLE]

with

[TABLE]

From the above discussions, we have

[TABLE]

where

[TABLE]

and this completes the proof.
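The key bound used in the proof above, $t^{j}e^{-\alpha t}\leq j!\left(\frac{2}{\alpha}\right)^{j}e^{-\frac{\alpha}{2}t}$ for $t\geq 0$, is easy to spot-check numerically. The following sketch evaluates both sides on a grid. The grid range, the step count, and the tested values of $j$ and $\alpha$ are arbitrary choices made for illustration, and such a check is of course no substitute for the proof.

```python
import math

def lhs(t, j, alpha):
    """Left-hand side t^j * exp(-alpha * t)."""
    return t**j * math.exp(-alpha * t)

def rhs(t, j, alpha):
    """Right-hand side j! * (2 / alpha)^j * exp(-alpha * t / 2)."""
    return math.factorial(j) * (2.0 / alpha) ** j * math.exp(-0.5 * alpha * t)

def bound_holds(j, alpha, t_max=50.0, steps=5000):
    """Check lhs <= rhs on an evenly spaced grid over [0, t_max]."""
    grid = (t_max * k / steps for k in range(steps + 1))
    return all(lhs(t, j, alpha) <= rhs(t, j, alpha) for t in grid)

for j in (1, 2, 3, 5):
    for alpha in (0.1, 1.0, 5.0):
        print(f"j={j}, alpha={alpha}: bound holds on grid -> {bound_holds(j, alpha)}")
```

The inequality follows from keeping only the $j$-th term of the Taylor series of $e^{\frac{\alpha}{2}t}$, which is why the decay rate on the right-hand side is halved.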

A.3 Proof of Proposition 3

From the reproducing property of $K_{G_{0}}$ ,

[TABLE]

Here we use $K_{G_{0}}(\tau,t)=\min(0,|g_{0}(t)|)=0$ .

A.4 Proof of Theorem 4

We first prove the case where $k=0$ . Consider $K_{i}^{u}(t)$ defined by (8). From the assumption that $g_{0}(t)\to 0$ when $t\to+0$ , $K_{i}^{u}(t)$ is rewritten as

[TABLE]

for sufficiently small $t$ . By noting $|\int_{0}^{t_{i}}u(t_{i}-\tau)d\tau|<\infty$ , we have $\lim_{t\to+0}K_{i}^{u}(t)=0$ from

[TABLE]

This holds for all $i$, and we conclude $\lim_{t\to+0}\hat{g}(t)=0$.

Next, we consider the case $k=1$ . From (60), we have

[TABLE]

Again by noting that $u(t)$ is bounded and $\frac{dg_{0}}{dt}\to 0$ from the assumption, we have $\lim_{t\to+0}\frac{d}{dt}K_{i}^{u}=0$ and $\frac{d\hat{g}}{dt}\to 0$ .

Finally, we prove the case where $k\geq 2$ . Let $U_{i}(t)=\int_{t}^{t_{i}}u(t_{i}-\tau)d\tau$ . When $k\geq 2$ , we have

[TABLE]

From the assumption that $u(t)$ and its derivatives are bounded, the derivatives of $U_{i}(t)$ are also bounded for $j=0,1,\ldots,k$. Thus, if $\frac{d^{j}g_{0}}{dt^{j}}\to 0$ for all $j=0,\ldots,k$, we have $\lim_{t\to+0}\frac{d^{k}}{dt^{k}}K_{i}^{u}(t)=0$, which completes the proof.

A.5 Proof of Theorem 5

Consider $K_{i}^{u}(t)$ defined by (8). Let $\mathcal{T}_{1,i}(t)\subset[0,t_{i}]$ and $\mathcal{T}_{2,i}(t)\subset[0,t_{i}]$ be sets defined by $\mathcal{T}_{1,i}(t)=\{\tau\mid|g_{0}(t)|\leq|g_{0}(\tau)|,0\leq\tau\leq t_{i}\}$ and $\mathcal{T}_{2,i}(t)=\{\tau\mid|g_{0}(t)|\geq|g_{0}(\tau)|,0\leq\tau\leq t_{i}\}$ . This indicates that $K_{G_{0}}(t,\tau)=|g_{0}(t)|$ when $\tau\in\mathcal{T}_{1,i}(t)$ and $K_{G_{0}}(t,\tau)=|g_{0}(\tau)|$ when $\tau\in\mathcal{T}_{2,i}(t)$ . Hence, we have

[TABLE]

Note that $0\leq\left|\frac{g_{0}(\tau)}{g_{0}(t)}\right|\leq 1$ when $\tau\in\mathcal{T}_{2,i}(t)$. Since the integrand of the second term is bounded and the Lebesgue measure of $\mathcal{T}_{2,i}(t)$ goes to zero when $t\to\infty$ (because $g_{0}(t)\to 0$),

[TABLE]

and this indicates

[TABLE]

A.6 Proof of Theorem 6

The former half of the theorem is easily confirmed by direct calculation:

[TABLE]

and by noting $|g_{0}(T_{\min(i,j)})|=\min(|g_{0}(T_{i})|,|g_{0}(T_{j})|)$ , $K_{G_{0}}$ is the covariance function of $h^{o}(T_{k})$ .

The latter half of the theorem is based on Lemma 1 of [14], which is stated as follows.

Lemma 13 (Chen et al.).

Let $h(t)$ be any stochastic process with $h(t_{0})=0$ for $t_{0}=0$ . For any $n\in\mathbb{N}$ and $0=t_{0}\leq t_{1}\leq\cdots\leq t_{n}$ , the discrete-time Wiener process is the solution of the MaxEnt problem

[TABLE]

where the discrete-time Wiener process is given by

[TABLE]

Let $g^{\dagger}_{0}(t)$ be a function which maps $|g_{0}(T_{i})|$ to $T_{i}$ for $i=0,\ldots,n$ , i.e., $g_{0}^{\dagger}(|g_{0}(T_{i})|)=T_{i}$ . Also let $g_{i}$ and $h^{\prime}(g_{i})$ be $|g_{0}(T_{i})|$ and $h(g_{0}^{\dagger}(g_{i}))=h(T_{i})$ , respectively. With these notations, the original MaxEnt problem becomes

[TABLE]

From Lemma 1 of [14], the optimal solution of this MaxEnt problem is given by (70), and this completes the proof.

A.7 Proof of Theorems 7 and 8

We use the result in [14].

Proposition 14 (Chen et al.).

Consider the discrete-time Wiener kernel

[TABLE]

Under the assumption that $0\leq t_{1}\leq\cdots\leq t_{n}<\infty$ , the Gram matrix

[TABLE]

satisfies

[TABLE]

and

[TABLE]

By noting that $R\bm{K}R^{\top}$ is equivalent to $\bm{K}^{\rm Wiener}$ and $\left(\det(R)\right)^{2}=1$ , we have the results.
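The argument can be illustrated numerically. The sketch below sorts the values $|g_{0}(t_{i})|$ (the role played by the permutation $R$), builds the Gram matrix $[\min(s_{i},s_{j})]$ of the sorted values, and checks that its inverse is tridiagonal and that its determinant is the product of the increments. The coordinate change $g_{0}(t)=te^{-\alpha t}$, the sampling instants, and the explicit tridiagonal formula written in the code are assumptions made for illustration (the formula follows from $\bm{K}=LDL^{\top}$ with $L$ the lower-triangular matrix of ones). It assumes the sorted values are strictly increasing and positive.

```python
import math

def g0(t, alpha=1.0):
    """Hypothetical coordinate change g0(t) = t * exp(-alpha * t)."""
    return t * math.exp(-alpha * t)

def min_gram(v):
    """Gram matrix [min(v_i, v_j)]."""
    return [[min(a, b) for b in v] for a in v]

def tridiagonal_inverse(s):
    """Inverse of [min(s_i, s_j)] for 0 < s_1 < ... < s_n, plus the increments d."""
    n = len(s)
    d = [s[0]] + [s[i] - s[i - 1] for i in range(1, n)]
    T = [[0.0] * n for _ in range(n)]
    for i in range(n):
        T[i][i] = 1.0 / d[i]
        if i + 1 < n:
            T[i][i] += 1.0 / d[i + 1]
            T[i][i + 1] = T[i + 1][i] = -1.0 / d[i + 1]
    return T, d

def max_identity_error(A, B):
    """Largest entry of |A B - I|."""
    n = len(A)
    return max(abs(sum(A[i][k] * B[k][j] for k in range(n)) - (1.0 if i == j else 0.0))
               for i in range(n) for j in range(n))

ts = [0.25 * i for i in range(1, 9)]                # sampling instants (assumption)
v = [abs(g0(t)) for t in ts]                        # not monotone in t
order = sorted(range(len(v)), key=lambda i: v[i])   # the permutation R
s = [v[i] for i in order]
K_sorted = min_gram(s)                              # equals R K R^T
T, d = tridiagonal_inverse(s)
print("max |K_sorted T - I| =", max_identity_error(K_sorted, T))
print("det(K) = product of increments =", math.prod(d))
```

Because $R$ is a permutation matrix, the inverse of the unsorted Gram matrix is $R^{\top}TR$, which has the same number of nonzero entries as $T$; this is the sparsity referred to in the text.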

A.8 Proof of Lemma 9

With the transformation $X^{\prime}=x^{\prime}/T$ , we have

[TABLE]

A.9 Proof of Theorems 10 and 11

Divide the interval $[0,\infty)$ into $[0,\frac{n}{\alpha}]$ and $[\frac{n}{\alpha},\infty)$ . Note that $g_{0}(\tau)=\tau^{n}e^{-\alpha\tau}$ is monotonic on each interval from

[TABLE]

and $g_{0}(\tau)$ has an inverse function on each interval. In particular, the inverse function on $[0,\frac{n}{\alpha}]$ is given by $Z_{p}(y)=-\frac{n}{\alpha}W_{p}(-\frac{\alpha}{n}y^{\frac{1}{n}})$, where $W_{p}(x)$ denotes the principal branch of the Lambert W function (see Appendix B for a brief introduction to the Lambert W function). This is confirmed by direct calculation:

[TABLE]

where $\exp(x)$ denotes $e^{x}$ . Similarly, the inverse function of $g_{0}(\tau)$ on the interval $[\frac{n}{\alpha},\infty)$ is given by $Z_{m}(y)=-\frac{n}{\alpha}W_{m}(-\frac{\alpha}{n}y^{\frac{1}{n}})$ where $W_{m}(x)$ denotes the minor branch of the Lambert W function. Note that $Z_{p}(y):[0,\left(\frac{n}{\alpha e}\right)^{n}]\to[0,\frac{n}{\alpha}]$ and $Z_{m}(y):[0,\left(\frac{n}{\alpha e}\right)^{n}]\to[\frac{n}{\alpha},\infty)$ satisfy $m(Z_{p}(y))=\frac{1}{2}y$ and $m(Z_{m}(y))=\left(\frac{n}{\alpha e}\right)^{n}-\frac{1}{2}y$ , respectively. This indicates $dm(Z_{p}(y))-dm(Z_{m}(y))=dy$ .
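The two inverse functions can be checked numerically. The sketch below evaluates the real branches of the Lambert W function by bisection on $we^{w}=x$ and verifies that $g_{0}(Z_{p}(y))=y$ and $g_{0}(Z_{m}(y))=y$. The bisection solver, the function names, and the values $n=2$, $\alpha=1.5$ are choices made for illustration only; a library routine for the Lambert W function would normally be used instead.

```python
import math

def lambert_w(x, minor=False):
    """Real Lambert W for -1/e <= x <= 0 (x < 0 on the minor branch), by bisection."""
    def f(w):
        return w * math.exp(w) - x
    a = -1.0                  # f(a) <= 0 on both branches
    if minor:
        b = -2.0
        while f(b) <= 0.0:    # walk left until f becomes positive
            b *= 2.0
    else:
        b = 0.0               # f(0) = -x >= 0
    for _ in range(200):
        m = 0.5 * (a + b)
        if f(m) <= 0.0:
            a = m
        else:
            b = m
    return 0.5 * (a + b)

def g0(tau, n, alpha):
    """Coordinate change g0(tau) = tau^n * exp(-alpha * tau)."""
    return tau**n * math.exp(-alpha * tau)

def z_inverse(y, n, alpha, minor=False):
    """Z(y) = -(n / alpha) * W(-(alpha / n) * y^(1/n)) on the chosen branch."""
    return -(n / alpha) * lambert_w(-(alpha / n) * y ** (1.0 / n), minor)

n, alpha = 2, 1.5                         # illustrative values (assumption)
y_max = (n / (alpha * math.e)) ** n       # maximum of g0, attained at tau = n / alpha
for frac in (0.1, 0.5, 0.9):
    y = frac * y_max
    tau_p = z_inverse(y, n, alpha)
    tau_m = z_inverse(y, n, alpha, minor=True)
    print(f"y={y:.4f}: Z_p={tau_p:.4f}, Z_m={tau_m:.4f}, "
          f"residuals {g0(tau_p, n, alpha) - y:.1e}, {g0(tau_m, n, alpha) - y:.1e}")
```

Each $y$ below the maximum has one preimage on each side of $\tau=\frac{n}{\alpha}$, which is why both branches are needed to change the integration variable over $[0,\infty)$.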

With these inverse relations, we change the integration variable from $\tau$ to $y=\tau^{n}e^{-\alpha\tau}$ .

[TABLE]

Here we use Lemma 9. The orthonormality of $\phi_{n,i}(\tau)$ is shown with the same integration variable change.

[TABLE]

The last equality is based on the orthonormality of $\phi_{i}$ over $[0,1]$ .

Theorem 11 is a direct consequence of Theorem 4 on page 37 of [17].

Appendix B The Lambert W function

This appendix gives a brief introduction to the Lambert W function. See, e.g., [18] for more details.

The Lambert W function is a set of functions which satisfy

[TABLE]

for any $z\in\mathbb{C}$. If we restrict our attention to the case $z\in\mathbb{R}$, the Lambert W function is divided into two branches: the principal branch and the minor branch.

Fig. 12 illustrates the Lambert W function on the real axis. The Lambert W function is double-valued on $-e^{-1}<z<0$ and is divided into two branches, $W(z)\geq-1$ and $W(z)\leq-1$. The former is called the principal branch, and the latter is called the minor branch. We use the notations $W_{p}(z)$ and $W_{m}(z)$ to denote the principal and the minor branch, respectively.

Bibliography


[1] L. Ljung. System Identification: Theory for the User. Prentice Hall, Upper Saddle River, NJ, 2nd edition, 1999.
[2] G. Pillonetto and G. De Nicolao. A new kernel-based approach for linear system identification. Automatica, 46(1):81–93, 2010.
[3] N. Aronszajn. Theory of Reproducing Kernels. Transactions of the American Mathematical Society, 68(3):337–404, 1950.
[4] B. Schölkopf and A. J. Smola. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press, 2001.
[5] G. Prando and A. Chiuso. Model reduction for linear Bayesian System Identification. In Proceedings of IEEE 54th Conference on Decision and Control, pages 2121–2126, 2015.
[6] T. Chen and L. Ljung. Regularized system identification using orthonormal basis functions. In Proceedings of 2015 European Control Conference, pages 1291–1296. IEEE, 2015.
[7] T. Chen. On kernel design for regularized LTI system identification. Automatica, 90:109–122, 2018.
[8] T. Chen, H. Ohlsson, and L. Ljung. On the estimation of transfer functions, regularizations and Gaussian processes–Revisited. Automatica, 48(8):1525–1535, 2012.
