Diagonally Square Root Integrable Kernels in System Identification

Mohammad Khosravi; Roy S. Smith

arXiv:2302.12929 · eess.SY · February 28, 2023

TL;DR

This paper investigates diagonally square root integrable kernels within RKHS theory, showing their stability, integrability, and topological properties, with implications for system identification and Gaussian process stability.

Contribution

It revisits and analyzes the class of DSRI kernels, demonstrating their stability, integrability, and relevance to the stability of Gaussian processes in system identification.

Findings

1. Various well-known kernels are DSRI.
2. DSRI kernels are stable and integrable.
3. Stability of Gaussian processes is characterized by DSRI kernels.

Equations (130)

{\mathscr{M}}(\mathds{k}):=\left\{\begin{array}[]{ll}\displaystyle\int_{{\mathbb{R}}_{+}}\mathds{k}(t,t)^{\frac{1}{2}}\ \mathrm{d}t,&\text{\quad when }{\mathbb{T}}={\mathbb{R}}_{+},\\ \displaystyle\sum_{t\in{\mathbb{Z}}_{+}}\mathds{k}(t,t)^{\frac{1}{2}},&\text{\quad when }{\mathbb{T}}={\mathbb{Z}}_{+}.\\ \end{array}\right.

\mathds{k}_{\text{{\scalebox{0.75}{$\mathrm{R}$}}}n\text{{\scalebox{0.75}{$\mathrm{E}$}}}}(s,t)=\sum_{i=1}^{n}\lambda_{i}\alpha_{i}^{\frac{1}{2}(s+t)},

\mathds{k}_{\text{{\scalebox{0.75}{$\mathrm{DI}$}}}}(s,t)=\mathbbm{1}_{\{0\}}(s-t)\,\alpha^{s},

\mathds{k}_{\text{{\scalebox{0.75}{$\mathrm{DC}$}}}}(s,t)=\alpha^{\frac{1}{2}(s+t)}\gamma^{|s-t|},

\mathds{k}_{\text{{\scalebox{0.75}{$\mathrm{TC}$}}}}(s,t)=\alpha^{\max(s,t)},

\mathds{k}_{\text{{\scalebox{0.75}{$\mathrm{SS}$}}}}(s,t)=\alpha^{\max(s,t)+s+t}-\frac{1}{3}\alpha^{3\max(s,t)},

\mathds{k}_{\text{{\scalebox{0.75}{$\mathrm{iTC}$}}}}(s,t)=\frac{\alpha^{\max(s,t)+1}-\beta^{\max(s,t)+1}}{\max(s,t)+1},

\mathds{k}_{\text{{\scalebox{0.75}{$\mathrm{iSS}$}}}}(s,t)=\frac{\alpha^{s+t+\max(s,t)+1}-\beta^{s+t+\max(s,t)+1}}{s+t+\max(s,t)+1}-\frac{\alpha^{3\max(s,t)+1}-\beta^{3\max(s,t)+1}}{9\max(s,t)+3},

{\mathscr{M}}(\mathds{k}_{{\mathrm{v}}})=\left\{\begin{array}[]{ll}\displaystyle\int_{{\mathbb{R}}_{+}}(v_{t}^{2})^{\frac{1}{2}}\,\mathrm{d}t=\int_{{\mathbb{R}}_{+}}|v_{t}|\,\mathrm{d}t,&\text{\quad if }{\mathbb{T}}={\mathbb{R}}_{+},\\ \displaystyle\sum_{t\in{\mathbb{Z}}_{+}}(v_{t}^{2})^{\frac{1}{2}}=\sum_{t\in{\mathbb{Z}}_{+}}|v_{t}|,&\text{\quad if }{\mathbb{T}}={\mathbb{Z}}_{+},\\ \end{array}\right.

\mathds{k}_{\text{{\scalebox{0.75}{$\mathrm{st}$}}}}(s+\tau,t+\tau)=\mathds{k}_{\text{{\scalebox{0.75}{$\mathrm{st}$}}}}(s,t),\quad\forall s,t,\tau\in{\mathbb{T}}.

\mathds{k}_{\text{{\scalebox{0.75}{$\mathrm{AMLS}$}}}}(s,t)=v_{t}\,\mathds{k}_{\text{{\scalebox{0.75}{$\mathrm{st}$}}}}(s,t)\,v_{s},\quad\forall s,t\in{\mathbb{T}}.

\!\!\!\left\{\begin{array}[]{ll}\displaystyle\int_{{\mathbb{R}}_{+}}\Big{|}\displaystyle\int_{{\mathbb{R}}_{+}}u_{s}\mathds{k}(t,s)\mathrm{d}s\Big{|}\mathrm{d}t<\infty,\!&\!\text{\quad when }{\mathbb{T}}={\mathbb{R}}_{+},\!\\ \displaystyle\sum_{t\in{\mathbb{Z}}_{+}}\Big{|}\displaystyle\sum_{s\in{\mathbb{Z}}_{+}}u_{s}\mathds{k}(t,s)\Big{|}<\infty,\!&\!\text{\quad when }{\mathbb{T}}={\mathbb{Z}}_{+}.\!\\ \end{array}\right.

\!\!\!\left\{\begin{array}[]{ll}\displaystyle\int_{{\mathbb{R}}_{+}}\displaystyle\int_{{\mathbb{R}}_{+}}\big{|}\mathds{k}(t,s)\big{|}\mathrm{d}s\mathrm{d}t<\infty,\!&\!\text{\quad when }{\mathbb{T}}={\mathbb{R}}_{+},\!\\ \displaystyle\sum_{t\in{\mathbb{Z}}_{+}}\displaystyle\sum_{s\in{\mathbb{Z}}_{+}}\big{|}\mathds{k}(t,s)\big{|}<\infty,\!&\!\text{\quad when }{\mathbb{T}}={\mathbb{Z}}_{+}.\!\\ \end{array}\right.

\left\{\begin{array}[]{ll}\displaystyle\sum\limits_{s\in{\mathbb{Z}}_{+}}\mathds{k}(s,s)<\infty,&\text{\quad if }{\mathbb{T}}={\mathbb{Z}}_{+},\\ \displaystyle\int_{{\mathbb{R}}_{+}}\ \mathds{k}(s,s)\mathrm{d}s<\infty,&\text{\quad if }{\mathbb{T}}={\mathbb{R}}_{+}.\\ \end{array}\right.

\left\{\begin{array}[]{ll}\displaystyle\sum_{s\in{\mathbb{Z}}_{+}}\displaystyle\sum_{t\in{\mathbb{Z}}_{+}}\mathds{k}(s,t)^{2}<\infty,&\text{\quad if }{\mathbb{T}}={\mathbb{Z}}_{+},\\ \displaystyle\int_{{\mathbb{R}}_{+}}\displaystyle\int_{{\mathbb{R}}_{+}}\ \mathds{k}(s,t)^{2}\mathrm{d}s\,\mathrm{d}t<\infty,&\text{\quad if }{\mathbb{T}}={\mathbb{R}}_{+}.\\ \end{array}\right.

{\mathscr{S}}_{\text{{\scalebox{0.75}{$\mathrm{DSRI}$}}}}\subset{\mathscr{S}}_{1}\subset{\mathscr{S}}_{\mathrm{s}}\subset{\mathscr{S}}_{\mathrm{ft}}\subset{\mathscr{S}}_{2},

\|L\|_{{\mathcal{L}}({\mathcal{H}}_{\mathds{k}},{\mathbb{B}})}\leq\|L\|_{{\mathcal{L}}({\mathscr{L}}^{1},{\mathbb{B}})}\,{\mathscr{M}}(\mathds{k}).

L(g):=\left\{\begin{array}[]{ll}\displaystyle\sum_{t\in{\mathbb{Z}}_{+}}g_{t}v_{t},&\text{\quad if }{\mathbb{T}}={\mathbb{Z}}_{+},\\ \displaystyle\int_{{\mathbb{R}}_{+}}g_{t}v_{t}\,\mathrm{d}t,&\text{\quad if }{\mathbb{T}}={\mathbb{R}}_{+},\\ \end{array}\right.

L_{t}^{{\mathrm{u}}}(g):=\left\{\begin{array}[]{ll}\displaystyle\sum_{s\in{\mathbb{Z}}_{+}}g_{s}u_{t-s},&\text{\quad if }{\mathbb{T}}={\mathbb{Z}}_{+},\\ \displaystyle\int_{{\mathbb{R}}_{+}}g_{s}u_{t-s}\,\mathrm{d}s,&\text{\quad if }{\mathbb{T}}={\mathbb{R}}_{+},\\ \end{array}\right.

F_{\omega}^{(r)}(g):=\left\{\begin{array}[]{ll}\displaystyle\sum_{t\in{\mathbb{Z}}_{+}}g_{t}\cos(\omega t),&\text{\quad if }{\mathbb{T}}={\mathbb{Z}}_{+},\\ \displaystyle\int_{{\mathbb{R}}_{+}}g_{t}\cos(\omega t)\,\mathrm{d}t,&\text{\quad if }{\mathbb{T}}={\mathbb{R}}_{+},\\ \end{array}\right.

F_{\omega}^{(i)}(g):=\left\{\begin{array}[]{ll}\displaystyle-\sum_{t\in{\mathbb{Z}}_{+}}g_{t}\sin(\omega t),&\text{\quad if }{\mathbb{T}}={\mathbb{Z}}_{+},\\ \displaystyle-\int_{{\mathbb{R}}_{+}}g_{t}\sin(\omega t)\,\mathrm{d}t,&\text{\quad if }{\mathbb{T}}={\mathbb{R}}_{+},\\ \end{array}\right.

{\mathrm{g}}:({\mathbb{T}}\times\Omega,{\mathscr{G}}_{{\mathbb{T}}}\otimes{\mathscr{G}}_{\Omega},\mu\times{\mathbb{P}})\to{\mathbb{R}}

\big{[}g_{t_{1}},\ldots,g_{t_{n}}\big{]}^{{\scalebox{0.63}{$\mathsf{T}$ }}}\!\sim{\mathcal{N}}\Big{(}\big{[}m_{t_{i}}\big{]}_{i=1}^{n},\big{[}\mathds{k}(t_{i},t_{j})\big{]}_{i,j=1}^{n}\Big{)}.

\Phi(\delta)=\frac{1}{(2\pi)^{\frac{1}{2}}}\int_{-\delta}^{\delta}e^{-\frac{1}{2}x^{2}}\,\mathrm{d}x,

I_{t,\varepsilon}=\big[m_{t}-\delta_{\varepsilon}\mathds{k}(t,t)^{\frac{1}{2}},\,m_{t}+\delta_{\varepsilon}\mathds{k}(t,t)^{\frac{1}{2}}\big],

{\mathrm{s}}_{\varepsilon}^{+}:=\big{(}m_{t}+\delta_{\varepsilon}\mathds{k}(t,t)^{\frac{1}{2}}\big{)}_{t\in{\mathbb{T}}}\,,

{\mathrm{s}}_{\varepsilon}^{-}:=\big{(}m_{t}-\delta_{\varepsilon}\mathds{k}(t,t)^{\frac{1}{2}}\big{)}_{t\in{\mathbb{T}}}\,.

{\mathscr{M}}(\mathds{k})\leq C^{\frac{1}{2}}\int_{{\mathbb{R}}_{+}}\mathds{h}(t,t)^{\frac{1}{2}}\,\mathrm{d}t=C^{\frac{1}{2}}{\mathscr{M}}(\mathds{h})<\infty.

\begin{split}\sum_{j,k=1}^{m}a_{j}\mathds{k}_{\text{{\scalebox{0.75}{$\mathrm{R}$}}}n\text{{\scalebox{0.75}{$\mathrm{E}$}}}}(t_{j},t_{k})a_{k}&=\sum_{j,k=1}^{m}\sum_{i=1}^{n}a_{j}a_{k}\lambda_{i}\alpha_{i}^{\frac{1}{2}(t_{j}+t_{k})}\\ &=\sum_{i=1}^{n}\lambda_{i}\Big{(}\sum_{j=1}^{m}a_{j}\alpha_{i}^{\frac{1}{2}t_{j}}\Big{)}^{2}\geq 0,\end{split}

{\mathscr{M}}(\mathds{k}_{\text{{\scalebox{0.75}{$\mathrm{R}$}}}n\text{{\scalebox{0.75}{$\mathrm{E}$}}}})\leq\left\{\begin{array}[]{ll}\displaystyle\lambda^{\frac{1}{2}}\int_{{\mathbb{R}}_{+}}\alpha^{\frac{1}{2}t}\,\mathrm{d}t=-\frac{2\lambda^{\frac{1}{2}}}{\ln(\alpha)},&\text{\quad if }{\mathbb{T}}={\mathbb{R}}_{+},\\ \displaystyle\lambda^{\frac{1}{2}}\sum_{t\in{\mathbb{Z}}_{+}}\alpha^{\frac{1}{2}t}=\frac{\lambda^{\frac{1}{2}}}{1-\alpha^{\frac{1}{2}}},&\text{\quad if }{\mathbb{T}}={\mathbb{Z}}_{+},\\ \end{array}\right.

\big{(}\alpha\mathds{k}(t,t)+\beta\mathds{h}(t,t)\big{)}^{\frac{1}{2}}\leq\alpha^{\frac{1}{2}}\mathds{k}(t,t)^{\frac{1}{2}}+\beta^{\frac{1}{2}}\mathds{h}(t,t)^{\frac{1}{2}},

\big{(}\mathds{k}(t,t)\mathds{h}(t,t)\big{)}^{\frac{1}{2}}\leq\big{(}\sup_{t\in{\mathbb{T}}}\mathds{h}(t,t)\big{)}^{\frac{1}{2}}\mathds{k}(t,t)^{\frac{1}{2}},

Full text

Mohammad Khosravi, Roy S. Smith

Delft Center for Systems and Control, Delft University of Technology

Automatic Control Laboratory, ETH Zürich

Abstract

In recent years, reproducing kernel Hilbert space (RKHS) theory has played a crucial role in linear system identification. The core of an RKHS is its associated kernel, which characterizes its properties. Accordingly, this work studies the class of diagonally square root integrable (DSRI) kernels. We demonstrate that various well-known stable kernels introduced in system identification belong to this category. Moreover, it is shown that any DSRI kernel is also stable and integrable. We look into certain topological features of the RKHSs associated with DSRI kernels, particularly the continuity of linear operators defined on the respective RKHSs. For the stability of a Gaussian process centered at a stable impulse response, we show that the necessary and sufficient condition is the diagonal square root integrability of the corresponding kernel. Furthermore, we elaborate on this result by providing suitable interpretations.

Keywords: system identification; kernel-based methods; diagonally square root integrable kernels; stable Gaussian processes

This paper was not presented at any IFAC meeting. Corresponding author: M. Khosravi.

1 Introduction

The theory of reproducing kernel Hilbert spaces (RKHSs) was introduced in the mid-twentieth century [1]. The intrinsic properties of RKHSs, their one-to-one correspondence with positive-definite kernels, and their fundamental ties to Gaussian processes offer a strong foundation for addressing various estimation and interpolation problems [2, 3, 4, 5, 6]. Accordingly, they have become increasingly prevalent in statistics, signal processing, learning theory, and numerical analysis [7, 8, 9, 10]. System identification, on the other hand, emerged as the theory and techniques for estimating suitable mathematical representations of dynamical systems from measurement data [11], and it has remained an active field of research in which numerous methodologies have been developed [12, 13, 14, 15, 16].

RKHS theory was brought to system identification in [17] through the development of kernel-based system identification methods. This led to a paradigm shift in system identification theory [18]: kernel-based methods address the bias-variance trade-off, robustness, and model order selection [19, 20, 21], unify the identification of continuous-time and discrete-time systems [19], and allow various forms of side information to be included in the identification problem [22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33]. Furthermore, owing to the inherent connection between RKHSs and Gaussian processes [3], kernel-based methods admit a Bayesian interpretation of the system identification problem, which allows the uncertainty to be quantified and provides statistical guarantees [34]. Over the past decade, kernel-based system identification has received considerable attention and progressed significantly; nonetheless, it remains an active field of research with various open problems and recent state-of-the-art results [35, 36, 37, 38, 39, 40].

The building block of each RKHS is its kernel function, and various attributes of the RKHS elements are inherited from this kernel. Therefore, it is essential to design kernels suitable for system identification [41]. The most prevalent kernels in the literature include the diagonal/correlated, tuned/correlated, and stable spline kernels and their extensions, which were proposed primarily to enforce stability and smoothness of the impulse response [42, 43, 44]. To improve identification performance for complex systems, various approaches for designing kernels by combining multiple kernels have been proposed [45, 46, 47, 48]. Influenced by machine learning, harmonic analysis of stochastic processes, linear system theory, and filter design techniques, further categories of kernels have been developed [49, 50, 51]. The significance of kernels has also led to the investigation of their more generic aspects; e.g., the relation between the absolute summability of kernels and their stability is clarified in [39]. Moreover, the links between various categories of kernels are studied in [37], where the mathematical foundations of stable kernels and their RKHSs are explored. Furthermore, in [20], it is shown that the realizations of a zero-mean Gaussian process are almost surely stable impulse responses if the corresponding kernel is diagonally square root integrable (DSRI).
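A short calculation makes this connection concrete on ${\mathbb{T}}={\mathbb{Z}}_{+}$ . If ${\mathrm{g}}$ is a zero-mean Gaussian process with kernel $\mathds{k}$ , then $g_{t}\sim{\mathcal{N}}(0,\mathds{k}(t,t))$ , so $\mathbb{E}|g_{t}|=(2/\pi)^{\frac{1}{2}}\mathds{k}(t,t)^{\frac{1}{2}}$ , and summing over $t$ (Tonelli's theorem) gives $\mathbb{E}\|{\mathrm{g}}\|_{1}=(2/\pi)^{\frac{1}{2}}\sum_{t}\mathds{k}(t,t)^{\frac{1}{2}}$ ; a finite right-hand side forces $\|{\mathrm{g}}\|_{1}<\infty$ almost surely. The Monte Carlo sketch below checks this identity on a truncated window, assuming a diagonal/correlated kernel with hypothetical hyperparameters; the window length, sample count, and eigendecomposition-based sampler are our own choices and are not taken from [20].

```python
import numpy as np

# Hypothetical DC-kernel hyperparameters on a truncated window of T = Z_+
alpha, gamma, N = 0.9, 0.5, 150
grid = np.arange(N)
S, T = np.meshgrid(grid, grid, indexing="ij")
K = alpha ** (0.5 * (S + T)) * gamma ** np.abs(S - T)   # k_DC(s, t) on the window

# Sample zero-mean Gaussian paths through an eigendecomposition of K; this
# tolerates a numerically singular covariance where Cholesky may fail.
w, V = np.linalg.eigh(K)
w = np.clip(w, 0.0, None)
rng = np.random.default_rng(seed=1)
z = rng.standard_normal((4000, N))
paths = (z * np.sqrt(w)) @ V.T      # each row ~ N(0, V diag(w) V^T) = N(0, K)

empirical_l1 = np.mean(np.sum(np.abs(paths), axis=1))   # Monte Carlo E||g||_1
m_k = np.sum(np.sqrt(np.diag(K)))                       # truncated sum_t k(t,t)^(1/2)
predicted_l1 = np.sqrt(2.0 / np.pi) * m_k               # (2/pi)^(1/2) times that sum
print(f"Monte Carlo: {empirical_l1:.3f}, predicted: {predicted_l1:.3f}")
```

The two printed numbers should agree up to Monte Carlo error; with a kernel whose diagonal decays too slowly, the predicted value grows without bound as the window is enlarged.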

In this work, we revisit the definition and notion of DSRI¹ kernels, which were initially introduced in [20]. We then investigate the class of DSRI kernels by describing its structure as a partially ordered cone. We show that this class includes a broad range of well-known kernels commonly used in system identification, e.g., diagonal/correlated, stable spline, amplitude-modulated locally stationary, and simulation-induced kernels. The structure of the DSRI kernel class is further elaborated by revisiting the fact that DSRI kernels are stable and integrable. In this way, we obtain inner and outer approximations of the class of DSRI kernels. Subsequently, we investigate fundamental topological features of RKHSs with DSRI kernels. Namely, we show that for linear operators defined on ${\mathscr{L}}^{1}$ , the space of stable impulse responses, continuity is inherited when the operator is restricted to an RKHS endowed with a DSRI kernel. For the stability of zero-mean Gaussian processes, we show that the sufficient condition introduced in [20] is also necessary. We further generalize this result and provide suitable interpretations. Given the theoretical nature of the work, and to make the manuscript easier to read, the more burdensome technical arguments, such as proofs of theorems and lemmas, have been moved to the appendix. For the sake of completeness, the appendix provides all of the proofs, including the relatively simple ones.

¹ Throughout this paper, DSRI stands for both “diagonally square root integrable” and “diagonally square root integrability”.

2 Notation and Preliminaries

Throughout the paper, the sets of natural numbers, real numbers, complex numbers, non-negative integers, and non-negative real numbers are denoted by ${\mathbb{N}}$ , ${\mathbb{R}}$ , ${\mathbb{C}}$ , ${\mathbb{Z}}_{+}$ , and ${\mathbb{R}}_{+}$ , respectively. Moreover, ${\mathbb{T}}$ denotes the time index set, which is either ${\mathbb{Z}}_{+}$ or ${\mathbb{R}}_{+}$ , and ${\mathbb{T}}_{\pm}$ is defined as ${\mathbb{T}}_{\pm}:={\mathbb{T}}\cup(-{\mathbb{T}})$ . The generic measure space in our discussion is $({\mathbb{T}},{\mathscr{G}}_{{\mathbb{T}}},\mu)$ , where ${\mathscr{G}}_{{\mathbb{T}}}$ and $\mu$ are, respectively, the $\sigma$ -algebra of Borel subsets of ${\mathbb{R}}_{+}$ and the Lebesgue measure when ${\mathbb{T}}={\mathbb{R}}_{+}$ , and the power set of ${\mathbb{Z}}_{+}$ and the counting measure when ${\mathbb{T}}={\mathbb{Z}}_{+}$ . Accordingly, we also consider the measure space $({\mathbb{T}}\times{\mathbb{T}},{\mathscr{G}}_{{\mathbb{T}}}\otimes{\mathscr{G}}_{{\mathbb{T}}},\mu\times\mu)$ , where ${\mathscr{G}}_{{\mathbb{T}}}\otimes{\mathscr{G}}_{{\mathbb{T}}}$ and $\mu\times\mu$ are, respectively, the product $\sigma$ -algebra and the product measure defined based on ${\mathscr{G}}_{{\mathbb{T}}}$ and $\mu$ . Furthermore, we assume that ${\mathbb{R}}$ is endowed with the Borel $\sigma$ -algebra ${\mathscr{B}}$ and the Lebesgue measure. Given a measurable space $({\mathcal{X}},{\mathscr{F}})$ , the space of measurable functions ${\mathrm{v}}:{\mathcal{X}}\to{\mathbb{R}}$ is denoted by ${\mathbb{R}}^{{\mathcal{X}}}$ , and ${\mathrm{v}}\in{\mathbb{R}}^{{\mathcal{X}}}$ is written entry-wise as ${\mathrm{v}}=(v_{x})_{x\in{\mathcal{X}}}$ or ${\mathrm{v}}=(v(x))_{x\in{\mathcal{X}}}$ .
Given ${\mathcal{Y}}\subset{\mathcal{X}}$ , the indicator function $\mathbbm{1}_{{\mathcal{Y}}}:{\mathcal{X}}\to\{0,1\}$ is defined as $\mathbbm{1}_{{\mathcal{Y}}}(x)=1$ if $x\in{\mathcal{Y}}$ , and $\mathbbm{1}_{{\mathcal{Y}}}(x)=0$ otherwise. Depending on the context, ${\mathscr{L}}^{\infty}$ denotes $\ell^{\infty}({\mathbb{Z}})$ or $L^{\infty}({\mathbb{R}})$ . Similarly, ${\mathscr{L}}^{1}$ refers to $\ell^{1}({\mathbb{Z}}_{+})$ or $L^{1}({\mathbb{R}}_{+})$ . For $p\in\{1,\infty\}$ , the norm of the Banach space ${\mathscr{L}}^{p}$ is denoted by $\|\cdot\|_{p}$ . The space of bounded linear operators from a Banach space ${\mathbb{X}}$ to a Banach space ${\mathbb{Y}}$ is itself a Banach space, denoted by ${\mathcal{L}}({\mathbb{X}},{\mathbb{Y}})$ and endowed with the norm $\|\cdot\|_{{\mathcal{L}}({\mathbb{X}},{\mathbb{Y}})}$ [52].

3 Diagonally Square Root Integrable Kernels

In this section, the definition of diagonally square root integrable kernels is revisited. To this end, we need to recall the notion of Mercer kernels [5].

Definition 1 ([5]).

The symmetric measurable function $\mathds{k}:{\mathbb{T}}\times{\mathbb{T}}\to{\mathbb{R}}$ is said to be a positive-definite kernel, or simply a kernel, if, for any $m\in{\mathbb{N}}$ , $s_{1},\ldots,s_{m}\in{\mathbb{T}}$ , and $a_{1},\ldots,a_{m}\in{\mathbb{R}}$ , we have $\sum_{i,j=1}^{m}a_{i}\mathds{k}(s_{i},s_{j})a_{j}\geq 0$ . For each $t\in{\mathbb{T}}$ , the function $\mathds{k}_{t}:{\mathbb{T}}\to{\mathbb{R}}$ , defined as $\mathds{k}_{t}(\cdot)=\mathds{k}(t,\cdot)$ , is called the section of kernel $\mathds{k}$ at $t$ .

The following definition introduces our main object of interest in this paper.

Definition 2.

The positive-definite kernel $\,\mathds{k}:{\mathbb{T}}\times{\mathbb{T}}\to{\mathbb{R}}$ is said to be diagonally square root integrable (DSRI) if ${\mathscr{M}}(\mathds{k})<\infty$ , where ${\mathscr{M}}(\mathds{k})$ is defined as

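$${\mathscr{M}}(\mathds{k}):=\left\{\begin{array}[]{ll}\displaystyle\int_{{\mathbb{R}}_{+}}\mathds{k}(t,t)^{\frac{1}{2}}\ \mathrm{d}t,&\text{\quad when }{\mathbb{T}}={\mathbb{R}}_{+},\\ \displaystyle\sum_{t\in{\mathbb{Z}}_{+}}\mathds{k}(t,t)^{\frac{1}{2}},&\text{\quad when }{\mathbb{T}}={\mathbb{Z}}_{+}.\\ \end{array}\right.\tag{1}$$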

The class of DSRI kernels is denoted by ${\mathscr{S}}_{\text{{\scalebox{0.75}{$ \mathrm{DSRI} $}}}}$ .

For any $t\in{\mathbb{T}}$ , note that $\mathds{k}(t,t)\geq 0$ , which follows from the positive-definiteness property in Definition 1 by taking $m=1$ , $s_{1}=t$ , and $a_{1}=1$ . Consequently, the right-hand sides of (1) are well-defined for any positive-definite kernel, with possible values in ${\mathbb{R}}_{+}\cup\{+\infty\}$ . According to Definition 2, kernel $\mathds{k}$ is DSRI when this value is finite, i.e., ${\mathscr{M}}(\mathds{k})<\infty$ .
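As a quick numerical illustration of (1) on ${\mathbb{T}}={\mathbb{Z}}_{+}$ , the sketch below approximates ${\mathscr{M}}(\mathds{k})$ for the tuned/correlated kernel $\mathds{k}(s,t)=\alpha^{\max(s,t)}$ , whose diagonal $\mathds{k}(t,t)=\alpha^{t}$ gives the closed form ${\mathscr{M}}(\mathds{k})=1/(1-\alpha^{\frac{1}{2}})$ . The helper names `diag_sqrt_mass` and `tc_kernel` and the truncation horizon are our own illustrative choices; a finite partial sum can illustrate convergence but never certify it.

```python
import numpy as np

def diag_sqrt_mass(kernel, horizon):
    """Partial sum sum_{t=0}^{horizon-1} k(t, t)^(1/2) of M(k) on T = Z_+."""
    t = np.arange(horizon)
    diag = kernel(t, t)
    # Positive-definiteness gives k(t, t) >= 0, so the square root is real.
    return float(np.sum(np.sqrt(diag)))

def tc_kernel(s, t, alpha=0.8):
    """Tuned/correlated kernel k_TC(s, t) = alpha^max(s, t)."""
    return alpha ** np.maximum(s, t)

alpha = 0.8
approx = diag_sqrt_mass(lambda s, t: tc_kernel(s, t, alpha), horizon=500)
closed_form = 1.0 / (1.0 - np.sqrt(alpha))   # sum_t alpha^(t/2), a geometric series
print(approx, closed_form)
```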

Given the definition of DSRI kernels, it is natural to ask which kernels satisfy this property and which of their features are of particular interest. These questions are addressed in the following sections.

4 Well-known DSRI Kernels

In this section, we study the class of DSRI kernels, ${\mathscr{S}}_{\text{{\scalebox{0.75}{$ \mathrm{DSRI} $}}}}$ , by showing that many well-known kernels in system identification belong to this class. To this end, we need the notion of (diagonal) dominance, which introduces a partial order on the set of positive-definite kernels.

Definition 3.

Let $\mathds{k},\mathds{h}:{\mathbb{T}}\times{\mathbb{T}}\to{\mathbb{R}}$ be positive-definite kernels. We say that $\mathds{h}$ dominates $\mathds{k}$ if there exists $C\in{\mathbb{R}}_{+}$ such that $|\mathds{k}(s,t)|\leq C|\mathds{h}(s,t)|$ for all $s,t\in{\mathbb{T}}$ . Similarly, $\mathds{h}$ is said to diagonally dominate $\mathds{k}$ if this inequality holds for $s=t$ , i.e., $\mathds{k}(t,t)\leq C\,\mathds{h}(t,t)$ for all $t\in{\mathbb{T}}$ .

To elaborate on the importance of Definition 3 in describing ${\mathscr{S}}_{\text{{\scalebox{0.75}{$ \mathrm{DSRI} $}}}}$ , we need to introduce finite-rank exponential kernels. More precisely, given $n\in{\mathbb{N}}$ , $\bm{\lambda}=[\lambda_{1},\ldots,\lambda_{n}]^{{\scalebox{0.63}{$ \mathsf{T} $}}}\!\in{\mathbb{R}}_{+}^{n}$ , and $\bm{\alpha}=[\alpha_{1},\ldots,\alpha_{n}]^{{\scalebox{0.63}{$ \mathsf{T} $}}}\!\in[0,1)^{n}$ , the rank- $n$ exponential kernel $\mathds{k}_{\text{{\scalebox{0.75}{$ \mathrm{R} $}}}n\text{{\scalebox{0.75}{$ \mathrm{E} $}}}}:{\mathbb{T}}\times{\mathbb{T}}\to{\mathbb{R}}$ is defined as

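$$\mathds{k}_{\text{{\scalebox{0.75}{$\mathrm{R}$}}}n\text{{\scalebox{0.75}{$\mathrm{E}$}}}}(s,t)=\sum_{i=1}^{n}\lambda_{i}\alpha_{i}^{\frac{1}{2}(s+t)},\tag{2}$$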

for any $s,t\in{\mathbb{T}}$ . We denote the kernel by $\mathds{k}_{\text{{\scalebox{0.75}{$ \mathrm{R} $}}}n\text{{\scalebox{0.75}{$ \mathrm{E} $}}}}(\cdot,\cdot\,;\bm{\lambda},\bm{\alpha})$ , and write $\mathds{k}_{\text{{\scalebox{0.75}{$ \mathrm{R} $}}}n\text{{\scalebox{0.75}{$ \mathrm{E} $}}}}(s,t\,;\bm{\lambda},\bm{\alpha})$ on the left-hand side of (2), when we want to highlight the dependency on the hyperparameter vectors $\bm{\lambda}$ and $\bm{\alpha}$ .

Theorem 1.

i) Let $\mathds{k},\mathds{h}:{\mathbb{T}}\times{\mathbb{T}}\to{\mathbb{R}}$ be positive-definite kernels, where $\mathds{h}$ is DSRI. If $\mathds{h}$ (diagonally) dominates $\mathds{k}$ , then $\mathds{k}$ is DSRI.

ii) The rank- $n$ exponential kernel $\mathds{k}_{\text{{\scalebox{0.75}{$ \mathrm{R} $}}}n\text{{\scalebox{0.75}{$ \mathrm{E} $}}}}:{\mathbb{T}}\times{\mathbb{T}}\to{\mathbb{R}}$ defined in (2) is DSRI.
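The sketch below checks, on a finite grid of ${\mathbb{Z}}_{+}$ , two ingredients related to part ii): the Gram matrix of the rank- $n$ exponential kernel is positive semidefinite, and its truncated ${\mathscr{M}}(\mathds{k})$ stays below $\lambda^{\frac{1}{2}}/(1-\alpha^{\frac{1}{2}})$ , assuming $\lambda:=\sum_{i}\lambda_{i}$ and $\alpha:=\max_{i}\alpha_{i}$ (this bound follows from $\mathds{k}(t,t)\leq\lambda\alpha^{t}$ ). The hyperparameter vectors and the function name `rank_n_exp` are hypothetical choices for illustration.

```python
import numpy as np

def rank_n_exp(s, t, lam, alpha):
    """k_RnE(s, t) = sum_i lam_i * alpha_i^((s + t)/2), vectorized over s, t."""
    s = np.asarray(s, dtype=float)[..., None]
    t = np.asarray(t, dtype=float)[..., None]
    return np.sum(lam * alpha ** (0.5 * (s + t)), axis=-1)

lam = np.array([1.0, 0.5, 2.0])      # hypothetical lambda in R_+^n
alpha = np.array([0.3, 0.7, 0.9])    # hypothetical alpha in [0, 1)^n

# Positive-definiteness: the Gram matrix on a finite grid is PSD (up to rounding).
grid = np.arange(60)
S, T = np.meshgrid(grid, grid, indexing="ij")
gram = rank_n_exp(S, T, lam, alpha)
min_eig = np.linalg.eigvalsh(gram).min()

# Truncated M(k) on Z_+ versus the crude bound lambda^(1/2) / (1 - alpha^(1/2)).
t = np.arange(2000)
m_trunc = np.sum(np.sqrt(rank_n_exp(t, t, lam, alpha)))
bound = np.sqrt(lam.sum()) / (1.0 - np.sqrt(alpha.max()))
print(min_eig, m_trunc, bound)
```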

Theorem 1 can be used to show that a variety of kernels belong to ${\mathscr{S}}_{\text{{\scalebox{0.75}{$ \mathrm{DSRI} $}}}}$ . In the system identification literature, various kernels have been introduced [19, 53], e.g., the diagonal, diagonal/correlated, tuned/correlated, and stable spline kernels, which are denoted by $\mathds{k}_{\text{{\scalebox{0.75}{$ \mathrm{DI} $}}}}$ , $\mathds{k}_{\text{{\scalebox{0.75}{$ \mathrm{DC} $}}}}$ , $\mathds{k}_{\text{{\scalebox{0.75}{$ \mathrm{TC} $}}}}$ , and $\mathds{k}_{\text{{\scalebox{0.75}{$ \mathrm{SS} $}}}}$ , respectively, and defined as

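$$\begin{aligned}\mathds{k}_{\text{{\scalebox{0.75}{$\mathrm{DI}$}}}}(s,t)&=\mathbbm{1}_{\{0\}}(s-t)\,\alpha^{s},\\ \mathds{k}_{\text{{\scalebox{0.75}{$\mathrm{DC}$}}}}(s,t)&=\alpha^{\frac{1}{2}(s+t)}\gamma^{|s-t|},\\ \mathds{k}_{\text{{\scalebox{0.75}{$\mathrm{TC}$}}}}(s,t)&=\alpha^{\max(s,t)},\\ \mathds{k}_{\text{{\scalebox{0.75}{$\mathrm{SS}$}}}}(s,t)&=\alpha^{\max(s,t)+s+t}-\frac{1}{3}\alpha^{3\max(s,t)},\end{aligned}$$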

for any $s,t\in{\mathbb{T}}$ , where $\alpha\in(0,1)$ , and $\gamma\in(-1,1)$ if ${\mathbb{T}}={\mathbb{Z}}_{+}$ or $\gamma\in(0,1)$ if ${\mathbb{T}}={\mathbb{R}}_{+}$ . Moreover, in [54], the first- and second-order integral stable spline kernels are defined as

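$$\begin{aligned}\mathds{k}_{\text{{\scalebox{0.75}{$\mathrm{iTC}$}}}}(s,t)&=\frac{\alpha^{\max(s,t)+1}-\beta^{\max(s,t)+1}}{\max(s,t)+1},\\ \mathds{k}_{\text{{\scalebox{0.75}{$\mathrm{iSS}$}}}}(s,t)&=\frac{\alpha^{s+t+\max(s,t)+1}-\beta^{s+t+\max(s,t)+1}}{s+t+\max(s,t)+1}-\frac{\alpha^{3\max(s,t)+1}-\beta^{3\max(s,t)+1}}{9\max(s,t)+3},\end{aligned}$$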

for any $s,t\in{\mathbb{T}}$ , where $0\leq\beta\leq\alpha<1$ . For each of the above-mentioned kernels, ${\mathscr{M}}(\mathds{k})$ can be calculated directly using (1), which shows that these kernels belong to ${\mathscr{S}}_{\text{{\scalebox{0.75}{$ \mathrm{DSRI} $}}}}$ . Alternatively, one can easily see that the kernels $\mathds{k}_{\text{{\scalebox{0.75}{$ \mathrm{DI} $}}}}$ , $\mathds{k}_{\text{{\scalebox{0.75}{$ \mathrm{DC} $}}}}$ , $\mathds{k}_{\text{{\scalebox{0.75}{$ \mathrm{TC} $}}}}$ , and $\mathds{k}_{\text{{\scalebox{0.75}{$ \mathrm{iTC} $}}}}$ are dominated by $\mathds{k}_{\text{{\scalebox{0.75}{$ \mathrm{R} $}}}n\text{{\scalebox{0.75}{$ \mathrm{E} $}}}}(\cdot,\cdot\,;1,\alpha)$ . Similarly, one can show that $\mathds{k}_{\text{{\scalebox{0.75}{$ \mathrm{R} $}}}n\text{{\scalebox{0.75}{$ \mathrm{E} $}}}}(\cdot,\cdot\,;1,\alpha^{3})$ dominates $\mathds{k}_{\text{{\scalebox{0.75}{$ \mathrm{SS} $}}}}$ and $\mathds{k}_{\text{{\scalebox{0.75}{$ \mathrm{iSS} $}}}}$ . Hence, Theorem 1 implies that each of the above-mentioned kernels is DSRI. By the same line of argument, one can show the same result for the $n^{\text{\tiny{th}}}$ -order stable spline kernels [19] (see Appendix A.3 for more details).
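The dominance relations stated above can be spot-checked numerically with $C=1$ on a finite window of ${\mathbb{Z}}_{+}$ , as in the sketch below. The hyperparameter values are hypothetical, and a check on finitely many pairs $(s,t)$ only illustrates the inequalities rather than proving them; the helpers `rank1_exp`, `ss`, and `dominated` are our own names.

```python
import numpy as np

# Hypothetical hyperparameters: alpha in (0,1), gamma in (-1,1), 0 <= beta <= alpha < 1
alpha, gamma, beta = 0.8, -0.5, 0.2

grid = np.arange(80)                      # finite window of T = Z_+
S, T = np.meshgrid(grid, grid, indexing="ij")
M = np.maximum(S, T)

def rank1_exp(a):
    """k_RnE(s, t; 1, a) = a^((s + t)/2) evaluated on the grid."""
    return a ** (0.5 * (S + T))

def ss(a):
    """Stable spline kernel, as written in the text, for hyperparameter a."""
    return a ** (M + S + T) - a ** (3 * M) / 3.0

# Each entry: (kernel values, candidate dominating kernel values)
kernels = {
    "DI": (np.where(S == T, alpha ** S, 0.0), rank1_exp(alpha)),
    "DC": (alpha ** (0.5 * (S + T)) * gamma ** np.abs(S - T), rank1_exp(alpha)),
    "TC": (alpha ** M, rank1_exp(alpha)),
    "iTC": ((alpha ** (M + 1) - beta ** (M + 1)) / (M + 1), rank1_exp(alpha)),
    "SS": (ss(alpha), rank1_exp(alpha ** 3)),
}

def dominated(k, h, C=1.0, rtol=1e-12):
    """Check |k(s,t)| <= C |h(s,t)| on the grid, allowing for rounding."""
    return bool(np.all(np.abs(k) <= C * np.abs(h) * (1.0 + rtol)))

results = {name: dominated(k, h) for name, (k, h) in kernels.items()}
print(results)
```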

Theorem 2.

Let $\mathds{k},\mathds{h}:{\mathbb{T}}\times{\mathbb{T}}\to{\mathbb{R}}$ be positive-definite kernels, where $\mathds{k}$ is DSRI.

i) If $\mathds{h}$ is DSRI, then $\alpha\mathds{k}+\beta\mathds{h}$ is a DSRI kernel, for any $\alpha,\beta\in{\mathbb{R}}_{+}$ .

ii) If $\sup_{t\in{\mathbb{T}}}\mathds{h}(t,t)<\infty$ , then $\mathds{k}\mathds{h}$ is a DSRI kernel.
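Both parts of Theorem 2 reduce to pointwise inequalities on the diagonal: $(\alpha\mathds{k}(t,t)+\beta\mathds{h}(t,t))^{\frac{1}{2}}\leq\alpha^{\frac{1}{2}}\mathds{k}(t,t)^{\frac{1}{2}}+\beta^{\frac{1}{2}}\mathds{h}(t,t)^{\frac{1}{2}}$ for part i), and $(\mathds{k}(t,t)\mathds{h}(t,t))^{\frac{1}{2}}\leq(\sup_{t}\mathds{h}(t,t))^{\frac{1}{2}}\mathds{k}(t,t)^{\frac{1}{2}}$ for part ii), where positive-definiteness of $\mathds{k}\mathds{h}$ follows from the Schur product theorem. The sketch below checks the summed versions on a truncated window of ${\mathbb{Z}}_{+}$ ; the particular kernels, weights, and the bounded-diagonal kernel $g(s,t)=1+\cos(ws)\cos(wt)$ are hypothetical examples.

```python
import numpy as np

t = np.arange(2000, dtype=float)   # truncated window of T = Z_+

def m_trunc(diag):
    """Truncated M(k) computed from diagonal values k(t,t), t = 0..N-1."""
    return float(np.sum(np.sqrt(diag)))

# Diagonals of two DSRI kernels (hypothetical hyperparameters):
k_diag = 0.8 ** t   # TC kernel with alpha = 0.8: k(t,t) = 0.8^t
h_diag = 0.6 ** t   # DC kernel with alpha = 0.6: h(t,t) = 0.6^t (gamma^0 = 1)

# (i) Non-negative combination a*k + b*h
a, b = 2.0, 0.5
m_comb = m_trunc(a * k_diag + b * h_diag)
m_comb_bound = np.sqrt(a) * m_trunc(k_diag) + np.sqrt(b) * m_trunc(h_diag)

# (ii) Product with a bounded-diagonal kernel that is not DSRI by itself:
# g(s,t) = 1 + cos(w s) cos(w t) is positive-definite with 1 <= g(t,t) <= 2.
w = 0.3
g_diag = 1.0 + np.cos(w * t) ** 2
m_prod = m_trunc(k_diag * g_diag)
m_prod_bound = np.sqrt(g_diag.max()) * m_trunc(k_diag)
print(m_comb, m_comb_bound, m_prod, m_prod_bound)
```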

Theorem 1 and Theorem 2 characterize the structure of the class of DSRI kernels as a cone equipped with a partial order. They can also be used to verify the DSRI property for other kernels. For example, consider the kernel $\mathds{k}_{\text{{\scalebox{0.75}{$ \mathrm{iTS} $}}}}$ introduced in [54] as the combination of $\mathds{k}_{\text{{\scalebox{0.75}{$ \mathrm{iTC} $}}}}$ and $\mathds{k}_{\text{{\scalebox{0.75}{$ \mathrm{iSS} $}}}}$ , i.e., $\mathds{k}_{\text{{\scalebox{0.75}{$ \mathrm{iTS} $}}}}(s,t):=\mathds{k}_{\text{{\scalebox{0.75}{$ \mathrm{iTC} $}}}}(s,t)+\mathds{k}_{\text{{\scalebox{0.75}{$ \mathrm{iSS} $}}}}(s,t)$ for any $s,t\in{\mathbb{T}}$ . Since $\mathds{k}_{\text{{\scalebox{0.75}{$ \mathrm{iTC} $}}}}$ and $\mathds{k}_{\text{{\scalebox{0.75}{$ \mathrm{iSS} $}}}}$ are DSRI, part i) of Theorem 2 implies that $\mathds{k}_{\text{{\scalebox{0.75}{$ \mathrm{iTS} $}}}}$ is a DSRI kernel.

Let ${\mathrm{v}}:=(v_{t})_{t\in{\mathbb{T}}}\in{\mathscr{L}}^{1}$ , and let $\mathds{k}_{{\mathrm{v}}}:{\mathbb{T}}\times{\mathbb{T}}\to{\mathbb{R}}$ be defined as $\mathds{k}_{{\mathrm{v}}}(s,t)=v_{s}v_{t}$ for any $s,t\in{\mathbb{T}}$ [49]. One can easily see that $\mathds{k}_{{\mathrm{v}}}$ is a rank- $1$ positive-definite kernel with

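$${\mathscr{M}}(\mathds{k}_{{\mathrm{v}}})=\left\{\begin{array}[]{ll}\displaystyle\int_{{\mathbb{R}}_{+}}(v_{t}^{2})^{\frac{1}{2}}\,\mathrm{d}t=\int_{{\mathbb{R}}_{+}}|v_{t}|\,\mathrm{d}t,&\text{\quad if }{\mathbb{T}}={\mathbb{R}}_{+},\\ \displaystyle\sum_{t\in{\mathbb{Z}}_{+}}(v_{t}^{2})^{\frac{1}{2}}=\sum_{t\in{\mathbb{Z}}_{+}}|v_{t}|,&\text{\quad if }{\mathbb{T}}={\mathbb{Z}}_{+},\\ \end{array}\right.$$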

which shows that ${\mathscr{M}}(\mathds{k}_{{\mathrm{v}}})=\|{\mathrm{v}}\|_{1}$ , and hence $\mathds{k}_{{\mathrm{v}}}\in{\mathscr{S}}_{\text{{\scalebox{0.75}{$ \mathrm{DSRI} $}}}}$ . In [49], the amplitude-modulated locally stationary (AMLS) kernels are introduced as a generalization of $\mathds{k}_{{\mathrm{v}}}$ . More precisely, let $\mathds{k}_{\text{{\scalebox{0.75}{$ \mathrm{st} $}}}}:{\mathbb{T}}\times{\mathbb{T}}\to{\mathbb{R}}$ be a stationary positive-definite kernel, i.e., we have
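The identity ${\mathscr{M}}(\mathds{k}_{{\mathrm{v}}})=\|{\mathrm{v}}\|_{1}$ can be confirmed on a finite window with a few lines of NumPy. The sketch below assumes a hypothetical impulse response $v_{t}=(-0.7)^{t}/(t+1)$ , chosen only because it is absolutely summable and changes sign; it also checks that the Gram matrix of $\mathds{k}_{{\mathrm{v}}}$ has rank one.

```python
import numpy as np

t = np.arange(400)
v = (-0.7) ** t / (t + 1.0)        # hypothetical impulse response in l^1(Z_+)

K_v = np.outer(v, v)               # k_v(s, t) = v_s v_t on the window
m_kv = np.sum(np.sqrt(np.diag(K_v)))   # sum_t (v_t^2)^(1/2)
l1_norm = np.sum(np.abs(v))            # ||v||_1 on the window
print(m_kv, l1_norm, np.linalg.matrix_rank(K_v))
```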

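$$\mathds{k}_{\text{{\scalebox{0.75}{$\mathrm{st}$}}}}(s+\tau,t+\tau)=\mathds{k}_{\text{{\scalebox{0.75}{$\mathrm{st}$}}}}(s,t),\quad\forall s,t,\tau\in{\mathbb{T}}.$$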

Subsequently, the AMLS kernel $\mathds{k}_{\text{{\scalebox{0.75}{$ \mathrm{AMLS} $}}}}:{\mathbb{T}}\times{\mathbb{T}}\to{\mathbb{R}}$ is defined as

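$$\mathds{k}_{\text{{\scalebox{0.75}{$\mathrm{AMLS}$}}}}(s,t)=v_{t}\,\mathds{k}_{\text{{\scalebox{0.75}{$\mathrm{st}$}}}}(s,t)\,v_{s},\quad\forall s,t\in{\mathbb{T}}.$$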

Since $\mathds{k}_{\text{{\scalebox{0.75}{$ \mathrm{st} $}}}}$ is stationary, we have $\sup_{t\in{\mathbb{T}}}\mathds{k}_{\text{{\scalebox{0.75}{$ \mathrm{st} $}}}}(t,t)=\mathds{k}_{\text{{\scalebox{0.75}{$ \mathrm{st} $}}}}(0,0)<\infty$ . Moreover, $\mathds{k}_{\text{{\scalebox{0.75}{$ \mathrm{AMLS} $}}}}(s,t)=\mathds{k}_{{\mathrm{v}}}(s,t)\,\mathds{k}_{\text{{\scalebox{0.75}{$ \mathrm{st} $}}}}(s,t)$ for all $s,t\in{\mathbb{T}}$ . Therefore, since $\mathds{k}_{{\mathrm{v}}}\in{\mathscr{S}}_{\text{{\scalebox{0.75}{$ \mathrm{DSRI} $}}}}$ , part ii) of Theorem 2 yields $\mathds{k}_{\text{{\scalebox{0.75}{$ \mathrm{AMLS} $}}}}\in{\mathscr{S}}_{\text{{\scalebox{0.75}{$ \mathrm{DSRI} $}}}}$ . In addition to $\mathds{k}_{\text{{\scalebox{0.75}{$ \mathrm{AMLS} $}}}}$ , the simulation-induced kernels are introduced in [49]. Following a similar argument, one can show that, under certain conditions, the simulation-induced kernels are DSRI (see Appendix A.4 for more details).
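Because the diagonal of $\mathds{k}_{\text{{\scalebox{0.75}{$ \mathrm{AMLS} $}}}}$ equals $v_{t}^{2}\,\mathds{k}_{\text{{\scalebox{0.75}{$ \mathrm{st} $}}}}(0,0)$ , one in fact has ${\mathscr{M}}(\mathds{k}_{\text{{\scalebox{0.75}{$ \mathrm{AMLS} $}}}})=\mathds{k}_{\text{{\scalebox{0.75}{$ \mathrm{st} $}}}}(0,0)^{\frac{1}{2}}\|{\mathrm{v}}\|_{1}$ . The sketch below verifies this on a finite window of ${\mathbb{Z}}_{+}$ and checks that the resulting Gram matrix is positive semidefinite, assuming a Gaussian (squared-exponential) stationary kernel and a geometric modulation sequence; both choices, and the function name `k_stationary`, are hypothetical.

```python
import numpy as np

def k_stationary(s, t, ell=3.0, sigma2=2.0):
    """Hypothetical stationary kernel: sigma2 * exp(-(s - t)^2 / (2 ell^2))."""
    return sigma2 * np.exp(-((s - t) ** 2) / (2.0 * ell ** 2))

grid = np.arange(200, dtype=float)     # truncated window of T = Z_+
v = 0.9 ** grid                        # hypothetical modulation sequence in l^1
S, T = np.meshgrid(grid, grid, indexing="ij")

# k_AMLS(s, t) = v_t k_st(s, t) v_s, with s indexing rows and t indexing columns
K_amls = v[:, None] * k_stationary(S, T) * v[None, :]

m_amls = np.sum(np.sqrt(np.diag(K_amls)))                        # truncated M(k_AMLS)
predicted = np.sqrt(k_stationary(0.0, 0.0)) * np.sum(np.abs(v))  # k_st(0,0)^(1/2) ||v||_1
min_eig = np.linalg.eigvalsh(K_amls).min()                       # PSD check on the window
print(m_amls, predicted, min_eig)
```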

One can also show that the DSRI property is preserved under proper sampling (see Appendix A.5) and under reparameterization of the kernel arguments (see Appendix A.6). Using Theorem 1 and Theorem 2, the discussion in this section, and arguments similar to those in Appendices A.3, A.4, A.5, and A.6, one can show that a broad range of kernels are DSRI. The class of DSRI kernels is studied further in the next section.

5 DSRI Kernels: Stability and Integrability

To elaborate further on the structure of the class of DSRI kernels, we investigate their stability and integrability properties in this section. Since, in the kernel-based system identification framework, the identified model inherits the attributes of the kernel, a natural question concerns the main feature of interest, namely, the stability of the kernel. To address this question, we recall the notion of stable kernels [19].

Definition 4 ([19]).

The positive-definite kernel $\,\mathds{k}:{\mathbb{T}}\times{\mathbb{T}}\to{\mathbb{R}}$ is said to be stable if, for any ${\mathrm{u}}=(u_{s})_{s\in{\mathbb{T}}}\in{\mathscr{L}}^{\infty}$ , one has

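$$\int_{{\mathbb{T}}}\Big|\int_{{\mathbb{T}}}\mathds{k}(t,s)\,u_{s}\,\mathrm{d}s\Big|\,\mathrm{d}t<\infty,$$
where the integrals are replaced by sums when ${\mathbb{T}}={\mathbb{Z}}_{+}$ .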

The class of stable kernels is denoted by ${\mathscr{S}}_{\mathrm{s}}$ .

The following theorem demonstrates the relationship between the DSRI kernels and the stable kernels.

Theorem 3 ([20]).

Every DSRI kernel is stable.

Theorem 3 shows that ${\mathscr{S}}_{\text{{\scalebox{0.75}{$ \mathrm{DSRI} $}}}}\subseteq{\mathscr{S}}_{\mathrm{s}}$ . Besides stable kernels, another well-known category of kernels in system identification consists of the integrable kernels. We recall their definition next.

Definition 5 ([19]).

The positive-definite kernel $\,\mathds{k}:{\mathbb{T}}\times{\mathbb{T}}\to{\mathbb{R}}$ is called integrable if we have

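$$\int_{{\mathbb{T}}}\int_{{\mathbb{T}}}|\mathds{k}(s,t)|\,\mathrm{d}s\,\mathrm{d}t<\infty,$$
where the integrals are replaced by sums when ${\mathbb{T}}={\mathbb{Z}}_{+}$ .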

The class of integrable kernels is denoted by ${\mathscr{S}}_{1}$ .

It is known that the integrable kernels form a subclass of the stable kernels [19, 37], i.e., ${\mathscr{S}}_{1}\subseteq{\mathscr{S}}_{\mathrm{s}}$ . The following theorem further characterizes the class of DSRI kernels by relating it to the integrable kernels. This theorem follows implicitly from the proof of Lemma 2 in [20].

Theorem 4 ([20]).

Every DSRI kernel is integrable.
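One short way to see why the DSRI property implies integrability (a sketch, not necessarily the argument of [20]) uses the Cauchy-Schwarz inequality for positive-definite kernels, which follows from the positive semidefiniteness of the $2\times 2$ Gram matrix of any pair $s,t$:

```latex
% Cauchy-Schwarz for a positive-definite kernel (2x2 Gram matrix of {s, t} is PSD):
|\mathds{k}(s,t)| \le \mathds{k}(s,s)^{\frac{1}{2}}\,\mathds{k}(t,t)^{\frac{1}{2}} .
% Summing over T = Z_+ (for T = R_+, use integrals and Tonelli's theorem):
\sum_{s,t\in\mathbb{Z}_{+}}|\mathds{k}(s,t)|
  \le \Big(\sum_{t\in\mathbb{Z}_{+}}\mathds{k}(t,t)^{\frac{1}{2}}\Big)^{2}
  = \mathscr{M}(\mathds{k})^{2} < \infty .
```

In particular, the integrability sum of a DSRI kernel is bounded by the square of $\mathscr{M}(\mathds{k})$.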

In [39], it is shown that there exists a stable kernel $\mathds{k}:{\mathbb{Z}}_{+}\times{\mathbb{Z}}_{+}\to{\mathbb{R}}$ which is not integrable, i.e., $\sum_{s,t\in{\mathbb{Z}}_{+}}|\mathds{k}(s,t)|=\infty$ . The next theorem establishes the analogous property for DSRI kernels, showing that the inclusion in Theorem 4 is strict.

Theorem 5.

There exists an integrable kernel which is not a DSRI kernel.
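For ${\mathbb{T}}={\mathbb{Z}}_{+}$, a simple illustration of Theorem 5 (our own illustrative choice, not necessarily the kernel used in Appendix A.7) is the diagonal kernel $\mathds{k}(s,t)=\mathbb{1}\{s=t\}/(t+1)^{2}$. It is positive-definite, since $\sum_{i,j}a_{i}a_{j}\mathds{k}(t_{i},t_{j})=\sum_{t}\big(\sum_{i:t_{i}=t}a_{i}\big)^{2}/(t+1)^{2}\geq 0$; it is integrable, since $\sum_{s,t}|\mathds{k}(s,t)|=\pi^{2}/6$; but $\mathscr{M}(\mathds{k})=\sum_{t}1/(t+1)$ diverges. The sketch below checks the truncated sums numerically; the helper names are hypothetical.

```python
import math

import numpy as np

# Limit of sum_{t >= 0} 1 / (t + 1)^2 (Basel problem).
BASEL = math.pi ** 2 / 6


def diag_kernel(s, t):
    """Illustrative diagonal kernel on Z_+: k(s, t) = 1{s == t} / (t + 1)^2."""
    return np.where(s == t, 1.0 / (t + 1.0) ** 2, 0.0)


def truncated_sums(n):
    """Return (sum_{s,t<n} |k(s,t)|, sum_{t<n} k(t,t)^(1/2)) for the kernel above."""
    idx = np.arange(n, dtype=float)
    gram = diag_kernel(idx[:, None], idx[None, :])  # n x n Gram matrix
    abs_sum = float(np.abs(gram).sum())             # truncated integrability sum
    m_trunc = float(np.sqrt(np.diag(gram)).sum())   # truncated M(k)
    return abs_sum, m_trunc


if __name__ == "__main__":
    for n in (10, 100, 2000):
        abs_sum, m_trunc = truncated_sums(n)
        # abs_sum approaches BASEL, while m_trunc grows like log(n).
        print(n, abs_sum, BASEL, m_trunc, math.log(n))
```

By Theorem 15 with $\mathrm{m}=\mathbf{0}$, a Gaussian process with this kernel has almost surely unstable sample paths, even though the kernel is integrable.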

The following corollary is a direct result of Theorem 5 and the fact that any integrable kernel is stable [19].

Corollary 6.

There exists a stable kernel which is not a DSRI kernel.

In [37], other categories of positive-definite kernels are considered. The positive-definite kernel $\mathds{k}:{\mathbb{T}}\times{\mathbb{T}}\to{\mathbb{R}}$ is said to be finite-trace if we have

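$$\int_{{\mathbb{T}}}\mathds{k}(t,t)\,\mathrm{d}t<\infty,$$
with the integral replaced by a sum when ${\mathbb{T}}={\mathbb{Z}}_{+}$ .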

Similarly, it is called a squared integrable kernel if

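$$\int_{{\mathbb{T}}}\int_{{\mathbb{T}}}\mathds{k}(s,t)^{2}\,\mathrm{d}s\,\mathrm{d}t<\infty,$$
with the integrals replaced by sums when ${\mathbb{T}}={\mathbb{Z}}_{+}$ .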

The class of finite-trace kernels and the class of squared integrable kernels are denoted by ${\mathscr{S}}_{\mathrm{ft}}$ and ${\mathscr{S}}_{2}$ , respectively [37]. Based on the above discussion and [37], we have

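$${\mathscr{S}}_{\text{{\scalebox{0.75}{$ \mathrm{DSRI} $}}}}\subseteq{\mathscr{S}}_{1}\subseteq{\mathscr{S}}_{\mathrm{s}}\subseteq{\mathscr{S}}_{\mathrm{ft}}\subseteq{\mathscr{S}}_{2},$$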

where all of the inclusions are strict.

Figure 1 illustrates the relations discussed in this section and the previous one; it can be compared with Figure 1 in [37].

6 Operator Continuity and DSRI Kernels

In this section, we study certain topological features of the RKHSs equipped with DSRI kernels, namely the continuity of linear operators defined on them.

Recall that each positive-definite kernel uniquely defines a Hilbert space of functions [1]. More precisely, by the Moore-Aronszajn theorem, these Hilbert spaces are exactly the Hilbert spaces of functions on which the evaluation functionals are bounded [1, 5].

Theorem 7 ([5]).

Given a positive-definite kernel $\,\mathds{k}:{\mathbb{T}}\times{\mathbb{T}}\to{\mathbb{R}}$ , there exists a unique Hilbert space ${\mathscr{H}}_{\mathbbm{k}}\subseteq{\mathbb{R}}^{{\mathbb{T}}}$ with inner product ${\langle{\cdot,\cdot}\rangle}_{{\mathscr{H}}_{\mathbbm{k}}}$ , referred to as the RKHS with kernel $\mathds{k}$ , where for each $t\in{\mathbb{T}}$ , we have

i) $\mathds{k}_{t}\in{\mathscr{H}}_{\mathbbm{k}}$ , and

ii) $g_{t}={\langle{{\mathrm{g}},\mathds{k}_{t}}\rangle}_{{\mathscr{H}}_{\mathbbm{k}}}$ , for all ${\mathrm{g}}=(g_{s})_{s\in{\mathbb{T}}}\in{\mathscr{H}}_{\mathbbm{k}}$ .

The second feature is called the reproducing property.

In the context of system identification, the RKHSs endowed with the stable kernels are of special interest due to their particular feature reviewed in the following theorem.

Theorem 8 ([19, 55, 56]).

Let $\mathds{k}:{\mathbb{T}}\times{\mathbb{T}}\to{\mathbb{R}}$ be a positive-definite kernel. Then, ${\mathscr{H}}_{\mathbbm{k}}\subseteq{\mathscr{L}}^{1}$ if and only if $\mathds{k}$ is a stable kernel. In this case, ${\mathscr{H}}_{\mathbbm{k}}$ is called a stable RKHS.

Given a stable kernel $\mathds{k}$ , we know that ${\mathscr{H}}_{\mathbbm{k}}\subseteq{\mathscr{L}}^{1}$ . Accordingly, various objects defined on ${\mathscr{L}}^{1}$ can be redefined on ${\mathscr{H}}_{\mathbbm{k}}$ by restriction. A natural question is which properties survive this restriction. A key feature of DSRI kernels is that the continuity of operators defined on ${\mathscr{L}}^{1}$ is inherited, with respect to the norm of the RKHS, when they are restricted to the corresponding RKHS.

Theorem 9.

Let ${\mathscr{B}}$ be a Banach space equipped with norm $\|\cdot\|_{{\mathscr{B}}}$ and ${\mathrm{L}}:{\mathscr{L}}^{1}\to{\mathscr{B}}$ be a continuous linear operator. If $\mathds{k}:{\mathbb{T}}\times{\mathbb{T}}\to{\mathbb{R}}$ is a DSRI kernel, then ${\mathscr{H}}_{\mathbbm{k}}$ is a linear subspace of ${\mathscr{L}}^{1}$ and the restriction ${\mathrm{L}}:{\mathscr{H}}_{\mathbbm{k}}\to{\mathscr{B}}$ is continuous. Moreover, we have

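$$\|{\mathrm{L}}\|_{{\mathscr{H}}_{\mathbbm{k}}\to{\mathscr{B}}}\leq{\mathscr{M}}(\mathds{k})\,\|{\mathrm{L}}\|_{{\mathscr{L}}^{1}\to{\mathscr{B}}}. \tag{16}$$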
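The estimate behind Theorem 9 is the pointwise bound $|g_{s}|\leq\|{\mathrm{g}}\|_{{\mathscr{H}}_{\mathbbm{k}}}\mathds{k}(s,s)^{\frac{1}{2}}$, which yields $\|{\mathrm{g}}\|_{1}\leq{\mathscr{M}}(\mathds{k})\|{\mathrm{g}}\|_{{\mathscr{H}}_{\mathbbm{k}}}$ (see Appendix A.8). The sketch below checks this numerically on ${\mathbb{Z}}_{+}$ for random elements of the span of kernel sections, assuming the tuned/correlated kernel $\mathds{k}(s,t)=\alpha^{\max(s,t)}$ with $\alpha=0.81$, for which ${\mathscr{M}}(\mathds{k})=1/(1-\sqrt{\alpha})=10$. The function names are hypothetical.

```python
import numpy as np


def tc_kernel(s, t, alpha=0.81):
    """Tuned/correlated (TC) kernel on Z_+: k(s, t) = alpha ** max(s, t)."""
    return alpha ** np.maximum(s, t)


def l1_versus_rkhs_bound(n_grid=400, n_centers=5, alpha=0.81, seed=0):
    """Return (truncated ||g||_1, M(k) * ||g||_H) for a random g in span{k_t}."""
    rng = np.random.default_rng(seed)
    grid = np.arange(n_grid)
    centers = rng.choice(n_grid // 4, size=n_centers, replace=False)
    coeffs = rng.standard_normal(n_centers)
    # g = sum_i a_i k(., t_i) is an element of the RKHS H_k.
    g = tc_kernel(grid[:, None], centers[None, :], alpha) @ coeffs
    # Reproducing property: ||g||_H^2 = a^T K a with K_ij = k(t_i, t_j).
    gram = tc_kernel(centers[:, None], centers[None, :], alpha)
    h_norm = float(np.sqrt(coeffs @ gram @ coeffs))
    l1_norm = float(np.abs(g).sum())
    m_k = 1.0 / (1.0 - np.sqrt(alpha))  # M(k) = sum_t alpha^(t/2) for the TC kernel
    return l1_norm, m_k * h_norm


if __name__ == "__main__":
    for seed in range(3):
        print(l1_versus_rkhs_bound(seed=seed))
```

Truncating the time axis only shrinks the left-hand side, so the inequality checked here is implied by the one in Theorem 9.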

Given a Banach space ${\mathscr{B}}$ with norm $\|\cdot\|_{{\mathscr{B}}}$ , we denote by ${\mathscr{L}}^{\infty}({\mathbb{T}};{\mathscr{B}})$ the space of ${\mathscr{B}}$ -valued Bochner measurable functions whose norms in ${\mathscr{B}}$ are essentially bounded, i.e., for any ${\mathrm{v}}=(v_{t})_{t\in{\mathbb{T}}}\in{\mathscr{L}}^{\infty}({\mathbb{T}};{\mathscr{B}})$ , we have $\operatornamewithlimits{ess\,sup}_{t\in{\mathbb{T}}}\|v_{t}\|_{{\mathscr{B}}}<\infty$ [57].

Theorem 10.

Let $\,{\mathrm{v}}$ be an arbitrary element of ${\mathscr{L}}^{\infty}({\mathbb{T}};{\mathscr{B}})$ and $\mathds{k}:{\mathbb{T}}\times{\mathbb{T}}\to{\mathbb{R}}$ be a positive-definite kernel. Define the operator ${\mathrm{L}}:{\mathscr{H}}_{\mathbbm{k}}\to{\mathscr{B}}$ as follows

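$${\mathrm{L}}({\mathrm{g}})=\int_{{\mathbb{T}}}g_{t}\,v_{t}\,\mathrm{d}t, \tag{17}$$
where the integral is understood in the Bochner sense and is replaced by a sum when ${\mathbb{T}}={\mathbb{Z}}_{+}$ ,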

for any ${\mathrm{g}}=(g_{t})_{t\in{\mathbb{T}}}\in{\mathscr{H}}_{\mathbbm{k}}$ . If $\mathds{k}$ is a DSRI kernel, then ${\mathrm{L}}$ is a continuous linear operator.

Theorem 9 and Theorem 10 allow one to transfer various existing results for BIBO stable impulse responses to the RKHS ${\mathscr{H}}_{\mathbbm{k}}$ . The following corollaries are examples of this.

Corollary 11.

Let $\mathds{k}$ be a DSRI kernel, ${\mathrm{u}}\in{\mathscr{L}}^{\infty}$ be a bounded signal and $t\in{\mathbb{T}}_{\pm}$ . Define the convolution operator ${\mathrm{L}}^{\!{\mathrm{u}}}_{t}:{\mathscr{H}}_{\mathbbm{k}}\to{\mathbb{R}}$ as

[TABLE]

for any ${\mathrm{g}}=(g_{t})_{t\in{\mathbb{T}}}\in{\mathscr{H}}_{\mathbbm{k}}$ . Then, ${\mathrm{L}}^{\!{\mathrm{u}}}_{t}:{\mathscr{H}}_{\mathbbm{k}}\to{\mathbb{R}}$ is a continuous linear operator.

Let $\Omega_{{\mathbb{T}}}$ be defined as $\Omega_{{\mathbb{T}}}:=[0,\pi]$ when ${\mathbb{T}}={\mathbb{Z}}_{+}$ , and $\Omega_{{\mathbb{T}}}:={\mathbb{R}}_{+}$ when ${\mathbb{T}}={\mathbb{R}}_{+}$ . For each $\omega\in\Omega_{{\mathbb{T}}}$ , the operators ${\mathrm{F}}_{\omega}^{\text{\rm{(r)}}}:{\mathscr{H}}_{\mathbbm{k}}\to{\mathbb{R}}$ and ${\mathrm{F}}_{\omega}^{\text{\rm{(i)}}}:{\mathscr{H}}_{\mathbbm{k}}\to{\mathbb{R}}$ are defined, respectively, as

[TABLE]

and

[TABLE]

for any ${\mathrm{g}}=(g_{t})_{t\in{\mathbb{T}}}\in{\mathscr{H}}_{\mathbbm{k}}$ . Moreover, we define ${\mathcal{F}}_{\omega}:{\mathscr{H}}_{\mathbbm{k}}\to{\mathbb{C}}$ as ${\mathcal{F}}_{\omega}={\mathrm{F}}_{\omega}^{\text{\rm{(r)}}}+\mathrm{j}{\mathrm{F}}_{\omega}^{\text{\rm{(i)}}}$ , where $\mathrm{j}$ denotes the imaginary unit. One can see that ${\mathrm{F}}_{\omega}^{\text{\rm{(r)}}}({\mathrm{g}})$ and ${\mathrm{F}}_{\omega}^{\text{\rm{(i)}}}({\mathrm{g}})$ correspond, respectively, to the real and imaginary parts of the Fourier transform of the impulse response ${\mathrm{g}}\in{\mathscr{H}}_{\mathbbm{k}}$ evaluated at frequency $\omega\in\Omega_{{\mathbb{T}}}$ , which is ${\mathcal{F}}_{\omega}({\mathrm{g}})$ . From Theorem 10, we obtain the following corollary for these operators.

Corollary 12.

Let $\mathds{k}$ be a DSRI kernel. Then, ${\mathrm{F}}_{\omega}^{\text{\rm{(r)}}}$ , ${\mathrm{F}}_{\omega}^{\text{\rm{(i)}}}$ and ${\mathcal{F}}_{\omega}$ are continuous linear operators, for all $\omega\in\Omega_{{\mathbb{T}}}$ .
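Both corollaries reduce to the estimate $\|{\mathrm{g}}\|_{1}\leq{\mathscr{M}}(\mathds{k})\|{\mathrm{g}}\|_{{\mathscr{H}}_{\mathbbm{k}}}$ from the proof of Theorem 9 (Appendix A.8). The following sketch for ${\mathbb{T}}={\mathbb{Z}}_{+}$ assumes that the convolution operator has the form ${\mathrm{L}}^{\!{\mathrm{u}}}_{t}({\mathrm{g}})=\sum_{s}g_{s}u_{t-s}$ and that ${\mathcal{F}}_{\omega}({\mathrm{g}})=\sum_{t}g_{t}\mathrm{e}^{\pm\mathrm{j}\omega t}$; the sign convention does not affect the bound:

```latex
|\mathrm{L}^{\mathrm{u}}_{t}(\mathrm{g})|
  \le \sum_{s\in\mathbb{Z}_{+}}|g_{s}|\,|u_{t-s}|
  \le \|\mathrm{u}\|_{\infty}\,\|\mathrm{g}\|_{1}
  \le \|\mathrm{u}\|_{\infty}\,\mathscr{M}(\mathds{k})\,\|\mathrm{g}\|_{\mathscr{H}_{\mathbbm{k}}},
\qquad
|\mathcal{F}_{\omega}(\mathrm{g})|
  \le \sum_{t\in\mathbb{Z}_{+}}|g_{t}|\,\big|\mathrm{e}^{\pm\mathrm{j}\omega t}\big|
  = \|\mathrm{g}\|_{1}
  \le \mathscr{M}(\mathds{k})\,\|\mathrm{g}\|_{\mathscr{H}_{\mathbbm{k}}} .
```

Since $|{\mathrm{F}}_{\omega}^{\text{\rm{(r)}}}({\mathrm{g}})|$ and $|{\mathrm{F}}_{\omega}^{\text{\rm{(i)}}}({\mathrm{g}})|$ are both at most $|{\mathcal{F}}_{\omega}({\mathrm{g}})|$, the same bound covers the real and imaginary parts; a bounded linear operator is continuous.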

7 Stable Gaussian Processes

Let $(\Omega,{\mathscr{G}}_{\Omega},{\mathbb{P}})$ be a probability space, where $\Omega$ is the sample space, ${\mathscr{G}}_{\Omega}$ is the corresponding $\sigma$ -algebra, and ${\mathbb{P}}$ is the probability measure defined on ${\mathscr{G}}_{\Omega}$ . Given a measurable function ${\mathrm{m}}=(m_{t})_{t\in{\mathbb{T}}}$ and a positive-definite kernel $\mathds{k}:{\mathbb{T}}\times{\mathbb{T}}\to{\mathbb{R}}$ , the stochastic process

[TABLE]

is called a Gaussian process (GP) with mean ${\mathrm{m}}$ and kernel $\mathds{k}$ [5], denoted by $\mathcal{G\!P}({\mathrm{m}},\mathds{k})$ , when, for any $n\in{\mathbb{N}}$ and any $t_{1},\ldots,t_{n}\in{\mathbb{T}}$ , the random vector $[g_{t_{1}},\ldots,g_{t_{n}}]^{{\scalebox{0.63}{$ \mathsf{T} $}}}\!$ has a Gaussian distribution as follows

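$$[g_{t_{1}},\ldots,g_{t_{n}}]^{\mathsf{T}}\sim{\mathcal{N}}\Big([m_{t_{1}},\ldots,m_{t_{n}}]^{\mathsf{T}},\ \big[\mathds{k}(t_{i},t_{j})\big]_{i,j=1}^{n}\Big).$$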

The following definition recalls a class of Gaussian processes of particular interest in system identification [20].

Definition 6 ([20]).

The Gaussian process $\mathcal{G\!P}({\mathrm{m}},\mathds{k})$ is said to be stable in the BIBO sense if its realizations, also known as sample paths, are almost surely BIBO stable impulse responses, i.e., ${\mathbb{P}}[\|{\mathrm{g}}\|_{1}<\infty]=1$ .

Stable GPs are important because of their role in the Bayesian interpretation of kernel-based impulse response identification. Hence, a natural question concerns the necessary and sufficient conditions for the stability of the Gaussian process $\mathcal{G\!P}({\mathrm{m}},\mathds{k})$ .² Part of this question is addressed in [20], as reviewed in the following lemma.

² This question was raised during the workshop “Bayesian and Kernel-Based Methods in Learning Dynamical Systems”, 21 ${}^{\text{\tiny{st}}}$ IFAC World Congress, Berlin, Germany, 2020.

Lemma 13 ([20]).

Let $\mathds{k}$ be a positive-definite kernel and ${\mathrm{g}}\sim\mathcal{G\!P}(\mathbf{0},\mathds{k})$ , where $\mathbf{0}$ denotes the constant zero function. If kernel $\mathds{k}$ is DSRI, then we have ${\mathbb{P}}[\|{\mathrm{g}}\|_{1}<\infty]=1$ .
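One elementary way to see Lemma 13 (a sketch, not necessarily the proof in [20]): for ${\mathrm{g}}\sim\mathcal{G\!P}(\mathbf{0},\mathds{k})$ we have ${\mathbb{E}}|g_{t}|=\sqrt{2/\pi}\,\mathds{k}(t,t)^{\frac{1}{2}}$, so by Tonelli's theorem ${\mathbb{E}}\|{\mathrm{g}}\|_{1}=\sqrt{2/\pi}\,{\mathscr{M}}(\mathds{k})$, which is finite for a DSRI kernel and forces $\|{\mathrm{g}}\|_{1}<\infty$ almost surely. The Monte Carlo sketch below checks the identity on a truncated grid, assuming the TC kernel $\mathds{k}(s,t)=\alpha^{\max(s,t)}$; the sampler uses a reversed random-walk construction (our assumption, chosen because it reproduces this covariance exactly), and the function names are hypothetical.

```python
import numpy as np


def sample_tc_paths(n_paths, n_grid, alpha=0.81, seed=0):
    """Sample GP(0, k) on {0, ..., n_grid - 1} for k(s, t) = alpha ** max(s, t).

    g_t = sum_{r=t}^{n_grid-1} w_r + z with independent w_r ~ N(0, alpha**r (1 - alpha))
    and z ~ N(0, alpha**n_grid), so Cov(g_s, g_t) = alpha ** max(s, t) exactly.
    """
    rng = np.random.default_rng(seed)
    r = np.arange(n_grid)
    w = rng.standard_normal((n_paths, n_grid)) * np.sqrt(alpha ** r * (1.0 - alpha))
    z = rng.standard_normal((n_paths, 1)) * np.sqrt(alpha ** n_grid)
    # Reversed cumulative sum: column t holds sum_{r >= t} w_r.
    tail_sums = np.cumsum(w[:, ::-1], axis=1)[:, ::-1]
    return tail_sums + z


def expected_l1_check(n_paths=10_000, n_grid=200, alpha=0.81, seed=0):
    """Return (Monte Carlo mean of truncated ||g||_1, sqrt(2/pi) * truncated M(k))."""
    paths = sample_tc_paths(n_paths, n_grid, alpha, seed)
    mc_mean = float(np.abs(paths).sum(axis=1).mean())
    m_trunc = float(np.sqrt(alpha ** np.arange(n_grid)).sum())
    return mc_mean, float(np.sqrt(2.0 / np.pi) * m_trunc)


if __name__ == "__main__":
    print(expected_l1_check())
```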

According to Lemma 13, the DSRI property of $\mathds{k}$ is a sufficient condition for the almost sure BIBO stability of ${\mathrm{g}}$ when ${\mathrm{g}}\sim\mathcal{G\!P}(\mathbf{0},\mathds{k})$ . The following lemma addresses the converse of Lemma 13. Before stating it, we need an additional definition. Let the function $\Phi:{\mathbb{R}}_{+}\to[0,1]$ be defined as

$$\Phi(\delta):=\frac{1}{\sqrt{2\pi}}\int_{-\delta}^{\delta}\mathrm{e}^{-\frac{x^{2}}{2}}\,\mathrm{d}x,$$

for any $\delta\in{\mathbb{R}}_{+}$ . Note that $\Phi(\delta)=\mathrm{erf}(\delta/\sqrt{2})$ , where $\mathrm{erf}$ denotes the Gaussian error function; that is, $\Phi(\delta)$ is the probability that a standard Gaussian random variable takes a value in the interval $[-\delta,\delta]$ , for any $\delta\in{\mathbb{R}}_{+}$ . Moreover, $\Phi$ is a strictly increasing bijection from ${\mathbb{R}}_{+}$ onto $[0,1)$ , and therefore it has a well-defined inverse $\Phi^{-1}:[0,1)\to{\mathbb{R}}_{+}$ , which is also strictly increasing.
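A minimal sketch of $\Phi$ and $\Phi^{-1}$, using the identity $\Phi(\delta)=\mathrm{erf}(\delta/\sqrt{2})$ and bisection for the inverse (any monotone root finder would do; the function names are hypothetical):

```python
import math


def phi(delta: float) -> float:
    """Phi(delta) = P(|X| <= delta) for X ~ N(0, 1), i.e. erf(delta / sqrt(2))."""
    return math.erf(delta / math.sqrt(2.0))


def phi_inv(eps: float, tol: float = 1e-12) -> float:
    """Invert the strictly increasing map Phi on [0, 1) by bisection."""
    if not 0.0 <= eps < 1.0:
        raise ValueError("eps must lie in [0, 1)")
    lo, hi = 0.0, 1.0
    while phi(hi) < eps:  # enlarge the bracket until Phi(hi) >= eps
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if phi(mid) < eps:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # delta_eps for the eps = 0.95 intervals used below (about 1.96).
    print(phi_inv(0.95))
```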

Lemma 14.

Let $\mathds{k}$ be a positive-definite kernel and ${\mathrm{g}}\sim\mathcal{G\!P}(\mathbf{0},\mathds{k})$ , where $\mathbf{0}$ is the constant zero function. If ${\mathbb{P}}[\|{\mathrm{g}}\|_{1}<\infty]>0$ , then $\mathds{k}$ is a DSRI kernel and we have ${\mathbb{P}}[\|{\mathrm{g}}\|_{1}<\infty]=1$ .

We now state the main theorem of this section, which follows from Lemma 13 and Lemma 14.

Theorem 15.

Let ${\mathrm{m}}=(m_{t})_{t\in{\mathbb{T}}}$ be a stable impulse response and $\mathds{k}:{\mathbb{T}}\times{\mathbb{T}}\to{\mathbb{R}}$ be a positive-definite kernel. Also, let ${\mathrm{g}}\sim\mathcal{G\!P}({\mathrm{m}},\mathds{k})$ , the Gaussian process with mean impulse response ${\mathrm{m}}$ and kernel $\mathds{k}$ . Then, if $\,\mathds{k}$ is a DSRI kernel, we have ${\mathbb{P}}[\|{\mathrm{g}}\|_{1}<\infty]=1$ , and if $\,\mathds{k}$ is not a DSRI kernel, we have ${\mathbb{P}}[\|{\mathrm{g}}\|_{1}<\infty]=0$ .

The following corollary is a direct result of Theorem 15 and the definition of (BIBO) stability for the Gaussian processes.

Corollary 16.

Let the assumptions of Theorem 15 hold. Then, $\mathcal{G\!P}({\mathrm{m}},\mathds{k})$ is stable if and only if $\,\mathds{k}$ is a DSRI kernel.

The theorem and corollary presented here admit an interesting interpretation. For ${\mathrm{g}}=(g_{t})_{t\in{\mathbb{T}}}\sim\mathcal{G\!P}({\mathrm{m}},\mathds{k})$ and $t\in{\mathbb{T}}$ , $g_{t}$ is a Gaussian random variable with distribution ${\mathcal{N}}(m_{t},\mathds{k}(t,t))$ . Accordingly, for each $\varepsilon\in(0,1)$ , we can characterize an $\varepsilon$ confidence interval based on the standard deviation of $g_{t}$ . More precisely, the $\varepsilon$ confidence interval for $g_{t}$ , denoted by ${\mathcal{I}}_{t,\varepsilon}$ , is defined as

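$${\mathcal{I}}_{t,\varepsilon}:=\Big[m_{t}-\delta_{\varepsilon}\,\mathds{k}(t,t)^{\frac{1}{2}},\ m_{t}+\delta_{\varepsilon}\,\mathds{k}(t,t)^{\frac{1}{2}}\Big],$$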

where $\delta_{\varepsilon}$ is the positive real scalar specified as $\delta_{\varepsilon}=\Phi^{-1}(\varepsilon)$ . Furthermore, let impulse responses ${\mathrm{s}}_{\varepsilon}^{+}$ and ${\mathrm{s}}_{\varepsilon}^{-}$ be defined respectively as

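$${\mathrm{s}}_{\varepsilon}^{+}:=\Big(m_{t}+\delta_{\varepsilon}\,\mathds{k}(t,t)^{\frac{1}{2}}\Big)_{t\in{\mathbb{T}}},$$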

and

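$${\mathrm{s}}_{\varepsilon}^{-}:=\Big(m_{t}-\delta_{\varepsilon}\,\mathds{k}(t,t)^{\frac{1}{2}}\Big)_{t\in{\mathbb{T}}}.$$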

By construction, ${\mathrm{s}}_{\varepsilon}^{+}$ and ${\mathrm{s}}_{\varepsilon}^{-}$ correspond, respectively, to the upper and lower bounds of the point-wise $\varepsilon$ confidence intervals introduced above. Accordingly, we define the $\varepsilon$ confidence region, denoted by ${\mathcal{R}}_{\varepsilon}$ , as the union of the $\varepsilon$ confidence intervals $\{{\mathcal{I}}_{t,\varepsilon}\,|\,t\in{\mathbb{T}}\}$ placed along the time axis, i.e., ${\mathcal{R}}_{\varepsilon}=\cup_{t\in{\mathbb{T}}}\{t\}\times{\mathcal{I}}_{t,\varepsilon}$ . Thus, ${\mathcal{R}}_{\varepsilon}$ is the region between the impulse responses ${\mathrm{s}}_{\varepsilon}^{+}$ and ${\mathrm{s}}_{\varepsilon}^{-}$ (see Figure 2). By the definition of ${\mathcal{I}}_{t,\varepsilon}$ , we have ${\mathbb{P}}\big{[}g_{t}\in{\mathcal{I}}_{t,\varepsilon}\big{]}=\varepsilon$ , for any $t\in{\mathbb{T}}$ . However, this does not imply ${\mathbb{P}}\big{[}{\mathrm{g}}\in{\mathcal{R}}_{\varepsilon}\big{]}\geq\varepsilon$ , i.e., that an entire sample path lies in ${\mathcal{R}}_{\varepsilon}$ with probability at least $\varepsilon$ .

On the other hand, since ${\mathrm{m}}$ is stable and the width of ${\mathcal{R}}_{\varepsilon}$ at time $t$ is $2\delta_{\varepsilon}\mathds{k}(t,t)^{\frac{1}{2}}$ , the impulse responses ${\mathrm{s}}_{\varepsilon}^{+}$ and ${\mathrm{s}}_{\varepsilon}^{-}$ are stable if and only if $\mathds{k}$ is DSRI, in which case ${\mathcal{R}}_{\varepsilon}$ has finite area $2\delta_{\varepsilon}{\mathscr{M}}(\mathds{k})$ . Hence, Theorem 15 and Corollary 16 state that ${\mathrm{g}}$ is a stable impulse response with probability one, that is ${\mathbb{P}}[\|{\mathrm{g}}\|_{1}<\infty]=1$ , if and only if the confidence bound impulse responses ${\mathrm{s}}_{\varepsilon}^{+}$ and ${\mathrm{s}}_{\varepsilon}^{-}$ are stable, or equivalently, the $\varepsilon$ confidence region ${\mathcal{R}}_{\varepsilon}$ has finite area. Moreover, if the area of ${\mathcal{R}}_{\varepsilon}$ is infinite, then ${\mathrm{g}}$ is an unstable impulse response with probability one, i.e., ${\mathbb{P}}[\|{\mathrm{g}}\|_{1}=\infty]=1$ . Figure 2 shows $50$ sample paths of an example Gaussian process, together with the corresponding mean impulse response ${\mathrm{m}}$ and the confidence bound impulse responses ${\mathrm{s}}_{\varepsilon}^{+}$ and ${\mathrm{s}}_{\varepsilon}^{-}$ , for $\varepsilon=0.95$ .
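The area of the confidence region equals $2\delta_{\varepsilon}{\mathscr{M}}(\mathds{k})$ on ${\mathbb{Z}}_{+}$ (unit spacing), so finiteness of the area is exactly the DSRI condition. The sketch below computes ${\mathrm{s}}_{\varepsilon}^{\pm}$ on a truncated horizon, assuming a hypothetical stable mean $m_{t}=0.5^{t}$ and, for the DSRI case, the TC diagonal $\mathds{k}(t,t)=0.81^{t}$ (so ${\mathscr{M}}(\mathds{k})=10$). It obtains $\delta_{\varepsilon}$ from Python's `statistics.NormalDist`, using $\Phi^{-1}(\varepsilon)=F^{-1}\big(\tfrac{1+\varepsilon}{2}\big)$ with $F$ the standard normal CDF.

```python
from statistics import NormalDist

import numpy as np


def confidence_band(mean, kernel_diag, eps):
    """Return (s_plus, s_minus), where s_pm_t = m_t +/- delta_eps * k(t, t)^(1/2)."""
    # Phi(delta) = 2 F(delta) - 1 with F the standard normal CDF,
    # hence delta_eps = Phi^{-1}(eps) = F^{-1}((1 + eps) / 2).
    delta = NormalDist().inv_cdf(0.5 * (1.0 + eps))
    half_width = delta * np.sqrt(kernel_diag)
    return mean + half_width, mean - half_width


def region_area(s_plus, s_minus):
    """Area of R_eps on Z_+ with unit spacing: sum_t (s_plus_t - s_minus_t)."""
    return float(np.sum(s_plus - s_minus))


if __name__ == "__main__":
    t = np.arange(500)
    mean = 0.5 ** t  # a stable (summable) mean impulse response
    # DSRI case: k(t, t) = 0.81**t, so the area is close to 2 * delta * 10.
    print(region_area(*confidence_band(mean, 0.81 ** t, 0.95)))
    # Non-DSRI case: k(t, t) = 1 / (t + 1)**2; the truncated area grows like log(len(t)).
    print(region_area(*confidence_band(mean, 1.0 / (t + 1.0) ** 2, 0.95)))
```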

8 Conclusion

In this work, we have investigated the class of diagonally square root integrable kernels. We have verified that the category of DSRI kernels includes well-known kernels used in system identification, such as the diagonal/correlated, tuned/correlated, stable spline, amplitude-modulated locally stationary, and simulation-induced kernels. We have observed that the DSRI kernel category has a cone structure endowed with a partial order. Moreover, this class is a subclass of both the stable kernels and the integrable kernels. We have examined certain fundamental topological properties of the RKHSs with DSRI kernels; more precisely, we have shown that the continuity of linear operators defined on ${\mathscr{L}}^{1}$ is inherited when they are restricted to an RKHS equipped with a DSRI kernel. Furthermore, we have shown that the realizations of a Gaussian process centered at a stable impulse response are almost surely stable if and only if the corresponding kernel is DSRI.

Appendix A Appendix

A.1 Proof of Theorem 1

Part i) For the case of ${\mathbb{T}}={\mathbb{R}}_{+}$ , one can easily see that

[TABLE]

A similar argument holds when ${\mathbb{T}}={\mathbb{Z}}_{+}$ .

Part ii) For any $t_{1},\ldots,t_{m}\in{\mathbb{T}}$ and $a_{1},\ldots,a_{m}\in{\mathbb{R}}$ , we have

[TABLE]

which says that $\mathds{k}_{\text{{\scalebox{0.75}{$ \mathrm{R} $}}}n\text{{\scalebox{0.75}{$ \mathrm{E} $}}}}$ is a positive-definite kernel. For any $s,t\in{\mathbb{T}}$ , one can see that $\mathds{k}_{\text{{\scalebox{0.75}{$ \mathrm{R} $}}}n\text{{\scalebox{0.75}{$ \mathrm{E} $}}}}(s,t)\leq\alpha^{\frac{1}{2}(s+t)}\lambda$ , where $\alpha:=\max_{1\leq i\leq n}\alpha_{i}$ and $\lambda:=\sum_{i=1}^{n}\lambda_{i}$ . Therefore, we have

[TABLE]

which implies that $\mathds{k}_{\text{{\scalebox{0.75}{$ \mathrm{R} $}}}n\text{{\scalebox{0.75}{$ \mathrm{E} $}}}}$ is a DSRI kernel. ∎
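For concreteness, the estimate above can be made explicit. Assuming $\alpha\in(0,1)$ and $\lambda>0$, the bound $\mathds{k}_{\mathrm{R}n\mathrm{E}}(t,t)\leq\lambda\alpha^{t}$ gives:

```latex
\mathscr{M}(\mathds{k}_{\mathrm{R}n\mathrm{E}}) \le
\begin{cases}
\lambda^{\frac{1}{2}}\displaystyle\sum_{t\in\mathbb{Z}_{+}}\big(\alpha^{\frac{1}{2}}\big)^{t}
  = \dfrac{\lambda^{\frac{1}{2}}}{1-\alpha^{\frac{1}{2}}}, & \mathbb{T}=\mathbb{Z}_{+},\\[2ex]
\lambda^{\frac{1}{2}}\displaystyle\int_{\mathbb{R}_{+}}\mathrm{e}^{\frac{t}{2}\ln\alpha}\,\mathrm{d}t
  = \dfrac{2\lambda^{\frac{1}{2}}}{\ln(1/\alpha)}, & \mathbb{T}=\mathbb{R}_{+}.
\end{cases}
```

For example, with $\lambda=1$ and $\alpha=0.81$, the discrete-time bound equals $1/(1-0.9)=10$.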

A.2 Proof of Theorem 2

Part i) One can easily see that

[TABLE]

for any $t\in{\mathbb{T}}$ . Accordingly, the proof follows directly from the triangle inequality and Definition 2.

Part ii) For any $t\in{\mathbb{T}}$ , we have

[TABLE]

which implies the claim by Definition 2. ∎

A.3 DSRI Property for High-order Stable Spline Kernels

Let $\beta$ be a positive real number and $(x)_{+}$ denote the non-negative part of $x$ , for any $x\in{\mathbb{R}}$ , that is $(x)_{+}:=\max\{x,0\}$ . With respect to each $n\in{\mathbb{Z}}_{+}$ , the $n^{\text{\tiny{th}}}$ -order stable spline kernel $\mathds{k}_{\SS{n}}:{\mathbb{R}}_{+}\times{\mathbb{R}}_{+}\to{\mathbb{R}}$ is defined as

[TABLE]

for any $s,t\in{\mathbb{R}}_{+}$ [19].

Theorem 17.

The $n^{\text{\tiny{th}}}$ -order stable spline kernel is DSRI.

Proof.

For each $t\in{\mathbb{R}}_{+}$ , one can easily see that

[TABLE]

Therefore, $\mathds{k}_{\SS{n}}$ is diagonally dominated by kernel $\mathds{k}_{\text{{\scalebox{0.75}{$ \mathrm{R} $}}}n\text{{\scalebox{0.75}{$ \mathrm{E} $}}}}(\cdot,\cdot\,;1,\mathrm{e}^{-(2n-1)\beta})$ . Thus, due to Theorem 1, $\mathds{k}_{\SS{n}}$ is a DSRI kernel. ∎

A.4 DSRI Property for Simulation-Induced Kernels

Given ${\mathrm{v}}=(v_{t})_{t\in{\mathbb{T}}}$ in ${\mathscr{L}}^{1}$ with non-negative values, a stable SISO system of order $n$ with realization $({\mathrm{A}},{\mathrm{b}},{\mathrm{c}},d)$ , and an $n\times n$ positive-definite matrix ${\mathrm{Q}}$ , the simulation-induced kernel $\mathds{k}_{\text{{\scalebox{0.75}{$ \mathrm{SI} $}}}}:{\mathbb{T}}\times{\mathbb{T}}\to{\mathbb{R}}$ is defined such that, for any $s,t\in{\mathbb{T}}$ , we have

[TABLE]

when ${\mathbb{T}}={\mathbb{Z}}_{+}$ , and

[TABLE]

when ${\mathbb{T}}={\mathbb{R}}_{+}$ [49].

Theorem 18.

Assume that there exist $\gamma_{1},\gamma_{2}\in{\mathbb{R}}_{+}$ and $\alpha\in[0,1)$ such that, for any $t\in{\mathbb{T}}$ , we have $\|\Phi_{t}\|\leq\gamma_{1}\,\alpha^{t}$ and $v_{t}^{2}\leq\gamma_{2}\,\alpha^{t}$ , where $\Phi_{t}$ denotes the matrix ${\mathrm{A}}^{t}$ when ${\mathbb{T}}={\mathbb{Z}}_{+}$ , and the matrix $\mathrm{e}^{{\mathrm{A}}t}$ when ${\mathbb{T}}={\mathbb{R}}_{+}$ . Then, $\mathds{k}_{\text{{\scalebox{0.75}{$ \mathrm{SI} $}}}}$ is a DSRI kernel.

Proof.

For any $t\in{\mathbb{T}}$ , one can show that

[TABLE]

when ${\mathbb{T}}={\mathbb{Z}}_{+}$ , and

[TABLE]

when ${\mathbb{T}}={\mathbb{R}}_{+}$ , where $\gamma=\|{\mathrm{b}}\|^{2}\|{\mathrm{c}}\|^{2}\gamma_{1}^{2}\gamma_{2}$ . Define the kernel $\mathds{k}:{\mathbb{T}}\times{\mathbb{T}}\to{\mathbb{R}}$ as

[TABLE]

for any $s,t\in{\mathbb{T}}$ , where $\lambda_{1}:=\gamma_{1}^{2}\|{\mathrm{Q}}\|\|{\mathrm{c}}\|^{2}$ , $\lambda_{2}:=\frac{\gamma}{\alpha(1-\alpha)}$ , when ${\mathbb{T}}={\mathbb{Z}}_{+}$ , and $\lambda_{2}:=-\frac{\gamma}{\ln(\alpha)}$ , when ${\mathbb{T}}={\mathbb{R}}_{+}$ . According to Theorem 1 and Theorem 2, we know that $\mathds{k}$ is a DSRI kernel. Moreover, due to (34) and (35), one can easily see that $\mathds{k}_{\text{{\scalebox{0.75}{$ \mathrm{SI} $}}}}$ is diagonally dominated by $\mathds{k}$ . Therefore, kernel $\mathds{k}_{\text{{\scalebox{0.75}{$ \mathrm{SI} $}}}}$ is DSRI. ∎

A.5 DSRI Property and Sampling

We say that $\sigma:{\mathbb{Z}}_{+}\to{\mathbb{R}}_{+}$ is a proper sampling function if $\inf_{t\in{\mathbb{Z}}_{+}}\big(\sigma(t+1)-\sigma(t)\big)>0$ . The following theorem states that the DSRI property is preserved under proper sampling.

Theorem 19.

Let $\mathds{k}:{\mathbb{R}}_{+}\times{\mathbb{R}}_{+}\to{\mathbb{R}}$ be a positive-definite kernel and $\mathds{k}_{\sigma}:{\mathbb{Z}}_{+}\times{\mathbb{Z}}_{+}\to{\mathbb{R}}$ be defined as

[TABLE]

Define the function $d_{\mathds{k}}:{\mathbb{R}}_{+}\to{\mathbb{R}}$ as $d_{\mathds{k}}(t)=\mathds{k}(t,t)^{\frac{1}{2}}$ , for any $t\in{\mathbb{R}}_{+}$ . If $\mathds{k}$ is a DSRI kernel with non-increasing $d_{\mathds{k}}$ , then $\mathds{k}_{\sigma}$ is a DSRI kernel.

Proof.

The positive-definiteness of $\mathds{k}_{\sigma}$ follows directly from the same property of $\mathds{k}$ . Let $\delta:=\inf_{t\in{\mathbb{Z}}_{+}}\big(\sigma(t+1)-\sigma(t)\big)>0$ . Since $\delta>0$ and $d_{\mathds{k}}$ is a non-increasing function, we have

[TABLE]

which implies that $\mathds{k}_{\sigma}$ is a DSRI kernel. ∎
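For the reader's convenience, here is a sketch of the kind of estimate involved, assuming $\mathds{k}_{\sigma}(s,t)=\mathds{k}(\sigma(s),\sigma(t))$ as in the definition of $\mathds{k}_{\sigma}$ (the displayed inequality in the proof may be arranged differently):

```latex
% For t >= 1: sigma(t) - delta >= sigma(t-1) >= 0 and d_k is non-increasing, hence
d_{\mathds{k}}(\sigma(t)) \le \frac{1}{\delta}\int_{\sigma(t)-\delta}^{\sigma(t)} d_{\mathds{k}}(\tau)\,\mathrm{d}\tau .
% The intervals [sigma(t) - delta, sigma(t)], t >= 1, overlap at most at endpoints, so
\mathscr{M}(\mathds{k}_{\sigma}) = \sum_{t\in\mathbb{Z}_{+}} d_{\mathds{k}}(\sigma(t))
  \le d_{\mathds{k}}(\sigma(0)) + \frac{1}{\delta}\int_{\mathbb{R}_{+}} d_{\mathds{k}}(\tau)\,\mathrm{d}\tau
  = d_{\mathds{k}}(\sigma(0)) + \frac{\mathscr{M}(\mathds{k})}{\delta} < \infty .
```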

A.6 DSRI Property for Reparameterized Kernels

Theorem 20.

Let $\mathds{k}:{\mathbb{T}}\times{\mathbb{T}}\to{\mathbb{R}}$ be a DSRI kernel and $\rho:{\mathbb{T}}\to{\mathbb{T}}$ be a strictly increasing function, which is assumed to be differentiable with $\inf_{\tau\in{\mathbb{R}}_{+}}\frac{\mathrm{d}\rho(\tau)}{\mathrm{d}\tau}>0$ , when ${\mathbb{T}}={\mathbb{R}}_{+}$ . Define $\mathds{h}:{\mathbb{T}}\times{\mathbb{T}}\to{\mathbb{R}}$ as $\mathds{h}(s,t)=\mathds{k}(\rho(s),\rho(t))$ , for any $s,t\in{\mathbb{T}}$ . Then, $\mathds{h}$ is a DSRI kernel.

Proof.

The positive-definiteness of $\mathds{h}$ is directly concluded from the same property of $\mathds{k}$ . The properties of $\rho$ imply that it has a well-defined inverse function $\rho^{-1}:\rho({\mathbb{T}})\to{\mathbb{T}}$ , which is a strictly increasing map. Therefore, for any $s\in\rho({\mathbb{T}})$ , there exists a unique $t\in{\mathbb{T}}$ such that $s=\rho(t)$ . Accordingly, for the case of ${\mathbb{T}}={\mathbb{Z}}_{+}$ , we have

[TABLE]

which implies that $\mathds{h}$ is DSRI. Similarly, for the case of ${\mathbb{T}}={\mathbb{R}}_{+}$ , we have

[TABLE]

This concludes the proof. ∎

A.7 Proof of Theorem 5

Let ${\mathbb{T}}={\mathbb{Z}}_{+}$ and define a symmetric function $\mathds{k}:{\mathbb{Z}}_{+}\times{\mathbb{Z}}_{+}\to{\mathbb{R}}$ such that, for any $s,t\in{\mathbb{Z}}_{+}$ , we have

[TABLE]

For any $t_{1},\ldots,t_{n}\in{\mathbb{Z}}_{+}$ and any $a_{1},\ldots,a_{n}\in{\mathbb{R}}$ , one can see that

[TABLE]

where $\bar{t}=\max\{t_{1},\ldots,t_{n}\}$ and ${\mathcal{I}}_{t}=\big{\{}i\in\{1,\ldots,n\}\big{|}t_{i}=t\big{\}}$ , for $t=0,\ldots,\bar{t}$ . This implies that $\mathds{k}$ is a positive-definite kernel. Moreover, we have

[TABLE]

and

[TABLE]

Therefore, $\mathds{k}$ is an integrable positive-definite kernel which is not DSRI. Now, let ${\mathbb{T}}={\mathbb{R}}_{+}$ and let the function $f:{\mathbb{R}}_{+}\to{\mathbb{R}}_{+}$ be defined as

[TABLE]

for any $t\in{\mathbb{R}}_{+}$ . Note that $f$ is a continuous and positive function. Define $\mathds{h}:{\mathbb{R}}_{+}\times{\mathbb{R}}_{+}\to{\mathbb{R}}$ such that, for any $s,t\in{\mathbb{R}}_{+}$ , we have $\mathds{h}(s,t)=\mathds{k}(\lfloor s\rfloor,\lfloor t\rfloor)g(s)g(t)$ , where $\mathds{k}$ is introduced in (39), and the function $g:{\mathbb{R}}_{+}\to{\mathbb{R}}_{+}$ is defined as $g(s)=f(s-\lfloor s\rfloor)$ , for any $s\in{\mathbb{R}}_{+}$ . One can easily see that $\mathds{h}$ is continuous. Moreover, for any $t_{1},\ldots,t_{n}\in{\mathbb{R}}_{+}$ and any $b_{1},\ldots,b_{n}\in{\mathbb{R}}$ , we have

[TABLE]

where $a_{i}$ is defined as $a_{i}:=b_{i}g(t_{i})$ , for $i=1,\ldots,n$ . Therefore, due to (40), we have $\sum_{i,j=1}^{n}b_{i}b_{j}\mathds{h}(t_{i},t_{j})\geq 0$ , which implies that $\mathds{h}$ is a positive-definite kernel. We know that

[TABLE]

which implies that $\mathds{h}$ is integrable. On the other hand, we have

[TABLE]

and thus, from definition of $\mathds{k}$ , it follows that

[TABLE]

Therefore, $\mathds{h}$ is not a DSRI kernel. ∎

A.8 Proof of Theorem 9

The first part of the theorem follows from Theorem 3 and Theorem 8. For the second part, we provide the proof only for the case of ${\mathbb{T}}={\mathbb{R}}_{+}$ ; the proof for ${\mathbb{T}}={\mathbb{Z}}_{+}$ is similar.

Let ${\mathrm{g}}=(g_{s})_{s\in{\mathbb{R}}_{+}}\in{\mathscr{H}}_{\mathbbm{k}}$ . By the reproducing property, we have $g_{s}={\langle{{\mathrm{g}},\mathds{k}_{s}}\rangle}_{{\mathscr{H}}_{\mathbbm{k}}}$ and $\|\mathds{k}_{s}\|_{{\mathscr{H}}_{\mathbbm{k}}}^{2}={\langle{\mathds{k}_{s},\mathds{k}_{s}}\rangle}_{{\mathscr{H}}_{\mathbbm{k}}}=\mathds{k}(s,s)$ , for any $s\in{\mathbb{R}}_{+}$ . Subsequently, from the Cauchy-Schwarz inequality, it follows that

[TABLE]

Accordingly, since $\|{\mathrm{g}}\|_{1}=\int_{{\mathbb{R}}_{+}}|g_{s}|\mathrm{d}s$ , we have

[TABLE]

On the other hand, from the definition of operator norm, it follows that

[TABLE]

Considering (47), we know that

[TABLE]

Therefore, due to (49) and the definition of operator norm, we have

[TABLE]

which implies (16) and concludes the proof. ∎

A.9 Proof of Theorem 10

By an abuse of notation, we define ${\mathrm{L}}:{\mathscr{L}}^{1}\to{\mathscr{B}}$ similarly to (17). According to [57, Theorem 8.2], $(g_{t}v_{t})_{t\in{\mathbb{T}}}$ is a Bochner integrable function, for any ${\mathrm{g}}=(g_{t})_{t\in{\mathbb{T}}}\in{\mathscr{L}}^{1}$ . This implies that ${\mathrm{L}}:{\mathscr{L}}^{1}\to{\mathscr{B}}$ is a well-defined linear operator. Furthermore, we have

[TABLE]

for any ${\mathrm{g}}=(g_{t})_{t\in{\mathbb{T}}}\in{\mathscr{L}}^{1}$ . Therefore, one can see

[TABLE]

i.e., ${\mathrm{L}}:{\mathscr{L}}^{1}\to{\mathscr{B}}$ is a continuous linear operator. Thus, the claim follows directly from Theorem 9. ∎

A.10 Proof of Lemma 14

We prove the lemma for the case of ${\mathbb{T}}={\mathbb{R}}_{+}$ . The proof for ${\mathbb{T}}={\mathbb{Z}}_{+}$ follows the same line of argument. Note that we have

[TABLE]

Accordingly, from the sub-additivity property of ${\mathbb{P}}$ , we know that

[TABLE]

Therefore, there exists $r\in{\mathbb{N}}$ such that, for the event $A:=\big{\{}\omega\in\Omega\,\big{|}\,\|{\mathrm{g}}(\omega)\|_{1}\leq r\big{\}}$ , we have $\gamma:={\mathbb{P}}[A]>0$ . Accordingly, by the properties of indicator functions, the definition of $A$ , and Tonelli's theorem [58], we can see that

[TABLE]

With respect to each $t\in{\mathbb{R}}_{+}$ , define event $B_{t}$ as

[TABLE]

where $\varepsilon$ is the positive real number characterized as $\varepsilon:=\Phi^{-1}(\frac{1}{2}\gamma)$ . For each $t\in{\mathbb{R}}_{+}$ , we have $A\supseteq A\cap B_{t}^{\mathrm{c}}$ . Therefore, from (52) and (53), it follows that

[TABLE]

Moreover, for each $t\in{\mathbb{R}}_{+}$ , we have $\mathbbm{1}_{A\cap B_{t}^{\mathrm{c}}}\geq\mathbbm{1}_{A}-\mathbbm{1}_{B_{t}}$ , which implies that ${\mathbb{E}}\big{[}\mathbbm{1}_{A\cap B_{t}^{\mathrm{c}}}\big{]}\geq{\mathbb{P}}[A]-{\mathbb{P}}[B_{t}]$ . Subsequently, from (52) and (54), we can see that

[TABLE]

We know that $g_{t}\sim{\mathcal{N}}(0,\mathds{k}(t,t))$ , for each $t\in{\mathbb{R}}_{+}$ . Accordingly, from the definitions of the sets $A$ and $B_{t}$ , we have

[TABLE]

Therefore, (55) implies that

[TABLE]

and subsequently, we have ${\mathscr{M}}(\mathds{k})<\infty$ , and $\mathds{k}$ is a DSRI kernel. Furthermore, from Lemma 13, it follows that ${\mathbb{P}}[\|{\mathrm{g}}\|_{1}<\infty]=1$ , which concludes the proof. ∎

A.11 Proof of Theorem 15

Note that ${\mathrm{g}}\sim\mathcal{G\!P}({\mathrm{m}},\mathds{k})$ if and only if ${\mathrm{g}}-{\mathrm{m}}\sim\mathcal{G\!P}(\mathbf{0},\mathds{k})$ . Since ${\mathrm{m}}$ is a stable impulse response, the stability of ${\mathrm{g}}$ is equivalent to the stability of ${\mathrm{g}}-{\mathrm{m}}$ . Accordingly, the claim follows from Lemma 13 and Lemma 14. ∎

Bibliography

[1] N. Aronszajn, “Theory of reproducing kernels,” Transactions of the American Mathematical Society, vol. 68, no. 3, pp. 337–404, 1950.
[2] E. Parzen, “Statistical inference on time series by Hilbert space methods, I,” Department of Statistics, Stanford University, Technical Report No. 23, 1959.
[3] G. Wahba, Spline Models for Observational Data. SIAM, 1990.
[4] F. Cucker and S. Smale, “Best choices for regularization parameters in learning theory: On the bias-variance problem,” Foundations of Computational Mathematics, vol. 2, no. 4, pp. 413–428, 2002.
[5] A. Berlinet and C. Thomas-Agnan, Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer Science and Business Media, 2011.
[6] M. Khosravi, “Representer theorem for learning Koopman operators,” IEEE Transactions on Automatic Control, 2023.
[7] G. S. Kimeldorf and G. Wahba, “A correspondence between Bayesian estimation on stochastic processes and smoothing by splines,” The Annals of Mathematical Statistics, vol. 41, no. 2, pp. 495–502, 1970.
[8] M. Lukić and J. Beder, “Stochastic processes with sample paths in reproducing kernel Hilbert spaces,” Transactions of the American Mathematical Society, vol. 353, no. 10, pp. 3945–3969, 2001.
