Diagonally Square Root Integrable Kernels in System Identification
Mohammad Khosravi, Roy S. Smith

Delft Center for Systems and Control, Delft University of Technology
Automatic Control Laboratory, ETH Zürich
Abstract
In recent years, reproducing kernel Hilbert space (RKHS) theory has played a crucial role in linear system identification. The core of an RKHS is its associated kernel, which characterizes its properties. Accordingly, this work studies the class of diagonally square root integrable (DSRI) kernels. We demonstrate that various well-known stable kernels introduced in system identification belong to this category. Moreover, it is shown that every DSRI kernel is also stable and integrable. We look into certain topological features of the RKHSs associated with DSRI kernels, particularly the continuity of linear operators defined on the respective RKHSs. For a Gaussian process centered at a stable impulse response, we show that the diagonally square root integrability of the corresponding kernel is a necessary and sufficient condition for the stability of the process. Furthermore, we elaborate on this result by providing suitable interpretations.
Keywords: system identification; kernel-based methods; diagonally square root integrable kernels; stable Gaussian processes
This paper was not presented at any IFAC meeting. Corresponding author: M. Khosravi.
1 Introduction
The theory of reproducing kernel Hilbert spaces (RKHSs) was introduced in the middle of the twentieth century [1]. The intrinsic properties of RKHSs, their one-to-one correspondence with positive-definite kernels, and their fundamental ties to Gaussian processes offer a strong foundation for addressing various estimation and interpolation problems [2, 3, 4, 5, 6]. Accordingly, they have become increasingly prevalent in statistics, signal processing, learning theory, and numerical analysis [7, 8, 9, 10]. On the other hand, system identification has emerged as the theory and set of techniques for estimating suitable mathematical representations of dynamical systems from measurement data [11], and it has remained an active field of research in which numerous methodologies have been developed [12, 13, 14, 15, 16].
RKHS theory was brought to system identification in [17] through the development of kernel-based identification methods. This led to a paradigm shift in system identification theory [18]: kernel-based methods address the bias-variance trade-off, robustness, and model order selection [19, 20, 21], unify the identification of continuous-time and discrete-time systems [19], and allow various forms of side-information to be included in the identification problem [22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33]. Furthermore, owing to the inherent connection between RKHSs and Gaussian processes [3], kernel-based methods offer a Bayesian interpretation of the system identification problem that allows uncertainty to be quantified and provides statistical guarantees [34]. Over the past decade, research on kernel-based system identification has received considerable attention and progressed significantly; nonetheless, it remains an active field of research with various open problems and state-of-the-art results [35, 36, 37, 38, 39, 40].
The building block of each RKHS is its associated kernel function, and various attributes of the RKHS elements are inherited from this kernel. Therefore, it is necessary to introduce kernels suitable for system identification [41]. The most prevalent kernels in the literature include the diagonal/correlated, tuned/correlated, and stable spline kernels and their extensions, which are proposed primarily to enforce stability and smoothness of the impulse response [42, 43, 44]. To improve the identification performance for complex systems, various ideas for designing kernels by combining multiple kernels have been proposed [45, 46, 47, 48]. Influenced by machine learning, harmonic analysis of stochastic processes, linear system theory, and filter design techniques, further categories of kernels have been developed [49, 50, 51]. The significance of kernels has also led to the investigation of their more generic properties; e.g., the relation between the absolute summability of kernels and their stability is clarified in [39]. Moreover, the links between various categories of kernels are studied in [37], where the mathematical foundations of stable kernels and their RKHSs are explored. Furthermore, it is shown in [20] that the realizations of a zero-mean Gaussian process are almost surely stable impulse responses if the corresponding kernel is diagonally square root integrable (DSRI).
In this work, we revisit the definition and notion of DSRI kernels (throughout this paper, DSRI stands for both “diagonally square root integrable” and “diagonally square root integrability”), which were initially introduced in [20]. Following this, we investigate the class of DSRI kernels by describing its structure as a partially ordered cone. We show that this class includes a broad range of well-known kernels commonly used in system identification, e.g., the diagonal/correlated, stable spline, amplitude-modulated locally stationary, and simulation-induced kernels. The structure of the DSRI kernel class is further elaborated by revisiting the fact that DSRI kernels are stable and integrable. In this way, we obtain inner and outer approximations for the class of DSRI kernels. Subsequently, we investigate fundamental topological features of RKHSs with DSRI kernels. Namely, it is shown that for linear operators defined on $\mathscr{L}^1$, the space of stable impulse responses, continuity is inherited when the operator is restricted to an RKHS endowed with a DSRI kernel. For the stability of zero-mean Gaussian processes, we show that the sufficient condition introduced in [20] is also necessary. We further generalize this result and provide suitable interpretations. Due to the theoretical nature of the work, and to make the manuscript easier to read, the burdensome technical arguments, such as proofs of theorems and lemmas, have been moved to the appendix. For the sake of completeness, the appendix provides all of the proofs, including the relatively simple ones.
2 Notation and Preliminaries
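Throughout the paper, the set of natural numbers, the set of real numbers, the set of complex numbers, the set of non-negative integers, and the set of non-negative real numbers are denoted respectively by $\mathbb{N}$, $\mathbb{R}$, $\mathbb{C}$, $\mathbb{Z}_+$, and $\mathbb{R}_+$. Moreover, $\mathbb{T}$ denotes the time index set, which corresponds to either $\mathbb{Z}_+$ or $\mathbb{R}_+$, and $\mathbb{T}^2$ is defined as $\mathbb{T}\times\mathbb{T}$. The generic measure space in our discussion is $(\mathbb{T},\mathfrak{B},\mu)$, where $\mathfrak{B}$ and $\mu$ are respectively the σ-algebra of Borel subsets of $\mathbb{R}_+$ and the Lebesgue measure when $\mathbb{T}=\mathbb{R}_+$, and $\mathfrak{B}$ and $\mu$ are respectively the set of subsets of $\mathbb{Z}_+$ and the counting measure when $\mathbb{T}=\mathbb{Z}_+$. Accordingly, we additionally consider the measure space $(\mathbb{T}^2,\mathfrak{B}\otimes\mathfrak{B},\mu\otimes\mu)$, where $\mathfrak{B}\otimes\mathfrak{B}$ and $\mu\otimes\mu$ are respectively the product σ-algebra and the product measure defined based on $(\mathbb{T},\mathfrak{B},\mu)$. Furthermore, we assume $\mathbb{R}$ is endowed with the Borel σ-algebra and the Lebesgue measure. Given a measurable space $\mathcal{M}$, the space of measurable functions $\mathrm{g}:\mathbb{T}\to\mathcal{M}$ is denoted by $\mathcal{M}^{\mathbb{T}}$, and $\mathrm{g}\in\mathcal{M}^{\mathbb{T}}$ is written entry-wise as $\mathrm{g}=(g_t)_{t\in\mathbb{T}}$, or $\mathrm{g}=(g(t))_{t\in\mathbb{T}}$. Given a set $A$, the indicator function $\mathbbm{1}_A$ is defined as $\mathbbm{1}_A(x)=1$ if $x\in A$, and $\mathbbm{1}_A(x)=0$ otherwise. Depending on the context, $\mathscr{L}^1$ denotes $\ell^1$ or $L^1$. Similarly, $\mathscr{L}^\infty$ refers to $\ell^\infty$ or $L^\infty$. For $p\in\{1,\infty\}$, the norm in $\mathscr{L}^p$ is denoted by $\|\cdot\|_p$. The norms defined on Banach spaces $\mathcal{X}$ and $\mathcal{Y}$ are respectively denoted by $\|\cdot\|_{\mathcal{X}}$ and $\|\cdot\|_{\mathcal{Y}}$. The space of bounded linear operators from Banach space $\mathcal{X}$ to Banach space $\mathcal{Y}$ is a Banach space, denoted by $\mathcal{L}(\mathcal{X},\mathcal{Y})$ and endowed with the operator norm $\|\mathrm{L}\|:=\sup_{\|x\|_{\mathcal{X}}\le 1}\|\mathrm{L}x\|_{\mathcal{Y}}$ [52].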
3 Diagonally Square Root Integrable Kernels
In this section, the definition of diagonally square root integrable kernels is revisited. To this end, we need to recall the notion of Mercer kernels [5].
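Definition 1 ([5]).
The symmetric measurable function $\mathds{k}:\mathbb{T}\times\mathbb{T}\to\mathbb{R}$ is said to be a positive-definite kernel, or simply, kernel, when, for any $n\in\mathbb{N}$, $t_1,\ldots,t_n\in\mathbb{T}$, and $a_1,\ldots,a_n\in\mathbb{R}$, we have $\sum_{i=1}^{n}\sum_{j=1}^{n}a_ia_j\,\mathds{k}(t_i,t_j)\ge 0$. For each $t\in\mathbb{T}$, the function $\mathds{k}_t:\mathbb{T}\to\mathbb{R}$, defined as $\mathds{k}_t(s):=\mathds{k}(t,s)$ for $s\in\mathbb{T}$, is called the section of kernel $\mathds{k}$ at $t$.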
The following definition introduces our main object of interest in this paper.
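Definition 2.
The positive-definite kernel $\mathds{k}$ is said to be diagonally square root integrable (DSRI) if $\|\mathds{k}\|_{\mathrm{DSRI}}<\infty$, where $\|\mathds{k}\|_{\mathrm{DSRI}}$ is defined as
$$\|\mathds{k}\|_{\mathrm{DSRI}}:=\begin{cases}\displaystyle\sum_{t\in\mathbb{Z}_+}\mathds{k}(t,t)^{\frac12}, & \text{if }\mathbb{T}=\mathbb{Z}_+,\\[6pt] \displaystyle\int_{\mathbb{R}_+}\mathds{k}(t,t)^{\frac12}\,\mathrm{d}t, & \text{if }\mathbb{T}=\mathbb{R}_+.\end{cases}\tag{1}$$
The class of DSRI kernels is denoted by $\mathscr{S}_{\mathrm{DSRI}}$.
For any $t\in\mathbb{T}$, one should note that $\mathds{k}(t,t)\ge 0$, which is implied by the positive-definiteness property given in Definition 1. Consequently, the right-hand sides in (1) are well-defined for any positive-definite kernel, with possible values in $[0,\infty]$. According to Definition 2, kernel $\mathds{k}$ is DSRI when this value is finite, i.e., $\|\mathds{k}\|_{\mathrm{DSRI}}<\infty$.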
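For $\mathbb{T}=\mathbb{Z}_+$, the series in (1) can be approximated numerically by truncating it at a finite horizon. The following Python sketch does this for the tuned/correlated kernel, assumed here in its commonly used form $\lambda\alpha^{\max(s,t)}$; its diagonal is $\lambda\alpha^t$, so the series is geometric with sum $\sqrt{\lambda}/(1-\sqrt{\alpha})$. The helper names `dsri_truncated` and `tc_kernel` are hypothetical and only serve this illustration.

```python
import math
from typing import Callable


def dsri_truncated(kernel: Callable[[int, int], float], horizon: int) -> float:
    """Partial sum of sqrt(k(t, t)) over t = 0, ..., horizon - 1 (case T = Z_+).

    For a DSRI kernel these partial sums increase to the finite value in (1);
    for a non-DSRI kernel they grow without bound as the horizon increases.
    """
    return sum(math.sqrt(kernel(t, t)) for t in range(horizon))


def tc_kernel(s: int, t: int, lam: float = 2.0, alpha: float = 0.8) -> float:
    """Tuned/correlated kernel, assumed in the common form lam * alpha**max(s, t)."""
    return lam * alpha ** max(s, t)


lam, alpha = 2.0, 0.8
closed_form = math.sqrt(lam) / (1.0 - math.sqrt(alpha))  # geometric series in sqrt(alpha)
approx = dsri_truncated(tc_kernel, horizon=500)
print(f"truncated sum: {approx:.6f}, closed form: {closed_form:.6f}")
```

With $\sqrt{\alpha}\approx 0.894$, the tail beyond 500 terms is negligible, so the truncated sum agrees with the closed form to machine precision.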
Given the definition of the DSRI kernels, it is natural to ask about the kernels satisfying this property and their particular features of interest. These questions will be addressed in the following sections.
4 Well-known DSRI Kernels
In this section, we study the class of DSRI kernels, $\mathscr{S}_{\mathrm{DSRI}}$, by showing that many well-known kernels in the system identification context belong to it. To this end, we need the notion of (diagonal) dominance, which introduces a partial order on the set of positive-definite kernels.
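Definition 3.
Let $\mathds{k},\mathds{k}':\mathbb{T}\times\mathbb{T}\to\mathbb{R}$ be positive-definite kernels. We say $\mathds{k}'$ dominates $\mathds{k}$ if there exists $c>0$ such that $|\mathds{k}(s,t)|\le c\,\mathds{k}'(s,t)$, for all $s,t\in\mathbb{T}$. Similarly, it is said that $\mathds{k}'$ diagonally dominates $\mathds{k}$ if the inequality holds when $s$ equals $t$.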
To elaborate on the importance of Definition 3 in describing $\mathscr{S}_{\mathrm{DSRI}}$, we need to introduce finite-rank exponential kernels. More precisely, given $n\in\mathbb{N}$, $\bm{\lambda}=[\lambda_1,\ldots,\lambda_n]^{\mathsf{T}}\in\mathbb{R}_+^n$, and $\bm{\alpha}=[\alpha_1,\ldots,\alpha_n]^{\mathsf{T}}\in[0,1)^n$, the rank-$n$ exponential kernel $\mathds{k}_{\mathrm{R}n\mathrm{E}}:\mathbb{T}\times\mathbb{T}\to\mathbb{R}$ is defined as
$$\mathds{k}_{\mathrm{R}n\mathrm{E}}(s,t):=\sum_{i=1}^{n}\lambda_i\,\alpha_i^{\frac12(s+t)},\tag{2}$$
for any $s,t\in\mathbb{T}$. We denote the kernel by $\mathds{k}_{\mathrm{R}n\mathrm{E}}(\cdot,\cdot\,;\bm{\lambda},\bm{\alpha})$, and write $\mathds{k}_{\mathrm{R}n\mathrm{E}}(s,t\,;\bm{\lambda},\bm{\alpha})$ on the left-hand side of (2) when we want to highlight the dependency on the hyperparameter vectors $\bm{\lambda}$ and $\bm{\alpha}$.
Theorem 1.
i) Let $\mathds{k},\mathds{k}'$ be positive-definite kernels where $\mathds{k}'$ is DSRI. If $\mathds{k}'$ (diagonally) dominates $\mathds{k}$, then $\mathds{k}$ is DSRI.
ii) The rank-$n$ exponential kernel $\mathds{k}_{\mathrm{R}n\mathrm{E}}:\mathbb{T}\times\mathbb{T}\to\mathbb{R}$ defined in (2) is DSRI.
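Part ii) can be checked numerically for $\mathbb{T}=\mathbb{Z}_+$. The sketch below is a minimal illustration, assuming the rank-$n$ exponential kernel has the form $\sum_i\lambda_i\alpha_i^{(s+t)/2}$ and using arbitrary illustrative hyperparameters: it builds the Gram matrix on a finite horizon (whose eigenvalues should be non-negative up to rounding) and compares the truncated DSRI sum with the bound $\sqrt{\sum_i\lambda_i}/(1-\sqrt{\max_i\alpha_i})$ obtained from $\mathds{k}_{\mathrm{R}n\mathrm{E}}(t,t)\le(\sum_i\lambda_i)(\max_i\alpha_i)^t$.

```python
import numpy as np


def rne_kernel(s: float, t: float, lam, alpha) -> float:
    """Rank-n exponential kernel sum_i lam_i * alpha_i**((s + t) / 2), cf. (2)."""
    lam = np.asarray(lam, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    return float(np.sum(lam * alpha ** ((s + t) / 2.0)))


lam = [1.0, 0.5, 3.0]    # illustrative lambda_i >= 0
alpha = [0.9, 0.5, 0.2]  # illustrative alpha_i in [0, 1)
horizon = 60

# Gram matrix on {0, ..., horizon - 1}; positive-definiteness means no negative eigenvalues
gram = np.array([[rne_kernel(s, t, lam, alpha) for t in range(horizon)]
                 for s in range(horizon)])
min_eig = float(np.linalg.eigvalsh(gram).min())

# Truncated DSRI sum versus the bound sqrt(sum(lam)) / (1 - sqrt(max(alpha)))
dsri_partial = float(np.sum(np.sqrt(np.diag(gram))))
bound = float(np.sqrt(sum(lam)) / (1.0 - np.sqrt(max(alpha))))
print(min_eig, dsri_partial, bound)
```

The Gram matrix has rank at most $n=3$, so most of its eigenvalues are zero up to floating-point rounding.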
Theorem 1 can be used to show that a variety of kernels belong to $\mathscr{S}_{\mathrm{DSRI}}$. In the system identification literature, various kernels have been introduced [19, 53], e.g., the diagonal, diagonal/correlated, tuned/correlated, and stable spline kernels, which are respectively denoted by $\mathds{k}_{\mathrm{DI}}$, $\mathds{k}_{\mathrm{DC}}$, $\mathds{k}_{\mathrm{TC}}$, and $\mathds{k}_{\mathrm{SS}}$, and defined as
[TABLE]
for any , where , , if , and, , if . Moreover, in [54], the first and second order integral stable spline kernels are defined as
[TABLE]
for any $s,t\in\mathbb{T}$, where . We can directly calculate the value in (1) for the above-mentioned kernels and show that they belong to $\mathscr{S}_{\mathrm{DSRI}}$. Alternatively, one can easily see that the kernels $\mathds{k}_{\mathrm{DI}}$, $\mathds{k}_{\mathrm{DC}}$, $\mathds{k}_{\mathrm{TC}}$, and $\mathds{k}_{\mathrm{iTC}}$ are dominated by $\mathds{k}_{\mathrm{R}n\mathrm{E}}(\cdot,\cdot\,;1,\alpha)$. Similarly, one can show that $\mathds{k}_{\mathrm{R}n\mathrm{E}}(\cdot,\cdot\,;1,\alpha^{3})$ dominates $\mathds{k}_{\mathrm{SS}}$ and $\mathds{k}_{\mathrm{iSS}}$. Thus, one can easily conclude from Theorem 1 that each of the above-mentioned kernels is DSRI. Based on the same line of argument, one can show the same result for the $n$-order stable spline kernels [19] (see Appendix A.3 for more details).
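The dominance claims can be spot-checked on a finite grid in discrete time. The sketch below assumes the tuned/correlated, diagonal/correlated, and stable spline kernels take their commonly used forms $\lambda\alpha^{\max(s,t)}$, $\lambda\alpha^{(s+t)/2}\rho^{|s-t|}$, and $\lambda\big(\tfrac12\alpha^{s+t+\max(s,t)}-\tfrac16\alpha^{3\max(s,t)}\big)$, and it checks the pointwise inequality $|\mathds{k}(s,t)|\le c\,\mathds{k}'(s,t)$ with $c=\lambda$ against the rank-1 exponential kernels $\alpha^{(s+t)/2}$ and $\alpha^{3(s+t)/2}$. The hyperparameter values are illustrative only, and a finite grid check is evidence, not a proof.

```python
import itertools

lam, alpha, rho = 1.5, 0.85, -0.6  # illustrative hyperparameters


def rne1(s: int, t: int, a: float) -> float:
    """Rank-1 exponential kernel with lambda = 1: a**((s + t) / 2)."""
    return a ** ((s + t) / 2)


def tc(s: int, t: int) -> float:  # tuned/correlated (assumed form)
    return lam * alpha ** max(s, t)


def dc(s: int, t: int) -> float:  # diagonal/correlated (assumed form)
    return lam * alpha ** ((s + t) / 2) * rho ** abs(s - t)


def ss(s: int, t: int) -> float:  # stable spline (assumed form)
    m = max(s, t)
    return lam * (alpha ** (s + t + m) / 2 - alpha ** (3 * m) / 6)


def dominated(k, k_ref, c: float, horizon: int = 80, tol: float = 1e-12) -> bool:
    """Check |k(s, t)| <= c * k_ref(s, t) for all s, t in {0, ..., horizon - 1}."""
    return all(abs(k(s, t)) <= c * k_ref(s, t) + tol
               for s, t in itertools.product(range(horizon), repeat=2))


def ref_alpha(s: int, t: int) -> float:
    return rne1(s, t, alpha)


def ref_alpha3(s: int, t: int) -> float:
    return rne1(s, t, alpha ** 3)


checks = {
    "TC <= lam * R1E(alpha)": dominated(tc, ref_alpha, lam),
    "DC <= lam * R1E(alpha)": dominated(dc, ref_alpha, lam),
    "SS <= lam * R1E(alpha^3)": dominated(ss, ref_alpha3, lam),
}
print(checks)
```

The inequalities follow from $\max(s,t)\ge\tfrac12(s+t)$ and $|\rho|\le 1$, which is why all three checks hold.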
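Theorem 2.
Let $\mathds{k}_1,\mathds{k}_2$ be positive-definite kernels, where $\mathds{k}_1$ is DSRI.
i) If $\mathds{k}_2$ is DSRI, then $a\,\mathds{k}_1+b\,\mathds{k}_2$ is a DSRI kernel, for any $a,b\in\mathbb{R}_+$.
ii) If $\sup_{t\in\mathbb{T}}\mathds{k}_2(t,t)<\infty$, then $\mathds{k}_1\mathds{k}_2$ is a DSRI kernel.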
Theorem 1 and Theorem 2 characterize the class of DSRI kernels as a cone equipped with a partial order. They can also be used to verify the DSRI property of other kernels. For example, consider the kernel $\mathds{k}_{\mathrm{iTS}}$ introduced in [54] as the combination of $\mathds{k}_{\mathrm{iTC}}$ and $\mathds{k}_{\mathrm{iSS}}$, i.e., $\mathds{k}_{\mathrm{iTS}}(s,t):=\mathds{k}_{\mathrm{iTC}}(s,t)+\mathds{k}_{\mathrm{iSS}}(s,t)$, for any $s,t\in\mathbb{T}$. Based on the above discussion and Theorem 2, one can easily see that $\mathds{k}_{\mathrm{iTS}}$ is a DSRI kernel.
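The cone property in Theorem 2 i) only involves the diagonals, through the elementary inequality $\sqrt{a x+b y}\le\sqrt{a}\sqrt{x}+\sqrt{b}\sqrt{y}$ for $a,b,x,y\ge 0$. The minimal discrete-time sketch below checks the resulting bound on truncated DSRI sums; the diagonals $\lambda_1\alpha_1^t$ (TC-type) and $\lambda_2\alpha_2^{3t}/3$ (stable-spline-type) and the weights are illustrative assumptions.

```python
import math

lam1, alpha1 = 1.0, 0.9   # diagonal of a TC-type kernel: lam1 * alpha1**t (assumed form)
lam2, alpha2 = 0.5, 0.7   # diagonal of an SS-type kernel: lam2 * alpha2**(3t) / 3 (assumed form)
a, b = 2.0, 3.0           # nonnegative combination weights
horizon = 400


def d1(t: int) -> float:
    return lam1 * alpha1 ** t


def d2(t: int) -> float:
    return lam2 * alpha2 ** (3 * t) / 3


def d_comb(t: int) -> float:
    # diagonal of a * k1 + b * k2; only the diagonal enters the DSRI sum
    return a * d1(t) + b * d2(t)


def partial_dsri(diag) -> float:
    """Truncated DSRI sum: sum of sqrt(diag(t)) for t = 0, ..., horizon - 1."""
    return sum(math.sqrt(diag(t)) for t in range(horizon))


lhs = partial_dsri(d_comb)
rhs = math.sqrt(a) * partial_dsri(d1) + math.sqrt(b) * partial_dsri(d2)
print(f"{lhs:.6f} <= {rhs:.6f}")
```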
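Let $\mathrm{b}=(b_t)_{t\in\mathbb{T}}\in\mathscr{L}^1$ and $\mathds{k}_{\mathrm{b}}:\mathbb{T}\times\mathbb{T}\to\mathbb{R}$ be defined as $\mathds{k}_{\mathrm{b}}(s,t):=b_sb_t$, for any $s,t\in\mathbb{T}$ [49]. One can easily see that $\mathds{k}_{\mathrm{b}}$ is a rank-1 positive-definite kernel with
$$\mathds{k}_{\mathrm{b}}(t,t)^{\frac12}=|b_t|,\qquad\forall t\in\mathbb{T},$$
which says that the value in (1) for $\mathds{k}_{\mathrm{b}}$ equals $\|\mathrm{b}\|_1<\infty$. This implies that $\mathds{k}_{\mathrm{b}}\in\mathscr{S}_{\mathrm{DSRI}}$. In [49], the amplitude-modulated locally stationary (AMLS) kernels are introduced, which are a generalized form of $\mathds{k}_{\mathrm{b}}$. More precisely, let $\mathds{k}_{\mathrm{st}}:\mathbb{T}\times\mathbb{T}\to\mathbb{R}$ be a stationary positive-definite kernel, i.e., we have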
[TABLE]
Subsequently, the AMLS kernel $\mathds{k}_{\mathrm{AMLS}}:\mathbb{T}\times\mathbb{T}\to\mathbb{R}$ is defined as
[TABLE]
Note that since $\mathds{k}_{\mathrm{st}}$ is a stationary kernel, we know that $\sup_{t\in\mathbb{T}}\mathds{k}_{\mathrm{st}}(t,t)=\mathds{k}_{\mathrm{st}}(0,0)<\infty$. Therefore, due to Theorem 2 and $\mathds{k}_{\mathrm{b}}\in\mathscr{S}_{\mathrm{DSRI}}$, we have $\mathds{k}_{\mathrm{AMLS}}\in\mathscr{S}_{\mathrm{DSRI}}$. In addition to $\mathds{k}_{\mathrm{AMLS}}$, the simulation-induced kernels are introduced in [49]. Similarly to the previous discussion, one can show that, under certain conditions, the simulation-induced kernels are DSRI (see Appendix A.4 for more details).
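Theorem 2 ii) can be illustrated on an AMLS-type kernel in discrete time. The sketch below assumes that the AMLS kernel is the pointwise product $b_s\,\mathds{k}_{\mathrm{st}}(s,t)\,b_t$ of the rank-1 kernel $\mathds{k}_{\mathrm{b}}$ and a stationary kernel; the amplitude $b_t=\alpha^t$ and the Gaussian-shaped stationary kernel $\exp(-((s-t)/\ell)^2)$ are hypothetical choices made only for this illustration. It checks that the product is positive semidefinite on a finite horizon (Schur product theorem) and that its truncated DSRI sum is bounded by $\sqrt{\sup_t\mathds{k}_{\mathrm{st}}(t,t)}$ times that of $\mathds{k}_{\mathrm{b}}$.

```python
import numpy as np

alpha, ell, horizon = 0.8, 3.0, 200
T = np.arange(horizon)

b = alpha ** T                                   # hypothetical amplitude b_t = alpha**t (in l^1)
k_b = np.outer(b, b)                             # rank-1 kernel b_s * b_t
k_st = np.exp(-(((T[:, None] - T[None, :]) / ell) ** 2))  # stationary kernel, k_st(t, t) = 1
k_amls = k_b * k_st                              # pointwise (Schur) product

dsri_amls = float(np.sqrt(np.diag(k_amls)).sum())
bound = float(np.sqrt(np.diag(k_st).max()) * np.sqrt(np.diag(k_b)).sum())
min_eig = float(np.linalg.eigvalsh(k_amls).min())
print(dsri_amls, bound, min_eig)
```

Because $\mathds{k}_{\mathrm{st}}(t,t)=1$ here, the bound holds with equality; with a stationary kernel of larger variance the bound scales by $\sqrt{\mathds{k}_{\mathrm{st}}(0,0)}$.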
We can show that the DSRI property is preserved under proper sampling (see Appendix A.5) and under reparameterization of the kernel arguments (see Appendix A.6). Using Theorem 1 and Theorem 2, the discussion provided in this section, and arguments similar to those in Appendices A.3, A.4, A.5, and A.6, one can show that a broad range of kernels are DSRI. The class of DSRI kernels is further studied in the next section.
5 DSRI Kernels: Stability and Integrability
To elaborate further on the structure of the class of DSRI kernels, we investigate their stability and integrability properties in this section. Since, in the kernel-based system identification framework, the identified model inherits the attributes of the kernel, a natural question concerns the main feature of interest, namely, the stability of the kernel. To address this question, we recall the notion of stable kernels [19].
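Definition 4 ([19]).
The positive-definite kernel $\mathds{k}$ is said to be stable if, for any $\mathrm{u}=(u_t)_{t\in\mathbb{T}}\in\mathscr{L}^\infty$, one has
$$\begin{cases}\displaystyle\sum_{s\in\mathbb{Z}_+}\Big|\sum_{t\in\mathbb{Z}_+}\mathds{k}(s,t)\,u_t\Big|<\infty, & \text{if }\mathbb{T}=\mathbb{Z}_+,\\[6pt] \displaystyle\int_{\mathbb{R}_+}\Big|\int_{\mathbb{R}_+}\mathds{k}(s,t)\,u_t\,\mathrm{d}t\Big|\,\mathrm{d}s<\infty, & \text{if }\mathbb{T}=\mathbb{R}_+.\end{cases}$$
The class of stable kernels is denoted by $\mathscr{S}_{\mathrm{s}}$.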
The following theorem demonstrates the relationship between the DSRI kernels and the stable kernels.
Theorem 3 ([20]).
Every DSRI kernel is stable.
Theorem 3 states that $\mathscr{S}_{\mathrm{DSRI}}\subseteq\mathscr{S}_{\mathrm{s}}$. In addition to stable kernels, another well-known category of kernels in the context of system identification is that of integrable kernels, whose definition we review next.
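Definition 5 ([19]).
The positive-definite kernel $\mathds{k}$ is called integrable if we have
$$\begin{cases}\displaystyle\sum_{s\in\mathbb{Z}_+}\sum_{t\in\mathbb{Z}_+}|\mathds{k}(s,t)|<\infty, & \text{if }\mathbb{T}=\mathbb{Z}_+,\\[6pt] \displaystyle\int_{\mathbb{R}_+}\int_{\mathbb{R}_+}|\mathds{k}(s,t)|\,\mathrm{d}s\,\mathrm{d}t<\infty, & \text{if }\mathbb{T}=\mathbb{R}_+.\end{cases}$$
The class of integrable kernels is denoted by $\mathscr{S}_1$.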
It is known that integrable kernels form a subclass of stable kernels [19, 37]. The following theorem further characterizes the class of DSRI kernels by clarifying its connection with integrable kernels. This theorem is implicit in the proof of Lemma 2 in [20].
Theorem 4 ([20]).
Every DSRI kernel is integrable.
In [39], it is shown that there exists a stable kernel which is not integrable, i.e., the inclusion of integrable kernels in stable kernels is strict. The next theorem establishes the analogous property for DSRI kernels.
Theorem 5.
There exists an integrable kernel which is not a DSRI kernel.
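A simple way to see the phenomenon of Theorem 5 in discrete time is through a diagonal kernel $\mathds{k}(s,t)=d_t$ if $s=t$ and $0$ otherwise, with $d_t\ge 0$, which is positive-definite since $\sum_{i,j}a_ia_j\mathds{k}(t_i,t_j)=\sum_t d_t\big(\sum_{i:\,t_i=t}a_i\big)^2\ge 0$. The choice $d_t=1/(t+1)^2$ below is ours and is only meant as an illustration (it is not claimed to be the construction of Appendix A.7): the integrability sum is $\sum_t d_t=\pi^2/6$, whereas the DSRI sum is the harmonic series $\sum_t 1/(t+1)$, which diverges.

```python
import math


def diag_entry(t: int) -> float:
    # hypothetical diagonal kernel: k(s, t) = d_t if s == t else 0, with d_t = 1 / (t + 1)**2
    return 1.0 / (t + 1) ** 2


def partial_sums(horizon: int) -> tuple:
    # integrability: sum_{s,t} |k(s, t)| = sum_t d_t, since off-diagonal entries vanish
    integ = sum(diag_entry(t) for t in range(horizon))
    # DSRI: sum_t sqrt(d_t) = sum_t 1 / (t + 1), the harmonic series
    dsri = sum(math.sqrt(diag_entry(t)) for t in range(horizon))
    return integ, dsri


for horizon in (10, 1_000, 100_000):
    integ, dsri = partial_sums(horizon)
    print(f"N={horizon:>7}: integrability sum={integ:.6f}, DSRI sum={dsri:.3f}")
```

The integrability sums stay below $\pi^2/6\approx 1.645$, while the DSRI sums grow like $\log N$.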
The following corollary is a direct result of Theorem 5 and the fact that any integrable kernel is stable [19].
Corollary 6.
There exists a stable kernel which is not a DSRI kernel.
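In [37], other categories of positive-definite kernels are considered. The positive-definite kernel $\mathds{k}$ is said to be finite-trace if we have
$$\begin{cases}\displaystyle\sum_{t\in\mathbb{Z}_+}\mathds{k}(t,t)<\infty, & \text{if }\mathbb{T}=\mathbb{Z}_+,\\[6pt] \displaystyle\int_{\mathbb{R}_+}\mathds{k}(t,t)\,\mathrm{d}t<\infty, & \text{if }\mathbb{T}=\mathbb{R}_+.\end{cases}$$
Similarly, it is called a squared integrable kernel if
$$\begin{cases}\displaystyle\sum_{s\in\mathbb{Z}_+}\sum_{t\in\mathbb{Z}_+}\mathds{k}(s,t)^2<\infty, & \text{if }\mathbb{T}=\mathbb{Z}_+,\\[6pt] \displaystyle\int_{\mathbb{R}_+}\int_{\mathbb{R}_+}\mathds{k}(s,t)^2\,\mathrm{d}s\,\mathrm{d}t<\infty, & \text{if }\mathbb{T}=\mathbb{R}_+.\end{cases}$$
The class of finite-trace kernels and the class of squared integrable kernels are denoted by $\mathscr{S}_{\mathrm{ft}}$ and $\mathscr{S}_2$, respectively [37]. Based on the above discussion and [37], we have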
[TABLE]
where all of the inclusions are strict.
See Figure 1 for an illustration of the discussion presented in the current section and the previous section. One should compare this figure with Figure 1 in [37].
6 Operator Continuity and DSRI Kernels
In this section, we study certain topological features of the RKHSs equipped with DSRI kernels, namely the continuity of linear operators defined on them.
We recall that with respect to each positive-definite kernel, a Hilbert space is defined uniquely [1]. More precisely, based on the Moore-Aronszajn theorem, these Hilbert spaces are exactly the ones where the evaluation functionals are bounded [1, 5].
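Theorem 7 ([5]).
Given a positive-definite kernel $\mathds{k}$, there exists a unique Hilbert space $\mathcal{H}_{\mathds{k}}\subseteq\mathbb{R}^{\mathbb{T}}$ with inner product $\langle\cdot,\cdot\rangle_{\mathcal{H}_{\mathds{k}}}$, referred to as the RKHS with kernel $\mathds{k}$, where for each $t\in\mathbb{T}$, we have
i) $\mathds{k}_t\in\mathcal{H}_{\mathds{k}}$, and
ii) $\langle\mathrm{g},\mathds{k}_t\rangle_{\mathcal{H}_{\mathds{k}}}=g_t$, for all $\mathrm{g}\in\mathcal{H}_{\mathds{k}}$.
The second feature is called the reproducing property.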
In the context of system identification, the RKHSs endowed with the stable kernels are of special interest due to their particular feature reviewed in the following theorem.
Theorem 8 ([19, 55, 56]).
Let $\mathds{k}$ be a positive-definite kernel. Then, $\mathcal{H}_{\mathds{k}}\subseteq\mathscr{L}^1$ if and only if $\mathds{k}$ is a stable kernel. In this case, $\mathcal{H}_{\mathds{k}}$ is called a stable RKHS.
Given a stable kernel $\mathds{k}$, we know that $\mathcal{H}_{\mathds{k}}\subseteq\mathscr{L}^1$. Accordingly, various objects introduced on $\mathscr{L}^1$ can be redefined by restricting them to $\mathcal{H}_{\mathds{k}}$. One may then ask which properties are inherited through this restriction. The main feature of DSRI kernels is that the continuity of operators defined on $\mathscr{L}^1$ is inherited when they are restricted to the corresponding RKHS.
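Theorem 9.
Let $\mathcal{X}$ be a Banach space equipped with norm $\|\cdot\|_{\mathcal{X}}$ and $\mathrm{L}:\mathscr{L}^1\to\mathcal{X}$ be a continuous linear operator. If $\mathds{k}$ is a DSRI kernel, then $\mathcal{H}_{\mathds{k}}$ is a linear subspace of $\mathscr{L}^1$ and $\mathrm{L}|_{\mathcal{H}_{\mathds{k}}}:\mathcal{H}_{\mathds{k}}\to\mathcal{X}$ is continuous. Moreover, we have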
[TABLE]
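The estimate that drives the proof of Theorem 9 (the Cauchy-Schwarz step in Appendix A.8) is that every $\mathrm{g}\in\mathcal{H}_{\mathds{k}}$ satisfies $\|\mathrm{g}\|_1\le\big(\sum_t\mathds{k}(t,t)^{1/2}\big)\|\mathrm{g}\|_{\mathcal{H}_{\mathds{k}}}$ in discrete time. The sketch below checks this numerically for a finite combination $\mathrm{g}=\sum_i c_i\mathds{k}_{t_i}$, whose RKHS norm is $\big(\sum_{i,j}c_ic_j\mathds{k}(t_i,t_j)\big)^{1/2}$. The TC kernel form $\lambda\alpha^{\max(s,t)}$, the horizon, the centres $t_i$, and the random coefficients are illustrative assumptions; both sums are truncated at the same horizon, so the inequality holds exactly on the truncated quantities.

```python
import numpy as np

rng = np.random.default_rng(0)
lam, alpha, horizon = 1.0, 0.85, 400
T = np.arange(horizon)
K = lam * alpha ** np.maximum.outer(T, T)        # TC kernel on {0, ..., horizon - 1} (assumed form)

centres = np.array([0, 3, 10, 25])               # illustrative t_i
c = rng.standard_normal(centres.size)            # illustrative coefficients c_i
g = K[centres, :].T @ c                          # g_t = sum_i c_i k(t_i, t)
norm_H = float(np.sqrt(c @ K[np.ix_(centres, centres)] @ c))  # ||g||_H = sqrt(c^T K_cc c)

norm_1 = float(np.abs(g).sum())                  # l1 norm of g on the horizon
dsri = float(np.sqrt(np.diag(K)).sum())          # sum of sqrt(k(t, t)) on the same horizon
print(norm_1, dsri * norm_H)
```

Multiplying this bound by $\|\mathrm{L}\|$ gives the continuity estimate for the restricted operator $\mathrm{L}|_{\mathcal{H}_{\mathds{k}}}$.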
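Given a Banach space $\mathcal{X}$ with norm $\|\cdot\|_{\mathcal{X}}$, we denote by $\mathscr{L}^\infty_{\mathcal{X}}$ the space of $\mathcal{X}$-valued Bochner measurable functions whose norm in $\mathcal{X}$ has a bounded essential supremum, i.e., for any $\mathrm{u}=(u_t)_{t\in\mathbb{T}}\in\mathscr{L}^\infty_{\mathcal{X}}$, we have $\operatorname*{ess\,sup}_{t\in\mathbb{T}}\|u_t\|_{\mathcal{X}}<\infty$ [57].
Theorem 10.
Let $\mathrm{u}$ be an arbitrary element in $\mathscr{L}^\infty_{\mathcal{X}}$ and $\mathds{k}$ be a positive-definite kernel. Define an operator $\mathrm{L}_{\mathrm{u}}:\mathcal{H}_{\mathds{k}}\to\mathcal{X}$ as follows
$$\mathrm{L}_{\mathrm{u}}(\mathrm{g}):=\begin{cases}\displaystyle\sum_{t\in\mathbb{Z}_+}g_t\,u_t, & \text{if }\mathbb{T}=\mathbb{Z}_+,\\[6pt] \displaystyle\int_{\mathbb{R}_+}g_t\,u_t\,\mathrm{d}t, & \text{if }\mathbb{T}=\mathbb{R}_+,\end{cases}\tag{17}$$
for any $\mathrm{g}\in\mathcal{H}_{\mathds{k}}$, where the integral is understood in the Bochner sense. If $\mathds{k}$ is a DSRI kernel, then $\mathrm{L}_{\mathrm{u}}$ is a continuous linear operator.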
Theorem 9 and Theorem 10 allow one to transfer various existing results for BIBO stable impulse responses to the RKHS $\mathcal{H}_{\mathds{k}}$. The following corollaries are examples of this.
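Corollary 11.
Let $\mathds{k}$ be a DSRI kernel, $\mathrm{u}\in\mathscr{L}^\infty$ be a bounded signal, and $t\in\mathbb{T}$. Define the convolution operator $\mathrm{C}_t:\mathcal{H}_{\mathds{k}}\to\mathbb{R}$ as
[TABLE]
for any $\mathrm{g}\in\mathcal{H}_{\mathds{k}}$. Then, $\mathrm{C}_t$ is a continuous linear operator.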
Let be defined as when , and when . With respect to each in , the operators and are defined respectively as
[TABLE]
and
[TABLE]
for any . Moreover, we define as , where denotes the imaginary unit. One can see that and correspond, respectively, to the real and imaginary parts of the Fourier transform of the impulse response evaluated at frequency , which is . From Theorem 10, we have the following corollary for the introduced operators.
Corollary 12.
Let be a DSRI kernel. Then, , and are continuous linear operators, for all .
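Corollaries 11 and 12 can be illustrated numerically: for a bounded input with $|u_t|\le 1$, both a convolution output and the real and imaginary parts of the Fourier transform of $\mathrm{g}$ are bounded by $\|\mathrm{g}\|_1\le\big(\sum_t\mathds{k}(t,t)^{1/2}\big)\|\mathrm{g}\|_{\mathcal{H}_{\mathds{k}}}$. This minimal discrete-time sketch assumes the causal convolution $\sum_{s=0}^{t}g_su_{t-s}$, the transform convention $\sum_t g_t\mathrm{e}^{-\mathrm{i}\omega t}$, a TC kernel of the form $\lambda\alpha^{\max(s,t)}$, and illustrative values for the frequency, centres, and coefficients.

```python
import numpy as np

rng = np.random.default_rng(1)
lam, alpha, horizon = 1.0, 0.9, 500
T = np.arange(horizon)
K = lam * alpha ** np.maximum.outer(T, T)        # TC kernel (assumed form)

centres = np.array([1, 4, 9, 30])
c = rng.standard_normal(centres.size)
g = K[centres, :].T @ c                          # an RKHS element, observed on the horizon
norm_H = float(np.sqrt(c @ K[np.ix_(centres, centres)] @ c))
gain = float(np.sqrt(np.diag(K)).sum()) * norm_H  # upper bound on ||g||_1 over the horizon

# Causal convolution at time t with a bounded input, |u_t| <= 1
u = np.sign(rng.standard_normal(horizon))
t = horizon - 1
y_t = float(np.dot(g[: t + 1], u[t::-1]))        # sum_{s=0}^{t} g_s * u_{t-s}

# Fourier transform at frequency w, convention sum_t g_t * exp(-i w t)
w = 0.7
G_w = complex(np.sum(g * np.exp(-1j * w * T)))
print(abs(y_t), abs(G_w.real), abs(G_w.imag), gain)
```

Each of the three printed quantities is at most $\|\mathrm{g}\|_1$, which in turn is at most the printed gain, so the operators are bounded with a constant that depends only on the kernel.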
7 Stable Gaussian Processes
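Let $(\Omega,\mathcal{F},\mathbb{P})$ be a probability space, where $\Omega$ is the sample space, $\mathcal{F}$ is the corresponding σ-algebra, and $\mathbb{P}$ is the probability measure defined on $\mathcal{F}$. Given a measurable function $\mathrm{m}:\mathbb{T}\to\mathbb{R}$ and a positive-definite kernel $\mathds{k}:\mathbb{T}\times\mathbb{T}\to\mathbb{R}$, the stochastic process
$$\mathrm{g}=(g_t)_{t\in\mathbb{T}},\qquad g_t:\Omega\to\mathbb{R},$$
is called a Gaussian process (GP) with mean $\mathrm{m}$ and kernel $\mathds{k}$ [5], denoted by $\mathrm{g}\sim\mathcal{GP}(\mathrm{m},\mathds{k})$, when, for any $n\in\mathbb{N}$ and any $t_1,\ldots,t_n\in\mathbb{T}$, the random vector $[g_{t_1},\ldots,g_{t_n}]^{\mathsf{T}}$ has a Gaussian distribution as follows
$$[g_{t_1},\ldots,g_{t_n}]^{\mathsf{T}}\sim\mathcal{N}\Big([m_{t_1},\ldots,m_{t_n}]^{\mathsf{T}},\,\big[\mathds{k}(t_i,t_j)\big]_{i,j=1}^{n}\Big).$$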
The following definition reviews the notion of an interesting class of Gaussian processes in the context of system identification [20].
Definition 6 ([20]).
The Gaussian process $\mathrm{g}$ is said to be stable in the BIBO sense if its realizations, also known as sample paths, are almost surely BIBO stable impulse responses, i.e., $\mathbb{P}[\mathrm{g}\in\mathscr{L}^1]=1$.
Stable GPs are important because of their role in the Bayesian interpretation of kernel-based impulse response identification. Hence, one may ask about the necessary and sufficient conditions for the stability of the Gaussian process $\mathrm{g}$ (this question was raised during the workshop “Bayesian and Kernel-Based Methods in Learning Dynamical Systems”, 21st IFAC World Congress, Berlin, Germany, 2020). Part of this question is addressed in [20], which is reviewed in the following lemma.
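Lemma 13 ([20]).
Let $\mathds{k}$ be a positive-definite kernel and $\mathrm{g}$ be a Gaussian process with mean $\mathbf{0}$ and kernel $\mathds{k}$, where $\mathbf{0}$ denotes the constant zero function. If kernel $\mathds{k}$ is DSRI, then we have $\mathbb{P}[\mathrm{g}\in\mathscr{L}^1]=1$.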
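One way to see why the DSRI property suffices in Lemma 13 (offered as an illustration, not necessarily the proof given in [20]): for a zero-mean GP, each $g_t$ is $\mathcal{N}(0,\mathds{k}(t,t))$, so $\mathbb{E}|g_t|=\sqrt{2/\pi}\,\mathds{k}(t,t)^{1/2}$, and by Tonelli's theorem $\mathbb{E}\|\mathrm{g}\|_1=\sqrt{2/\pi}\sum_t\mathds{k}(t,t)^{1/2}$ in discrete time. A finite expectation forces $\|\mathrm{g}\|_1<\infty$ almost surely. The Monte Carlo sketch below compares both sides on a truncated horizon for an assumed TC kernel $\lambda\alpha^{\max(s,t)}$; the sample size and hyperparameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
lam, alpha, horizon, n_paths = 1.0, 0.8, 150, 20000
T = np.arange(horizon)
K = lam * alpha ** np.maximum.outer(T, T)        # TC kernel (assumed form)

# Sample zero-mean GP paths on {0, ..., horizon - 1}; an eigendecomposition-based
# square root of K tolerates tiny negative eigenvalues caused by rounding
evals, evecs = np.linalg.eigh(K)
factor = evecs * np.sqrt(np.clip(evals, 0.0, None))   # factor @ factor.T == K
paths = rng.standard_normal((n_paths, horizon)) @ factor.T

mc_mean_l1 = float(np.abs(paths).sum(axis=1).mean())  # Monte Carlo estimate of E||g||_1
dsri = float(np.sqrt(np.diag(K)).sum())                # truncated DSRI sum
predicted = float(np.sqrt(2.0 / np.pi) * dsri)         # sqrt(2/pi) * sum_t sqrt(k(t, t))
print(mc_mean_l1, predicted)
```

The identity holds on any finite horizon, so the two printed numbers should agree up to Monte Carlo error.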
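According to Lemma 13, the DSRI property of $\mathds{k}$ is a sufficient condition for the almost sure BIBO stability of the Gaussian process $\mathrm{g}$ with kernel $\mathds{k}$ when its mean is the constant zero function. The following lemma concerns the converse direction of Lemma 13. Before proceeding further, we need some additional definitions. Let the function $\phi:\mathbb{R}_+\to[0,1)$ be defined as
$$\phi(x):=\frac{1}{\sqrt{2\pi}}\int_{-x}^{x}\mathrm{e}^{-\frac{\tau^2}{2}}\,\mathrm{d}\tau,$$
for any $x\in\mathbb{R}_+$. Note that $\phi$ is closely related to the Gaussian error function, i.e., $\phi(x)=\operatorname{erf}(x/\sqrt{2})$ is the probability that a standard Gaussian random variable takes a value in the interval $[-x,x]$, for any $x\in\mathbb{R}_+$. Moreover, one can see that $\phi$ is a strictly increasing bijection from $\mathbb{R}_+$ onto $[0,1)$, and therefore, it has a well-defined inverse $\phi^{-1}:[0,1)\to\mathbb{R}_+$, which is also a strictly increasing bijective map.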
Lemma 14.
Let be a positive-definite kernel and , where is the constant zero function. If , then is a DSRI kernel and we have .
We now state the main theorem of this section, which follows from Lemma 13 and Lemma 14.
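Theorem 15.
Let $\mathrm{m}\in\mathscr{L}^1$ be a stable impulse response and $\mathds{k}$ be a positive-definite kernel. Also, let $\mathrm{g}$ be the Gaussian process with mean impulse response $\mathrm{m}$ and kernel $\mathds{k}$. Then, if $\mathds{k}$ is a DSRI kernel, we have $\mathbb{P}[\mathrm{g}\in\mathscr{L}^1]=1$, and if $\mathds{k}$ is not a DSRI kernel, we have $\mathbb{P}[\mathrm{g}\in\mathscr{L}^1]=0$.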
The following corollary is a direct result of Theorem 15 and the definition of (BIBO) stability for the Gaussian processes.
Corollary 16.
Let the assumptions of Theorem 15 hold. Then, $\mathrm{g}$ is stable if and only if $\mathds{k}$ is a DSRI kernel.
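The theorem and corollary presented here have an interesting interpretation. For $t\in\mathbb{T}$ and the Gaussian process $\mathrm{g}$ with mean $\mathrm{m}$ and kernel $\mathds{k}$, we know that $g_t$ is a random variable with Gaussian distribution $\mathcal{N}(m_t,\mathds{k}(t,t))$. Accordingly, given $\varepsilon\in(0,1)$, for each $t\in\mathbb{T}$, we can characterize an $\varepsilon$-confidence interval based on the standard deviation of $g_t$. More precisely, the $\varepsilon$-confidence interval for $g_t$, denoted by $\mathcal{I}_{t,\varepsilon}$, is defined as
$$\mathcal{I}_{t,\varepsilon}:=\big[m_t-c_\varepsilon\,\mathds{k}(t,t)^{\frac12},\ m_t+c_\varepsilon\,\mathds{k}(t,t)^{\frac12}\big],$$
where $c_\varepsilon$ is the positive real scalar for which a standard Gaussian random variable takes a value in $[-c_\varepsilon,c_\varepsilon]$ with probability $\varepsilon$. Furthermore, let impulse responses $\overline{\mathrm{g}}_\varepsilon$ and $\underline{\mathrm{g}}_\varepsilon$ be defined respectively as
$$\overline{\mathrm{g}}_\varepsilon:=\big(m_t+c_\varepsilon\,\mathds{k}(t,t)^{\frac12}\big)_{t\in\mathbb{T}},$$
and
$$\underline{\mathrm{g}}_\varepsilon:=\big(m_t-c_\varepsilon\,\mathds{k}(t,t)^{\frac12}\big)_{t\in\mathbb{T}}.$$
We know that $\overline{\mathrm{g}}_\varepsilon$ and $\underline{\mathrm{g}}_\varepsilon$ correspond respectively to the upper and lower bounds of the introduced pointwise confidence intervals. Accordingly, we can define an $\varepsilon$-confidence region, denoted by $\mathcal{R}_\varepsilon$, as the union of the confidence intervals $\mathcal{I}_{t,\varepsilon}$, i.e., $\mathcal{R}_\varepsilon:=\bigcup_{t\in\mathbb{T}}\{t\}\times\mathcal{I}_{t,\varepsilon}$. One can easily see that $\mathcal{R}_\varepsilon$ is the region between the impulse responses $\overline{\mathrm{g}}_\varepsilon$ and $\underline{\mathrm{g}}_\varepsilon$ (see Figure 2). Note that due to the definition of $c_\varepsilon$, we have $\mathbb{P}[g_t\in\mathcal{I}_{t,\varepsilon}]=\varepsilon$, for any $t\in\mathbb{T}$. However, one should note that this argument does not imply $\mathbb{P}[\mathrm{g}\in\mathcal{R}_\varepsilon]\ge\varepsilon$. On the other hand, since $\mathrm{m}$ is stable and the width of $\mathcal{R}_\varepsilon$ at time $t$ is $2c_\varepsilon\mathds{k}(t,t)^{\frac12}$, the theorem and corollary say that $\mathrm{g}$ is a stable impulse response with probability one, that is $\mathbb{P}[\mathrm{g}\in\mathscr{L}^1]=1$, if and only if the confidence bound impulse responses $\overline{\mathrm{g}}_\varepsilon$ and $\underline{\mathrm{g}}_\varepsilon$ are stable, or equivalently, the confidence region $\mathcal{R}_\varepsilon$ has finite area (its total width, summed or integrated over $t$, equals $2c_\varepsilon$ times the value in (1)). Moreover, if the area of $\mathcal{R}_\varepsilon$ is infinite, then $\mathrm{g}$ is an unstable impulse response with probability one, i.e., $\mathbb{P}[\mathrm{g}\in\mathscr{L}^1]=0$. In Figure 2, we have shown sample paths of an example Gaussian process, the corresponding mean impulse response $\mathrm{m}$, and the confidence bound impulse responses $\overline{\mathrm{g}}_\varepsilon$ and $\underline{\mathrm{g}}_\varepsilon$, for a fixed value of $\varepsilon$.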
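The scalar $c_\varepsilon$ and the area of the confidence region are easy to evaluate numerically. The sketch below uses the identity $\mathbb{P}[|Z|\le x]=\operatorname{erf}(x/\sqrt{2})$ for a standard Gaussian $Z$, so that $c_\varepsilon$ is the $(1+\varepsilon)/2$ quantile of the standard normal distribution, and sums the widths $2c_\varepsilon\mathds{k}(t,t)^{1/2}$ over a truncated discrete-time horizon. The helper names `phi`, `phi_inv`, and `region_area` are hypothetical, and the two diagonals are illustrative choices, one DSRI ($0.8^t$) and one not ($1/(t+1)^2$).

```python
import math
from statistics import NormalDist


def phi(x: float) -> float:
    """P[|Z| <= x] for a standard Gaussian Z, i.e. erf(x / sqrt(2))."""
    return math.erf(x / math.sqrt(2.0))


def phi_inv(eps: float) -> float:
    """The c_eps >= 0 with P[|Z| <= c_eps] = eps, for eps in [0, 1)."""
    return NormalDist().inv_cdf((1.0 + eps) / 2.0)


def region_area(diag, eps: float, horizon: int) -> float:
    """Total width of the eps-confidence region: sum_t 2 * c_eps * sqrt(k(t, t))."""
    c_eps = phi_inv(eps)
    return 2.0 * c_eps * sum(math.sqrt(diag(t)) for t in range(horizon))


eps = 0.95
c_eps = phi_inv(eps)
print(c_eps, phi(c_eps))
print(region_area(lambda t: 0.8 ** t, eps, 1000))            # DSRI diagonal: converges
print(region_area(lambda t: 1.0 / (t + 1) ** 2, eps, 1000))  # non-DSRI diagonal: grows like log(horizon)
```

For $\varepsilon=0.95$ this gives the familiar $c_\varepsilon\approx 1.96$; the first area stabilizes as the horizon grows, while the second keeps growing.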
8 Conclusion
We have investigated the class of diagonally square root integrable kernels in this work. It is verified that the category of DSRI kernels includes well-known kernels used in system identification, such as the diagonal/correlated, tuned/correlated, stable spline, amplitude-modulated locally stationary, and simulation-induced kernels. We have observed that the DSRI kernel category has a cone structure endowed with a partial order. Moreover, this kernel class is a subclass of both the stable kernels and the integrable kernels. We have looked into certain fundamental topological properties of the RKHSs with DSRI kernels. More precisely, we have shown that the continuity of linear operators defined on $\mathscr{L}^1$ is inherited when they are restricted to an RKHS equipped with a DSRI kernel. Furthermore, it has been verified that the realizations of a Gaussian process centered at a stable impulse response are almost surely stable if and only if the corresponding kernel admits the DSRI property.
Appendix A
A.1 Proof of Theorem 1
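Part i) Let $\mathds{k}'$ be DSRI and (diagonally) dominate $\mathds{k}$. By Definition 3, there exists $c>0$ such that $\mathds{k}(t,t)\le c\,\mathds{k}'(t,t)$, and hence $\mathds{k}(t,t)^{\frac12}\le\sqrt{c}\,\mathds{k}'(t,t)^{\frac12}$, for all $t\in\mathbb{T}$. Summing (when $\mathbb{T}=\mathbb{Z}_+$) or integrating (when $\mathbb{T}=\mathbb{R}_+$) this inequality over $t\in\mathbb{T}$ shows that the value in (1) for $\mathds{k}$ is at most $\sqrt{c}$ times the corresponding value for $\mathds{k}'$, which is finite.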
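Part ii) For any $m\in\mathbb{N}$, $t_1,\ldots,t_m\in\mathbb{T}$, and $a_1,\ldots,a_m\in\mathbb{R}$, we have
$$\sum_{i=1}^{m}\sum_{j=1}^{m}a_ia_j\,\mathds{k}_{\mathrm{R}n\mathrm{E}}(t_i,t_j)=\sum_{l=1}^{n}\lambda_l\Big(\sum_{i=1}^{m}a_i\,\alpha_l^{\frac12 t_i}\Big)^2\ge 0,$$
which says that $\mathds{k}_{\mathrm{R}n\mathrm{E}}$ is a positive-definite kernel. For any $s,t\in\mathbb{T}$, one can see that $\mathds{k}_{\mathrm{R}n\mathrm{E}}(s,t)\le\alpha^{\frac12(s+t)}\lambda$, where $\alpha:=\max_{1\le i\le n}\alpha_i$ and $\lambda:=\sum_{i=1}^{n}\lambda_i$. Therefore, we have
$$\begin{cases}\displaystyle\sum_{t\in\mathbb{Z}_+}\mathds{k}_{\mathrm{R}n\mathrm{E}}(t,t)^{\frac12}\le\sqrt{\lambda}\sum_{t\in\mathbb{Z}_+}\alpha^{\frac{t}{2}}=\frac{\sqrt{\lambda}}{1-\sqrt{\alpha}}<\infty, & \text{if }\mathbb{T}=\mathbb{Z}_+,\\[6pt] \displaystyle\int_{\mathbb{R}_+}\mathds{k}_{\mathrm{R}n\mathrm{E}}(t,t)^{\frac12}\,\mathrm{d}t\le\sqrt{\lambda}\int_{\mathbb{R}_+}\alpha^{\frac{t}{2}}\,\mathrm{d}t<\infty, & \text{if }\mathbb{T}=\mathbb{R}_+,\end{cases}$$
which implies that $\mathds{k}_{\mathrm{R}n\mathrm{E}}$ is a DSRI kernel. ∎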
A.2 Proof of Theorem 2
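Part i) For any $a,b\in\mathbb{R}_+$, the kernel $a\,\mathds{k}_1+b\,\mathds{k}_2$ is positive-definite, and one can easily see that
$$\big(a\,\mathds{k}_1(t,t)+b\,\mathds{k}_2(t,t)\big)^{\frac12}\le\sqrt{a}\,\mathds{k}_1(t,t)^{\frac12}+\sqrt{b}\,\mathds{k}_2(t,t)^{\frac12},$$
for any $t\in\mathbb{T}$. Accordingly, the proof follows directly from the triangle inequality and Definition 2.
Part ii) The product $\mathds{k}_1\mathds{k}_2$ is positive-definite by the Schur product theorem. Moreover, for any $t\in\mathbb{T}$, we have
$$\big(\mathds{k}_1(t,t)\,\mathds{k}_2(t,t)\big)^{\frac12}\le\Big(\sup_{\tau\in\mathbb{T}}\mathds{k}_2(\tau,\tau)\Big)^{\frac12}\mathds{k}_1(t,t)^{\frac12},$$
which, by Definition 2, implies the claim. ∎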
A.3 DSRI Property for High-order Stable Spline Kernels
Let $\beta$ be a positive real number and $(x)_+$ denote the non-negative part of $x$, for any $x\in\mathbb{R}$, that is, $(x)_+:=\max\{x,0\}$. With respect to each $n\in\mathbb{N}$, the $n$-order stable spline kernel $\mathds{k}_{\mathrm{SS}n}$ is defined as
[TABLE]
for any $s,t\in\mathbb{T}$ [19].
Theorem 17.
The $n$-order stable spline kernel $\mathds{k}_{\mathrm{SS}n}$ is DSRI.
Proof.
For each $t\in\mathbb{T}$, one can easily see that
[TABLE]
Therefore, $\mathds{k}_{\mathrm{SS}n}$ is diagonally dominated by the kernel $\mathds{k}_{\mathrm{R}n\mathrm{E}}(\cdot,\cdot\,;1,\mathrm{e}^{-(2n-1)\beta})$. Thus, due to Theorem 1, $\mathds{k}_{\mathrm{SS}n}$ is a DSRI kernel. ∎
A.4 DSRI Property for Simulation-Induced Kernels
Given in with non-negative values, a stable SISO system of order with realization , and by positive-definite matrix , the simulation-induced kernel $\mathds{k}_{\mathrm{SI}}:\mathbb{T}\times\mathbb{T}\to\mathbb{R}$ is defined such that, for any , we have
[TABLE]
when , and
[TABLE]
when [49].
Theorem 18.
Assume that there exist and such that, for any , we have and , where denotes matrix , when , and matrix , when . Then, $\mathds{k}_{\mathrm{SI}}$ is a DSRI kernel.
Proof.
For any , one can show that
[TABLE]
when , and
[TABLE]
when , where . Define the kernel as
[TABLE]
for any , where , , when , and , when . According to Theorem 1 and Theorem 2, we know that is a DSRI kernel. Moreover, due to (34) and (35), one can easily see that $\mathds{k}_{\mathrm{SI}}$ is diagonally dominated by . Therefore, kernel $\mathds{k}_{\mathrm{SI}}$ is DSRI. ∎
A.5 DSRI Property and Sampling
We say is a proper sampling function if . The following theorem says that DSRI property is preserved under proper sampling.
Theorem 19.
Let be a positive-definite kernel and be defined as
[TABLE]
Define function as , for any . If is a DSRI kernel with non-increasing , then is a DSRI kernel.
Proof.
The positive-definiteness of is a direct result of the same property for . Let be defined as . Since and is a non-increasing function, we have
[TABLE]
which implies that is a DSRI kernel. ∎
A.6 DSRI Property for Reparameterized Kernels
Theorem 20.
Let be a DSRI kernel and be a strictly increasing function, which is assumed to be differentiable with , when . Define as , for any . Then, is a DSRI kernel.
Proof.
The positive-definiteness of is directly concluded from the same property of . The properties of imply that it has a well-defined inverse function , which is a strictly increasing map. Therefore, for any , there exists a unique such that . Accordingly, for the case of , we have
[TABLE]
which implies that is DSRI. Similarly, for the case of , we have
[TABLE]
This concludes the proof. ∎
A.7 Proof of Theorem 5
Let and define a symmetric function such that, for any , we have
[TABLE]
For any and any , one can see that
[TABLE]
where and $\mathcal{I}_t=\{i\in\{1,\ldots,n\}\mid t_i=t\}$, for . This implies that is a positive-definite kernel. Moreover, we have
[TABLE]
and
[TABLE]
Therefore, is an integrable positive-definite kernel which is not DSRI. Let and function be defined as
[TABLE]
for any . Note that is a continuous and positive function. Define such that, for any , we have , where is introduced in (39), and function is defined as , for any . One can easily see that is continuous. Moreover, for any and any , we have
[TABLE]
where is defined as , for . Therefore, due to (40), we have , which implies that is a positive-definite kernel. We know that
[TABLE]
which implies that is integrable. On the other hand, we have
[TABLE]
and thus, from definition of , it follows that
[TABLE]
Therefore, is not a DSRI kernel. ∎
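A discrete-time analogue may help in reading the proof above. It is our own illustration, not the authors' construction. We assume the discrete-time definitions "integrable" = Σ_{s,t} |k(s,t)| < ∞ and "DSRI" = Σ_t k(t,t)^{1/2} < ∞. We then take the hypothetical diagonal kernel k(t,t) = 1/t^2 and k(s,t) = 0 for s ≠ t on the positive integers.

Grouping repeated sample times through the index sets I_t, as in the first part of the proof, gives Σ_{i,j} c_i c_j k(t_i,t_j) = Σ_t k(t,t) (Σ_{i∈I_t} c_i)^2 ≥ 0, so k is positive-definite. In addition, Σ_{s,t} |k(s,t)| = Σ 1/t^2 = π^2/6 < ∞, while Σ_t k(t,t)^{1/2} = Σ 1/t diverges.

```python
import math
from collections import defaultdict

import numpy as np

def k_diag(s: int, t: int) -> float:
    """Hypothetical kernel on positive integers: k(t, t) = 1 / t**2, k(s, t) = 0 for s != t."""
    return 1.0 / t ** 2 if s == t else 0.0

def quadratic_form(times, c) -> float:
    """Gram-matrix form sum_{i,j} c_i c_j k(t_i, t_j); repeated times are allowed."""
    K = np.array([[k_diag(s, t) for t in times] for s in times])
    return float(c @ K @ c)

def grouped_form(times, c) -> float:
    """Same quantity grouped by distinct time: sum_t k(t, t) * (sum_{i in I_t} c_i)**2."""
    totals = defaultdict(float)
    for t_i, c_i in zip(times, c):
        totals[t_i] += c_i  # accumulates the sum of c_i over I_t = {i : t_i = t}
    return sum(k_diag(t, t) * tot ** 2 for t, tot in totals.items())

def integrability_sum(n: int) -> float:
    """Partial sum of |k(s, t)| over 1 <= s, t <= n; only diagonal terms are nonzero."""
    return sum(k_diag(t, t) for t in range(1, n + 1))

def dsri_sum(n: int) -> float:
    """Partial sum of sqrt(k(t, t)) over 1 <= t <= n, i.e. the harmonic number H_n."""
    return sum(math.sqrt(k_diag(t, t)) for t in range(1, n + 1))

rng = np.random.default_rng(0)
times = [int(t) for t in rng.integers(1, 6, size=12)]  # forces repeated sample times
c = rng.standard_normal(12)
print(quadratic_form(times, c), grouped_form(times, c))  # equal and non-negative
for n in (10, 1_000, 100_000):
    # First value stays below pi**2 / 6; second grows roughly like ln(n).
    print(n, integrability_sum(n), dsri_sum(n))
```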
A.8 Proof of Theorem 9
The first part of the theorem follows from Theorem 3. For the second part, we provide the proof only for the case of . The proof for is similar.
Let . Due to the reproducing property, we have and , for any . Subsequently, from the Cauchy-Schwarz inequality, it follows that
[TABLE]
Accordingly, since , we have
[TABLE]
On the other hand, from the definition of the operator norm, it follows that
[TABLE]
Considering (47), we know that
[TABLE]
Therefore, due to (49) and the definition of the operator norm, we have
[TABLE]
which implies (16) and concludes the proof. ∎
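In its simplest form, the Cauchy-Schwarz step above is the pointwise bound |g(t)| = |⟨g, k(·,t)⟩| ≤ ‖g‖ k(t,t)^{1/2}. Summing over t then gives ‖g‖_1 ≤ ‖g‖ Σ_t k(t,t)^{1/2}, which is how a DSRI kernel controls the ℓ1 norm of RKHS elements.

The exact quantities in (47) to (49) are not recoverable here, so the NumPy sketch below checks only this special case. It uses a finite expansion g = Σ_i a_i k(·, t_i) of a TC kernel on a truncated discrete horizon, for which ‖g‖^2 = aᵀKa. The kernel, horizon, and coefficients are illustrative assumptions.

```python
import numpy as np

def tc_gram(n: int, alpha: float) -> np.ndarray:
    """TC kernel k(s, t) = alpha**max(s, t) on the discrete grid t = 1, ..., n."""
    t = np.arange(1, n + 1)
    return alpha ** np.maximum.outer(t, t)

N, ALPHA = 50, 0.85
K = tc_gram(N, ALPHA)
rng = np.random.default_rng(1)

a = rng.standard_normal(N)       # coefficients of g = sum_i a_i k(., t_i)
g = K @ a                        # values g(t) = sum_i a_i k(t, t_i)
g_norm = np.sqrt(a @ K @ a)      # RKHS norm, using <k(., s), k(., t)> = k(s, t)
sqrt_diag = np.sqrt(np.diag(K))  # k(t, t)**0.5 for each grid point t

# Pointwise Cauchy-Schwarz bound, with a tiny relative tolerance for rounding.
pointwise_ok = np.all(np.abs(g) <= g_norm * sqrt_diag * (1 + 1e-9))
l1_norm, l1_bound = np.abs(g).sum(), g_norm * sqrt_diag.sum()
print(pointwise_ok, l1_norm, l1_bound)
```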
A.9 Proof of Theorem 10
By an abuse of notation, we define similarly to (17). According to [57, Theorem 8.2], is a Bochner integrable function, for any . This implies that is a well-defined linear operator. Furthermore, we have
[TABLE]
for any . Therefore, one can see
[TABLE]
i.e., is a continuous linear operator. Thus, the claim follows directly from Theorem 9. ∎
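To see concretely why the DSRI property yields continuous linear maps, consider an illustrative special case of our own choosing. It is not necessarily the operator defined in (17): the output functional L(g) = Σ_t u(t) g(t) for a bounded input u.

Its Riesz representer is Σ_t u(t) k(·,t), so ‖L‖ = (Σ_{s,t} u(s) u(t) k(s,t))^{1/2}. Since |k(s,t)| ≤ (k(s,s) k(t,t))^{1/2}, the DSRI sum bounds this norm by ‖u‖_∞ Σ_t k(t,t)^{1/2}. The sketch below checks both facts on a truncated TC kernel. The input u and all parameters are hypothetical.

```python
import numpy as np

def tc_gram(n: int, alpha: float) -> np.ndarray:
    """TC kernel k(s, t) = alpha**max(s, t) on the discrete grid t = 1, ..., n."""
    t = np.arange(1, n + 1)
    return alpha ** np.maximum.outer(t, t)

N, ALPHA = 50, 0.85
K = tc_gram(N, ALPHA)
rng = np.random.default_rng(2)
u = rng.uniform(-1.0, 1.0, size=N)  # hypothetical bounded input, |u(t)| <= 1

op_norm = np.sqrt(u @ K @ u)        # ||L|| = RKHS norm of the representer K @ u
dsri_bound = np.abs(u).max() * np.sqrt(np.diag(K)).sum()

# |L(g)| <= ||L|| * ||g|| on random RKHS elements g = K @ a.
for _ in range(100):
    a = rng.standard_normal(N)
    g, g_norm = K @ a, np.sqrt(a @ K @ a)
    assert abs(u @ g) <= op_norm * g_norm * (1 + 1e-9)
print(op_norm, dsri_bound)
```

The bound is attained at g = K u, where L(g) = uᵀKu = ‖L‖ ‖g‖.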
A.10 Proof of Lemma 14
We prove the lemma for the case of . The proof for follows the same line of argument. Note that we have
[TABLE]
Accordingly, from the sub-additivity property of , we know that
[TABLE]
Therefore, there exists such that, for the event defined as $A:=\{\omega\in\Omega \mid \|\mathrm{g}(\omega)\|_{1}\leq r\}$, we have . Accordingly, due to the properties of indicator functions, the definition of , and Tonelli's theorem [58], we can see that
[TABLE]
With respect to each , define event as
[TABLE]
where is the positive real number characterized as . For each , we have . Therefore, from (52) and (53), it follows that
[TABLE]
Moreover, for each , we have , which implies that $\mathbb{E}[\mathbbm{1}_{A\cap B_{t}^{\mathrm{c}}}]\geq\mathbb{P}[A]-\mathbb{P}[B_{t}]$. Subsequently, from (52) and (54), we can see that
[TABLE]
We know that , . Accordingly, from the definition of sets and , we have
[TABLE]
Therefore, (55) implies that
[TABLE]
and subsequently, we have , and is a DSRI kernel. Furthermore, from Lemma 13, it follows that , which concludes the proof. ∎
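A simple identity links stability of a zero-mean Gaussian process to the DSRI property. If g(t) ~ N(0, k(t,t)), then E|g(t)| = (2/π)^{1/2} k(t,t)^{1/2}. By Tonelli's theorem, this gives E‖g‖_1 = (2/π)^{1/2} Σ_t k(t,t)^{1/2}, which is finite exactly when the kernel is DSRI.

The Monte Carlo sketch below compares both sides for a TC kernel on a truncated horizon. The kernel, horizon, and sample size are illustrative assumptions, and the sketch does not reproduce Lemma 13 or Lemma 14 themselves.

```python
import numpy as np

def tc_gram(n: int, alpha: float) -> np.ndarray:
    """TC kernel k(s, t) = alpha**max(s, t) on the discrete grid t = 1, ..., n."""
    t = np.arange(1, n + 1)
    return alpha ** np.maximum.outer(t, t)

N, ALPHA, M = 60, 0.8, 20_000
K = tc_gram(N, ALPHA)
L = np.linalg.cholesky(K)                    # K = L @ L.T; K is positive definite here
rng = np.random.default_rng(3)
samples = rng.standard_normal((M, N)) @ L.T  # each row is one draw of g ~ N(0, K)

mc_mean_l1 = np.abs(samples).sum(axis=1).mean()             # empirical E||g||_1
exact = np.sqrt(2.0 / np.pi) * np.sqrt(np.diag(K)).sum()    # sqrt(2/pi) * DSRI sum
print(f"Monte Carlo E||g||_1 ~ {mc_mean_l1:.4f}, formula {exact:.4f}")
```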
A.11 Proof of Theorem 15
Note that if and only if . Since is a stable impulse response, the stability of is equivalent to the stability of . Accordingly, the claim follows from Lemma 13 and Lemma 14. ∎
References
- [1] N. Aronszajn, “Theory of reproducing kernels,” Transactions of the American Mathematical Society, vol. 68, no. 3, pp. 337–404, 1950.
- [2] E. Parzen, “Statistical inference on time series by Hilbert space methods, I,” Department of Statistics, Stanford University, Tech. Rep. No. 23, 1959.
- [3] G. Wahba, Spline Models for Observational Data. SIAM, 1990.
- [4] F. Cucker and S. Smale, “Best choices for regularization parameters in learning theory: On the bias-variance problem,” Foundations of Computational Mathematics, vol. 2, no. 4, pp. 413–428, 2002.
- [5] A. Berlinet and C. Thomas-Agnan, Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer Science and Business Media, 2011.
- [6] M. Khosravi, “Representer theorem for learning Koopman operators,” IEEE Transactions on Automatic Control, 2023.
- [7] G. S. Kimeldorf and G. Wahba, “A correspondence between Bayesian estimation on stochastic processes and smoothing by splines,” The Annals of Mathematical Statistics, vol. 41, no. 2, pp. 495–502, 1970.
- [8] M. Lukić and J. Beder, “Stochastic processes with sample paths in reproducing kernel Hilbert spaces,” Transactions of the American Mathematical Society, vol. 353, no. 10, pp. 3945–3969, 2001.
