Regularized divergences between covariance operators and Gaussian measures on Hilbert spaces
Minh Ha Quang

TL;DR
This paper extends divergences between Gaussian measures from finite-dimensional spaces to infinite-dimensional Hilbert spaces, providing explicit formulas and showing convergence of regularized divergences to true divergences.
Contribution
It introduces regularized Kullback-Leibler and Rényi divergences for Gaussian measures on Hilbert spaces using infinite-dimensional Alpha Log-Determinant divergences, with convergence results.
Findings
Explicit formulas for divergences in the Gaussian Hilbert space setting
Regularized divergences converge to true divergences as regularization vanishes
General Gaussian setting covered
RIKEN Center for Advanced Intelligence Project, 1-4-1 Nihonbashi, Chuo-ku, Tokyo 103-0027, Japan
Email: [email protected]
Abstract
This work presents an infinite-dimensional generalization of the correspondence between the Kullback-Leibler and Rényi divergences between Gaussian measures on Euclidean space and the Alpha Log-Determinant divergences between symmetric, positive definite matrices. Specifically, we present the regularized Kullback-Leibler and Rényi divergences between covariance operators and Gaussian measures on an infinite-dimensional Hilbert space, which are defined using the infinite-dimensional Alpha Log-Determinant divergences between positive definite trace class operators. We show that, as the regularization parameter approaches zero, the regularized Kullback-Leibler and Rényi divergences between two equivalent Gaussian measures on a Hilbert space converge to the corresponding true divergences. The explicit formulas for the divergences involved are presented in the most general Gaussian setting.
Keywords: Gaussian measures, Hilbert space, covariance operators, Kullback-Leibler divergence, Rényi divergence, regularized divergences
MSC: 28C20, 60G15, 47B65, 15A15
1 Introduction
This work is concerned with the correspondence between divergences between covariance operators and divergences between the corresponding Gaussian measures on an infinite-dimensional Hilbert space. Specifically, we study the correspondence between the infinite-dimensional Alpha Log-Determinant (Log-Det) divergences between covariance operators on a Hilbert space $\mathcal{H}$ and the Kullback-Leibler and Rényi divergences, together with related quantities, between Gaussian measures on $\mathcal{H}$.
In the finite-dimensional setting, let $\mathrm{Sym}^{++}(n)$ denote the set of $n \times n$ symmetric, positive definite (SPD) matrices. Then a divergence on $\mathrm{Sym}^{++}(n)$ corresponds to a divergence on the set of zero-mean Gaussian measures on $\mathbb{R}^n$ with strictly positive covariance matrices. In particular, the Alpha Log-Det divergences Chebbi:2012Means on $\mathrm{Sym}^{++}(n)$ correspond to the Kullback-Leibler and Rényi divergences between zero-mean Gaussian measures on $\mathbb{R}^n$.
The infinite-dimensional generalization of the finite-dimensional setting requires substantially more mathematical machinery. It is not straightforward, for instance, to define Log-Determinant divergences between covariance operators on an infinite-dimensional Hilbert space $\mathcal{H}$: these are trace class operators, so their eigenvalues tend to zero and their inverses and principal logarithms are unbounded. In Minh:LogDet2016, the author generalized the Alpha Log-Det divergences on $\mathrm{Sym}^{++}(n)$ to the set of positive definite trace class operators on $\mathcal{H}$ of the form $A + \gamma I > 0$, where $A$ is trace class, $\gamma \in \mathbb{R}$ with $\gamma > 0$, and $I$ is the identity operator. This was subsequently generalized to the infinite-dimensional Alpha-Beta Log-Det divergences between positive definite trace class operators Minh:LogDet2016-AB and to the more general set of positive definite Hilbert-Schmidt operators Minh:GSI2017. Other distance functions on the set of positive definite Hilbert-Schmidt operators include the affine-invariant Riemannian distance Larotonda:2007; Minh:GSI2015 and the Log-Hilbert-Schmidt distance MinhSB:NIPS2014.
For a fixed $\gamma > 0$, each of the above divergence/distance functions automatically becomes a divergence/distance function between covariance operators on $\mathcal{H}$. In particular, for covariance operators on reproducing kernel Hilbert spaces (RKHS), they all admit closed form expressions that can readily be employed in practical applications, see e.g. MinhSB:NIPS2014; Minh:CVPR2016; Minh:Covariance2017. In computer vision and pattern recognition, other papers employing this approach include ProbDistance:PAMI2006 and Covariance:CVPR2014, in which Bregman divergences between RKHS covariance operators are applied to problems in object recognition and texture classification, among others.
It is not clear, however, how the above functions relate to the divergence/distance functions between Gaussian measures on the Hilbert space $\mathcal{H}$, such as the Kullback-Leibler or Rényi divergences, in the way that they do in the finite-dimensional setting. The aim of this work is to establish these correspondences in the case of the infinite-dimensional Alpha Log-Det divergences.
Contributions. The following are the main contributions of the current work.
1. We study regularized versions of the Kullback-Leibler and Rényi divergences between covariance operators and Gaussian measures on Hilbert spaces, using the infinite-dimensional Alpha Log-Det divergences. We show that for two equivalent Gaussian measures on $\mathcal{H}$, the regularized Kullback-Leibler and Rényi divergences converge to the corresponding true Kullback-Leibler and Rényi divergences, respectively, as the regularization parameter $\gamma \to 0^{+}$.
2. As part of the proof, we derive the explicit formulas for the Radon-Nikodym derivative and the true Kullback-Leibler and Rényi divergences between two equivalent Gaussian measures on $\mathcal{H}$, under the most general setting. These formulas generalize those available in the current literature, which assume either that both measures have mean zero or that they share the same covariance operator. We illustrate this with the computation of the Kullback-Leibler divergence between the posterior and prior probability measures, under the Gaussian setting, in a Bayesian inverse problem on Hilbert spaces.
Organization. The paper is structured as follows. In Section 2, we present the definitions of the regularized divergences between covariance operators and Gaussian measures on $\mathcal{H}$, using the Alpha Log-Det divergences. Section 3 summarizes the main results on the convergence of the regularized divergences to the true divergences. The proofs for the convergence are given in Sections 4 and 5. In Section 6, we present the explicit formulas for the Radon-Nikodym derivative and the true Kullback-Leibler and Rényi divergences between two equivalent Gaussian measures on $\mathcal{H}$.
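Notation. Throughout the paper, we assume that $\mathcal{H}$ is a real separable Hilbert space, with $\dim(\mathcal{H}) = \infty$, unless explicitly stated otherwise. Let $\mathcal{L}(\mathcal{H})$ be the Banach space of bounded linear operators on $\mathcal{H}$, with operator norm $\|\cdot\|$. Let $\mathrm{Sym}(\mathcal{H}) \subset \mathcal{L}(\mathcal{H})$ denote the subspace of bounded, self-adjoint operators on $\mathcal{H}$. Let $\mathrm{Sym}^{+}(\mathcal{H}) \subset \mathrm{Sym}(\mathcal{H})$ denote the set of self-adjoint, positive operators on $\mathcal{H}$, that is $A \in \mathrm{Sym}^{+}(\mathcal{H})$ if and only if $\langle x, Ax\rangle \geq 0$ for all $x \in \mathcal{H}$. Let $\mathrm{Sym}^{++}(\mathcal{H}) \subset \mathrm{Sym}^{+}(\mathcal{H})$ denote the set of self-adjoint, strictly positive operators on $\mathcal{H}$, that is $\langle x, Ax\rangle > 0$ for all $x \neq 0$, or equivalently, $\ker(A) = \{0\}$.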
2 Main definitions
We first present the definitions of the key concepts involved in the paper, namely the infinite-dimensional Alpha Log-Determinant divergences and the corresponding regularized divergences between Gaussian measures on Hilbert spaces. Many of these concepts were first introduced in Minh:LogDet2016 .
2.1 Infinite-dimensional Alpha Log-Det divergences between positive definite trace-class operators
In Minh:LogDet2016 , we introduced the following infinite-dimensional divergences between positive definite trace class operators on a Hilbert space , which generalize the Alpha Log-Determinant divergences between SPD matrices Chebbi:2012Means .
Definition 1 (Alpha Log-Determinant divergences between positive definite trace class operators)
Assume that . For , the Log-Det -divergence between , , , is defined to be
[TABLE]
where . The limiting cases are defined by
[TABLE]
In Definition 1, denotes the extended Fredholm determinant defined via , for , with being the Fredholm determinant. Likewise, denotes the extended trace, defined by (see Minh:LogDet2016 for the motivations leading to these concepts).
In the case , assumes a much simpler form, which directly generalizes the finite-dimensional formulas in Chebbi:2012Means , as follows.
[TABLE]
The finite-dimensional formulas are obtained by letting and .
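As a finite-dimensional point of reference, the following Python sketch evaluates an Alpha Log-Det divergence for diagonal SPD matrices. It uses the matrix formula $d^{\alpha}(A,B) = \frac{4}{1-\alpha^2}\log\frac{\det(\frac{1-\alpha}{2}A + \frac{1+\alpha}{2}B)}{\det(A)^{(1-\alpha)/2}\det(B)^{(1+\alpha)/2}}$ for $-1 < \alpha < 1$, as we recall it from the finite-dimensional literature. The displayed equations of this section are not reproduced here, so the normalization and the order of the arguments should be read as assumptions, and the function names are illustrative only.

```python
import math

def alpha_logdet_diag(a, b, alpha):
    """Alpha Log-Det divergence between diag(a) and diag(b), for -1 < alpha < 1."""
    p = (1.0 - alpha) / 2.0   # weight on A
    q = (1.0 + alpha) / 2.0   # weight on B
    total = 0.0
    for ai, bi in zip(a, b):
        # log det of the weighted mean minus the weighted log dets, one eigenvalue at a time
        total += math.log(p * ai + q * bi) - p * math.log(ai) - q * math.log(bi)
    return 4.0 / (1.0 - alpha ** 2) * total

def logdet_limit_plus_one(a, b):
    """Limit alpha -> 1 of the formula above: tr(B^{-1}A - I) - log det(B^{-1}A)."""
    return sum(ai / bi - 1.0 - math.log(ai / bi) for ai, bi in zip(a, b))

a = [2.0, 0.5, 1.0]
b = [1.0, 1.0, 3.0]
print(alpha_logdet_diag(a, b, 0.0))
print(alpha_logdet_diag(a, b, 0.999999), logdet_limit_plus_one(a, b))
```

The last line compares a value of $\alpha$ close to $1$ with the limiting expression; the two numbers should agree to several digits.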
From the above formulation, the following result is immediate.
Theorem 2.1 (Regularized divergences between covariance operators and zero-mean Gaussian measures on Hilbert spaces)
Let be fixed. For each fixed , , the following is a divergence on the set of self-adjoint, positive trace class operators on
[TABLE]
Consequently, the following is a divergence on the set of Gaussian measures on with mean zero and covariance operators
[TABLE]
2.2 Regularized divergences between general Gaussian measures on Hilbert spaces
We next consider divergences between Gaussian measures on Hilbert spaces without the zero-mean condition. Motivated by the explicit formulas for the divergences between Gaussian densities in , in Minh:LogDet2016 we introduced the following regularized divergences between Gaussian measures on Hilbert spaces, using the infinite-dimensional Log-Det divergences above.
Definition 2 (Regularized Kullback-Leibler divergences between Gaussian measures on Hilbert spaces)
Let and be two Gaussian measures on , with corresponding mean vectors and covariance operators . For any fixed , , the regularized Kullback-Leibler divergence, denoted by , is defined to be
[TABLE]
Definition 3 (Regularized Rényi divergences between Gaussian measures on Hilbert spaces)
For two Gaussian measures and on , the regularized Rényi divergence of order , , for a fixed , , denoted by , is defined to be
[TABLE]
Remark. Our definition of the regularized Rényi divergence differs from that in Minh:LogDet2016 by a factor of . It is motivated from the finite-dimensional definition , see e.g. Pardo:2005 , of the Rényi divergence between two probability densities on . This differs from the original definition by Rényi Renyi:1961 , namely by the factor . The advantage of the current formulation is that one can see immediately that
[TABLE]
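The relation between the Rényi and Kullback-Leibler divergences can be checked numerically on the real line. The sketch below assumes the normalization $-\frac{1}{r(1-r)}\log\int P_1^{r}P_2^{1-r}$ for the Rényi divergence of order $r$, which is our reading of the remark above, and evaluates the integral by the trapezoidal rule for two one-dimensional Gaussian densities. The function names, the integration window and the grid size are illustrative choices.

```python
import math

def gauss_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def renyi_1d(r, m1, v1, m2, v2, lo=-15.0, hi=15.0, n=30000):
    """-1/(r(1-r)) * log of the integral of p1^r * p2^(1-r), by the trapezoidal rule."""
    h = (hi - lo) / n
    total = 0.0
    for i in range(n + 1):
        x = lo + i * h
        w = 0.5 if i in (0, n) else 1.0
        total += w * gauss_pdf(x, m1, v1) ** r * gauss_pdf(x, m2, v2) ** (1.0 - r)
    return -math.log(total * h) / (r * (1.0 - r))

def kl_1d(m1, v1, m2, v2):
    """Closed-form KL( N(m1, v1) || N(m2, v2) ) on the real line."""
    return 0.5 * (v1 / v2 - 1.0 - math.log(v1 / v2) + (m1 - m2) ** 2 / v2)

print(renyi_1d(0.999, 0.0, 1.0, 1.0, 2.0), kl_1d(0.0, 1.0, 1.0, 2.0))
```

With this normalization the order-$r$ value approaches the Kullback-Leibler divergence as $r \to 1$, which is what the two printed numbers illustrate.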
Definition 4 (Regularized Bhattacharyya and Hellinger distances between Gaussian measures on Hilbert spaces)
For two Gaussian measures and on , the regularized Bhattacharyya distance , for a fixed , is defined to be
[TABLE]
The regularized Hellinger distance is defined via the regularized Bhattacharyya distance by
[TABLE]
Properties of the regularized divergences.
1. The regularized divergences between any pair of covariance operators, not necessarily strictly positive, are always well-defined and finite for any $\gamma > 0$. Likewise, the regularized divergences between the corresponding Gaussian measures, not necessarily non-degenerate or equivalent (see below), are always well-defined and finite for any $\gamma > 0$.
2. The regularized divergences between Gaussian measures are defined explicitly in terms of their mean vectors and covariance operators, not via the evaluation of the Radon-Nikodym derivatives and the corresponding integrals.
3. In the RKHS setting, when the mean vectors and covariance operators are RKHS vectors and covariance operators, respectively, all of these divergences admit closed form formulas that can be efficiently computed Minh:LogDet2016.
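The first property can be illustrated in a small diagonal example. The sketch below assumes that, when the two covariance operators commute, the regularized Kullback-Leibler divergence is the familiar Gaussian formula applied to the shifted eigenvalues $c_k + \gamma$; this is our reading of Definition 2, whose display is not reproduced here, and the function name is illustrative. The covariances are deliberately degenerate, so the two measures are mutually singular and the true divergence is not finite.

```python
import math

def regularized_kl_diag(m_nu, c_nu, m_mu, c_mu, gamma):
    """Regularized KL of nu = N(m_nu, diag(c_nu)) from mu = N(m_mu, diag(c_mu))."""
    if gamma <= 0:
        raise ValueError("gamma must be strictly positive")
    total = 0.0
    for mn, cn, mm, cm in zip(m_nu, c_nu, m_mu, c_mu):
        ratio = (cn + gamma) / (cm + gamma)
        # mean term plus the Log-Det type term, both using the shifted eigenvalues
        total += (mn - mm) ** 2 / (cm + gamma) + ratio - 1.0 - math.log(ratio)
    return 0.5 * total

# Covariances with zero eigenvalues: every gamma > 0 still gives a finite value.
m_nu, c_nu = [0.0, 1.0, 0.0], [1.0, 0.5, 0.0]
m_mu, c_mu = [0.0, 0.0, 0.0], [1.0, 0.0, 0.0]
for gamma in (1.0, 1e-2, 1e-4):
    print(gamma, regularized_kl_diag(m_nu, c_nu, m_mu, c_mu, gamma))
```

In this example the values grow without bound as $\gamma$ decreases, as one would expect for a pair of measures that are not equivalent.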
3 Main theorems
The regularized divergences stated above are well-defined for any pair of Gaussian measures on a Hilbert space $\mathcal{H}$. It is not clear from the definition, however, whether they possess a probabilistic interpretation. We now show that they are, in fact, closely related to the corresponding true divergences when the Gaussian measures under consideration are equivalent. Specifically, the following results state that, as $\gamma \to 0^{+}$, the regularized Kullback-Leibler and regularized Rényi divergences between two equivalent, non-degenerate Gaussian measures $\mu$ and $\nu$ converge to the true Kullback-Leibler and Rényi divergences, respectively, between $\mu$ and $\nu$.
Theorem 3.1 (Limiting behavior of the regularized Kullback-Leibler divergence)
Let and be two non-degenerate, equivalent Gaussian measures on , that is with . Assume that and are equivalent, that is and there exists such that . Then
[TABLE]
where denotes the Kullback-Leibler divergence between and .
In Theorem 3.1, denotes the Hilbert-Carleman determinant (see e.g. Simon:1977 ). For a Hilbert-Schmidt operator , the Hilbert-Carleman determinant of is defined by . In particular, for , we have , and . The function is continuous in the Hilbert-Schmidt norm, so that .
Theorem 3.1 can also be equivalently stated as
[TABLE]
where is the norm corresponding to the inner product
[TABLE]
of the Cameron-Martin space associated with .
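A truncated, commuting example gives a feel for this limit. The sketch below takes zero means, a covariance $C_1$ with eigenvalues $1/k^2$, and $C_2 = C_1^{1/2}(I - S)C_1^{1/2}$ with $S$ diagonal in the same basis and eigenvalues $s_k = 1/(k+1)$, which are square-summable but not summable. It uses the standard definition $\det_2(I + A) = \det[(I + A)\exp(-A)]$ of the Hilbert-Carleman determinant, and it assumes that the regularized divergence reduces to the usual Gaussian formula on the shifted eigenvalues. The truncation level `N` and all names are illustrative.

```python
import math

N = 200
c1 = [1.0 / k ** 2 for k in range(1, N + 1)]        # eigenvalues of C1
s = [1.0 / (k + 1) for k in range(1, N + 1)]        # eigenvalues of S, all below 1
c2 = [lam * (1.0 - sk) for lam, sk in zip(c1, s)]   # eigenvalues of C2 in the commuting case

def regularized_kl_zero_mean(c_nu, c_mu, gamma):
    """Regularized KL between zero-mean Gaussians with diagonal covariances."""
    total = 0.0
    for cn, cm in zip(c_nu, c_mu):
        ratio = (cn + gamma) / (cm + gamma)
        total += ratio - 1.0 - math.log(ratio)
    return 0.5 * total

def neg_half_log_det2(s_eigs):
    """-(1/2) log det_2(I - S) = -(1/2) sum_k [log(1 - s_k) + s_k]."""
    return -0.5 * sum(math.log(1.0 - sk) + sk for sk in s_eigs)

for gamma in (1e-2, 1e-4, 1e-6, 1e-10):
    print(gamma, regularized_kl_zero_mean(c2, c1, gamma))
print("limit", neg_half_log_det2(s))
```

In this example the regularized values increase towards the Hilbert-Carleman expression as $\gamma$ decreases.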
Theorem 3.2 (Limiting behavior of the regularized Rényi divergences)
Assume the hypothesis of Theorem 3.1. Let denote the Rényi divergence of order between and , . Then
[TABLE]
Corollary 1 (Limiting behavior of the regularized Bhattacharyya and Hellinger distances)
Assume the hypothesis of Theorem 3.1. Let denote the true Bhattacharyya distance between and . Then
[TABLE]
Similarly, let denote the true Hellinger distance between and . Then
[TABLE]
Computational consequences. The focus of the current work is on the statistical interpretation of the infinite-dimensional Alpha Log-Det divergences and the corresponding regularized divergences between Gaussian measures on Hilbert spaces. The results just stated also suggest numerical algorithms for approximating the Kullback-Leibler and Rényi divergences between probability measures on infinite-dimensional Hilbert spaces. This is an important topic (see e.g. Pinski:2015KL; Pinski:2015KLalgorithms) that will be explored in a future companion work.
3.1 Example: KL divergences in Bayesian inverse problems on Hilbert spaces
In this section, we apply the concept of regularized KL divergences above to the setting of linear Bayesian inverse problems. As a specific example, consider the following setting from Stuart:Inverse2010 (Theorem 6.20 and Example 6.23). Let be a Gaussian random variable on the Hilbert space , distributed according to the Gaussian measure , with . Let be a bounded linear operator. Assume that the following random variable is Gaussian
[TABLE]
where is independent of . Then the random variable is Gaussian, with density propositional to . The Gaussian measure corresponding to is , where and are given by, respectively (Stuart:Inverse2010 ),
[TABLE]
In the Bayesian setting, is the prior probability measure on and is the posterior probability measure of given the data . In Alexanderian:2016 , the authors computed the KL-divergence directly for . We now present the general formula for , which is a straightforward consequence of the general expression for the KL-divergence given in Theorem 3.1.
Theorem 3.3
Assume that and are given by Eqs. (26) and (27), respectively. Then the KL divergence between the posterior measure and the prior measure is given by
[TABLE]
Special case. For , we obtain
[TABLE]
This is precisely Eq.(19) in Proposition 3 in Alexanderian:2016 .
Remark. As noted in Alexanderian:2016 , the last term in Eq.(3.1) is precisely . As we can see from Theorem 3.1, this term is part of the general formula for KL divergences and is not a specific feature of the Bayesian inverse problem.
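A scalar analogue of this example can be worked out in a few lines. The sketch below assumes a prior $N(m_0, c_0)$ on the real line, a scalar forward map $a$, and independent Gaussian noise of variance `noise_var`. The update is the standard linear-Gaussian conditioning identity, which we take to be the one-dimensional counterpart of the operator formulas cited from Stuart:Inverse2010; all names are illustrative.

```python
import math

def posterior_scalar(m0, c0, a, noise_var, y):
    """Posterior mean and variance for y = a*x + noise, with prior x ~ N(m0, c0)."""
    s = a * c0 * a + noise_var          # variance of the observed data
    gain = c0 * a / s
    m = m0 + gain * (y - a * m0)
    c = c0 - gain * a * c0
    return m, c

def kl_gaussian_scalar(m1, c1, m0, c0):
    """KL( N(m1, c1) || N(m0, c0) ) on the real line."""
    return 0.5 * ((m1 - m0) ** 2 / c0 + c1 / c0 - 1.0 - math.log(c1 / c0))

m, c = posterior_scalar(m0=0.0, c0=1.0, a=2.0, noise_var=0.5, y=1.0)
print(m, c, kl_gaussian_scalar(m, c, 0.0, 1.0))
```

The mean-shift contribution $(m - m_0)^2 / c_0$ in `kl_gaussian_scalar` is the scalar form of a Cameron-Martin type norm; it is present whenever the two means differ, independently of the inverse-problem setting.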
4 Limiting behavior of the regularized Kullback-Leibler divergences
In this section, we prove Equation (15) in Theorem 3.1, which we restate below.
Theorem 4.1
Assume the hypothesis of Theorem 3.1. Then
[TABLE]
The first term on the right hand side of (30) follows from the following result.
Proposition 1
Assume that . Then
[TABLE]
We first prove the following more general technical result.
Lemma 1
Let be a self-adjoint, positive, compact operator on . Then
[TABLE]
Assume further that , then for any ,
[TABLE]
Proof
Let be the eigenvalues of , with corresponding orthonormal eigenvectors , then we have the spectral decomposition . For each , write , where . Then . By Lebesgue’s Monotone Convergence Theorem, we then have This proves the first identity. If , then we have and
[TABLE]
Thus for any , we have
[TABLE]
∎
Proof (of Proposition 1)
This follows from Lemma 1 by letting and . ∎
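The monotone convergence behind this limit is easy to observe in the eigenbasis of the covariance operator. The sketch below assumes that the quantity of interest is the quadratic form $\langle x, (C + \gamma I)^{-1}x\rangle$ with $x$ in the range of $C^{1/2}$, which is our reading of Proposition 1. The eigenvalues, the coefficients and the truncation level are illustrative choices.

```python
import math

N = 500
lam = [1.0 / k ** 2 for k in range(1, N + 1)]      # eigenvalues of C
y = [1.0 / k for k in range(1, N + 1)]             # coefficients of C^{-1/2} x, square-summable
x = [math.sqrt(l) * yk for l, yk in zip(lam, y)]   # x = C^{1/2} y lies in the range of C^{1/2}

def regularized_quadratic_form(x_coef, eigs, gamma):
    """<x, (C + gamma I)^{-1} x> written in the eigenbasis of C."""
    return sum(xk ** 2 / (l + gamma) for xk, l in zip(x_coef, eigs))

limit = sum(yk ** 2 for yk in y)                   # ||C^{-1/2} x||^2 for the truncated vector
values = [regularized_quadratic_form(x, lam, g) for g in (1.0, 1e-2, 1e-4, 1e-8)]
print(values, limit)
```

Each summand increases as $\gamma$ decreases, so the printed values rise towards the limit.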
The second term on the right hand side of (30) follows from the following result.
Assumption 1
Let be self-adjoint, positive. Assume that there exists such that is strictly positive and that
[TABLE]
Theorem 4.2
Let be three bounded linear operators on satisfying the hypothesis of Assumption 1. Then
[TABLE]
The right hand side is nonnegative, with zero equality if and only if , that is if and only if . If, in addition, is assumed to be trace class, then
[TABLE]
The limit in Theorem 4.2 follows from the continuity of the Hilbert-Carleman determinant in the Hilbert-Schmidt norm . Its proof consists of two steps, which constitute the following two results.
Proposition 2
Let be two self-adjoint, positive, trace class operators. Assume that there exists a self-adjoint, Hilbert-Schmidt operator such that . Then for any , ,
[TABLE]
Proposition 3
Let be a compact, self-adjoint, positive operator on . Let . Then
[TABLE]
Lemma 2
Let such that is strictly positive. Then
[TABLE]
with equality if and only if .
Proof
Consider the function for . We have , with for and for . Thus has a unique global maximum . Hence , with equality if and only if .
Let denote the eigenvalues of , then since is strictly positive, we have . Then , with equality if and only if , that is if and only if . ∎
Proof (of Theorem 4.2)
By Proposition 2, we have for any ,
[TABLE]
By Proposition 3, we have
[TABLE]
By Theorem 6.5 in Simon:1977 , which states the continuity of the Hilbert-Carleman determinant in the Hilbert-Schmidt norm topology, we then obtain
[TABLE]
It then follows that
[TABLE]
By Lemma 2, the right hand side is always nonnegative, with zero equality if and only if . From the expression , this happens if and only if . If is trace class, then and we have
[TABLE]
∎
Proof (of Proposition 2)
By the product property of the extended Fredholm determinant (Proposition 4 in Minh:LogDet2016) and the commutativity of the extended trace operation (Lemma 4 in Minh:LogDet2016), we have
[TABLE]
For , we have for any , . Thus it follows that
[TABLE]
By definition of , we have
[TABLE]
∎
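Proof of Proposition 3. We recall that a Banach space $\mathcal{B}$ is said to have the Radon-Riesz Property if, for every sequence $\{x_n\}$ and every $x$ in $\mathcal{B}$, the conditions $\|x_n\| \to \|x\|$ and $x_n \to x$ weakly together imply that $\|x_n - x\| \to 0$. In particular, every Hilbert space possesses the Radon-Riesz Property. We now utilize this property for the Hilbert space of Hilbert-Schmidt operators on $\mathcal{H}$, under the Hilbert-Schmidt inner product. We first prove the following.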
Lemma 3
Let be a self-adjoint, positive, compact operator on . Then
[TABLE]
that is converges to in the weak operator topology as .
Proof
Let be the eigenvalues of , with corresponding orthonormal eigenvectors . For any , write , , where , . Then . For each , . Furthermore,
[TABLE]
Thus by Lebesgue’s Dominated Convergence Theorem, ∎
Remark 1
Lemma 3 states that converges weakly to the identity operator as . When , this convergence does not hold in the operator norm topology. For any , the operator has eigenvalues , with . However, if . In fact, we have
[TABLE]
for any , since . Thus .
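The gap between pointwise convergence and convergence in operator norm can be seen numerically. The sketch below takes $(A + \gamma I)^{-1}A$ as a representative of the operators discussed in the remark (an assumption, since the exact expression is not reproduced here) and works in the eigenbasis of $A$, where $I - (A + \gamma I)^{-1}A$ has eigenvalues $\gamma/(\lambda_k + \gamma)$. The eigenvalue sequence and the test vector are illustrative.

```python
def gap_to_identity(eigs, gamma):
    """Eigenvalues of I - (A + gamma I)^{-1} A, namely gamma / (lam_k + gamma)."""
    return [gamma / (l + gamma) for l in eigs]

N = 20000
lam = [1.0 / k ** 2 for k in range(1, N + 1)]   # eigenvalues of a compact positive operator A
x = [1.0 / k for k in range(1, N + 1)]          # a fixed square-summable vector

def report(gamma):
    gaps = gap_to_identity(lam, gamma)
    op_norm = max(gaps)   # largest gap over the truncated spectrum
    pointwise = sum((g * xk) ** 2 for g, xk in zip(gaps, x)) ** 0.5   # norm of the gap applied to x
    return op_norm, pointwise

for gamma in (1e-2, 1e-4, 1e-6):
    print(gamma, report(gamma))
```

For each fixed vector the second number shrinks with $\gamma$, while the first stays close to $1$ because the eigenvalues of $A$ accumulate at zero.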
Lemma 4
Let be a compact, self-adjoint, positive operator on . Let . Then for any operator ,
[TABLE]
i.e. converges weakly to in as .
Proof
Let be the eigenvalues of , with corresponding orthonormal eigenvectors . For any operator , we have
[TABLE]
By Lemma 3, we have for each fixed ,
[TABLE]
Furthermore,
[TABLE]
Thus by Lebesgue’s Dominated Convergence Theorem, we then have
[TABLE]
∎
Lemma 5
Let be a compact, self-adjoint, positive operator on . Let . Then
[TABLE]
Proof
Let be the eigenvalues of , with corresponding orthonormal eigenvectors . We have for any ,
[TABLE]
It follows that
[TABLE]
By Lemma 1, we have
[TABLE]
Furthermore,
[TABLE]
Thus by Lebesgue’s Dominated Convergence Theorem, we have
[TABLE]
∎
Lemma 6
Let be self-adjoint, positive. Let . Then
[TABLE]
If , then
[TABLE]
Proof
Since and commute, we have
[TABLE]
Since is trace class, self-adjoint, positive, , so that for , , . We then have
[TABLE]
If , then we have and
[TABLE]
∎
Proof (of Proposition 3)
By Lemma 4, we have for any ,
[TABLE]
that is the operator converges weakly to on as . By Lemma 5,
[TABLE]
Thus the Radon-Riesz Property can be invoked to give
[TABLE]
∎
Proof (of Theorem 4.1)
This follows from Proposition 1 and Theorem 4.2. ∎
5 Limiting behavior of the regularized Rényi divergence
In this section, we prove Equation (19) in Theorem 3.2, which we restate below.
Theorem 5.1
Assume the hypothesis of Theorem 3.2. Then
[TABLE]
We need the following technical results.
Lemma 7 (Minh:LogDet2016-AB )
Let be fixed. Let , be such that , . Assume that . Then
[TABLE]
Proposition 4
Let be fixed. For and two self-adjoint, compact, positive operators on ,
[TABLE]
In particular, for .
Proof
By Lemma 1,
[TABLE]
By Theorem 2.2 in Fillmore:1971Operator , for any two bounded operators on ,
[TABLE]
In particular, for any two self-adjoint, positive bounded operators on ,
[TABLE]
Since , , this implies that , , and we have
[TABLE]
Thus if , then for and
[TABLE]
∎
Proof (of Theorem 5.1)
By definition of the regularized Rényi divergence, Eq. (3),
[TABLE]
For the first term, we have
[TABLE]
Thus by Proposition 4, we have for ,
[TABLE]
Let be the eigenvalues of , with corresponding orthonormal eigenvectors . Since , we have . Then are the orthonormal eigenvectors of , with the same eigenvalues. Thus
[TABLE]
For the second term, by Definition 1,
[TABLE]
For , we have
[TABLE]
Thus the extended Fredholm determinant of is the Fredholm determinant of and consequently
[TABLE]
where .
By Proposition 3, we have . By Lemma 7,
[TABLE]
We then exploit the property that for any two Hilbert-Schmidt operators (see e.g. ReedSimon:Functional ). This gives us
[TABLE]
By the continuity of the Fredholm determinant with respect to the trace norm (see e.g. Theorem 3.5 in Simon:1977 ), we then obtain
[TABLE]
∎
6 The Radon-Nikodym derivatives and divergences between Gaussian measures on Hilbert spaces
For completeness, we now derive the explicit formulas for the exact Kullback-Leibler and Rényi divergences between two equivalent Gaussian measures, that is Eq. (16) in Theorem 3.1 and Eq. (20) in Theorem 3.2.
Throughout the following, we utilize the white noise mapping, see e.g. DaPrato:2006 ; DaPrato:PDEHilbert . Let and be a self-adjoint, positive trace class operator on . Assume that , then the Gaussian measure is said to be non-degenerate. Let be the eigenvalues of , with corresponding orthonormal eigenvectors , then , with . The inverse operator is unbounded, since with as . For , define the following subspace
[TABLE]
For , the space is called the Cameron-Martin space associated with the Gaussian measure . It is a Hilbert space with inner product
[TABLE]
In the following, for , we define
[TABLE]
White noise mapping. Consider the following mapping
[TABLE]
For any pair , we have by definition of the covariance operator
[TABLE]
Thus the map is an isometry, that is
[TABLE]
Since , the subspace is dense in and the map can be uniquely extended to all of , as follows. For any , let be a sequence in with . Then is a Cauchy sequence in , so that by isometry, is also a Cauchy sequence in , thus converging to a unique element in . Thus for any , we can define the map
[TABLE]
by the following unique limit in
[TABLE]
The map is called the white noise mapping associated with the measure . One sees immediately that maps any orthonormal sequence in to an orthonormal sequence in , since
[TABLE]
Furthermore, the random variables are independent (DaPrato:2006 , Proposition 1.28).
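The isometry property can be checked by simulation in finite dimensions. The sketch below assumes the white noise mapping has the form $W_z(x) = \langle x - m, Q^{-1/2}z\rangle$, takes $m = 0$ and a diagonal covariance $Q$, and estimates $\mathbb{E}[W_a W_b]$ by Monte Carlo with a fixed seed. The covariance, the vectors and the sample size are illustrative.

```python
import math
import random

def white_noise(x, z, q):
    """W_z(x) = <x, Q^{-1/2} z> for Q = diag(q) and a zero-mean measure."""
    return sum(xk * zk / math.sqrt(qk) for xk, zk, qk in zip(x, z, q))

def empirical_inner_product(a, b, q, n_samples=20000, seed=0):
    """Monte Carlo estimate of E[W_a W_b] under N(0, diag(q))."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x = [rng.gauss(0.0, math.sqrt(qk)) for qk in q]   # one sample from N(0, diag(q))
        total += white_noise(x, a, q) * white_noise(x, b, q)
    return total / n_samples

q = [1.0, 0.25, 0.04]
a = [1.0, 0.0, 0.0]
b = [0.0, 0.6, 0.8]
print(empirical_inner_product(a, a, q))   # estimate of <a, a> = 1
print(empirical_inner_product(a, b, q))   # estimate of <a, b> = 0
```

Up to sampling error, the estimates reproduce the inner products of the unit vectors `a` and `b`, so orthonormal vectors map to uncorrelated random variables with unit variance.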
White noise mapping via finite-rank orthogonal projections. can be expressed explicitly in terms of the finite-rank orthogonal projections onto the -dimensional subspaces of spanned by , , where are the orthonormal eigenvectors of . For any , we have
[TABLE]
Thus is always well-defined . Furthermore, for all ,
[TABLE]
In other words, the operator is bounded and self-adjoint . Since the sequence converges to in , we have, in the sense,
[TABLE]
The Radon-Nikodym derivatives between Gaussian measures. Given their importance, these objects have been studied extensively, e.g. Capon:Radon1964 ; Shepp:1966Radon ; Henrich:Gaussian1972 ; DaPrato:2006 ; DaPrato:PDEHilbert ; Bogachev:Gaussian . However, the explicit formulas available in the literature generally consider two separate cases, namely two Gaussian measures both with mean zero or with the same covariance operator. We now present an explicit formula for the general case.
In the following, let be two self-adjoint, positive trace class operators on such that . Let . A fundamental result in the theory of Gaussian measures is the Feldman-Hajek Theorem Feldman:Gaussian1958 , Hajek:Gaussian1958 , which states that two Gaussian measures and are either mutually singular or mutually equivalent. The necessary and sufficient conditions for the equivalence of the two Gaussian measures and are given by the following.
Theorem 6.1 (Bogachev:Gaussian , Corollary 6.4.11, DaPrato:PDEHilbert , Theorems 1.3.9 and 1.3.10)
Let be a separable Hilbert space. Consider two Gaussian measures and on . Then and are equivalent if and only if the following hold
. 2. 2.
There exists , without the eigenvalue , such that
[TABLE]
For any , we have Fillmore:1971Operator , thus Eq.(66) implies
[TABLE]
We assume from now on that and are equivalent. In Corollary 6.4.11 in Bogachev:Gaussian , an explicit formula for the Radon-Nikodym derivative is given when . In Proposition 1.3.11 in DaPrato:PDEHilbert , an explicit formula is given when and is trace class. In the following, we present an explicit formula for the general case.
Let be the eigenvalues of , with corresponding orthonormal eigenvectors , which form an orthonormal basis in . The following result expresses the Radon-Nikodym derivative in terms of the ’s and ’s.
Theorem 6.2
Let , , with , . The Radon-Nikodym derivative is given by
[TABLE]
where for each
[TABLE]
The series converges in and and the function .
Special case. For , Theorem 6.2 gives
[TABLE]
This is essentially Eq. (6.4.13) in Corollary 6.4.11 in Bogachev:Gaussian .
Corollary 2
Assume the hypothesis of Theorem 6.2. Assume further that is trace class. The Radon-Nikodym derivative of with respect to is given by
[TABLE]
In the above expression,
[TABLE]
with the limits being in the and sense, respectively.
Special case. For and trace class, Corollary 2 gives
[TABLE]
This is precisely Proposition 1.3.11 in DaPrato:PDEHilbert .
Special case. If , then obviously and Corollary 2 gives
[TABLE]
This is precisely Theorem 6.14 in Stuart:Inverse2010 .
Special case: Radon-Nikodym derivative between Gaussian densities on . Let , , with , . Let be such that , then one can verify directly that
[TABLE]
To prove Theorem 6.2, we first prove the following.
Proposition 5
Assume that and that , where . Then the operator is necessarily strictly positive, that is .
Proof
For any , we have
[TABLE]
with equality if and only if , since . Thus we have
[TABLE]
with equality if and only if . Since , is dense in and , a sequence in such that . One has
[TABLE]
It follows that . Hence the operator is self-adjoint, positive on .
Let us show that is strictly positive. Assume that such that , then and there exists a sequence in such that and . Equivalently, there exists a sequence in such that and
[TABLE]
This implies that for any , we have
[TABLE]
Since , is dense in and thus . Thus the sequence converges weakly to zero in . Then for any ,
[TABLE]
Thus the sequence also converges weakly to zero in . Since we already assume that converges strongly, and hence weakly, to , by the uniqueness of the weak limit, we must have , contradicting our prior assumption that . ∎
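In the following, we make use of the Vitali Convergence Theorem (see e.g. Folland:Real; Rudin:RealComplex). Let $(X, \mathcal{F}, \mu)$ be a positive measure space. A sequence of functions $\{f_n\}_{n \in \mathbb{N}} \subset L^1(\mu)$ is said to be uniformly integrable if for every $\epsilon > 0$ there exists $\delta > 0$ such that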
[TABLE]
Theorem 6.3 (Vitali Convergence Theorem)
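Assume that $(X, \mathcal{F}, \mu)$ is a positive measure space with $\mu(X) < \infty$. Let $\{f_n\}_{n \in \mathbb{N}}$ be a sequence of functions that are uniformly integrable on $X$, with $f_n \to f$ a.e. and $|f| < \infty$ a.e. Then $f \in L^1(\mu)$ and $\lim_{n \to \infty}\int_X |f_n - f|\,d\mu = 0$.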
Proposition 6
Let . Let , be such that . Then
[TABLE]
Special case. For , Proposition 6 gives
[TABLE]
With , the above formula gives Proposition 1.2.7 in DaPrato:PDEHilbert .
The proof of Proposition 6 requires the following results. The first one, Lemma 8, can be directly verified.
Lemma 8
Let and be such that . Then the operator is invertible and
[TABLE]
In particular, .
The second is the following result from DaPrato:PDEHilbert .
Theorem 6.4 (DaPrato:PDEHilbert , Proposition 1.2.8)
Assume that is a self-adjoint operator on such that . Let . Then
[TABLE]
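A one-dimensional instance of this kind of Gaussian integral can be verified by quadrature. On the real line, for $qm < 1$, the integral of $\exp(\frac{1}{2}my^2)$ against $N(0, q)$ equals $(1 - qm)^{-1/2}$, which follows from the elementary Gaussian integral. Reading the theorem above as the operator version of this identity is our assumption, since its display is not reproduced here. The integration window, the grid size and the function name are illustrative.

```python
import math

def gaussian_exp_quadratic(m, q, half_width=40.0, n=80000):
    """Trapezoidal estimate of the integral of exp(m*y^2/2) against N(0, q) on the real line."""
    rate = 0.5 * (1.0 / q - m)   # the integrand is exp(-rate * y^2) / sqrt(2*pi*q)
    if rate <= 0:
        raise ValueError("need q * m < 1 for the integral to be finite")
    h = 2.0 * half_width / n
    total = 0.0
    for i in range(n + 1):
        y = -half_width + i * h
        w = 0.5 if i in (0, n) else 1.0
        total += w * math.exp(-rate * y * y)
    return total * h / math.sqrt(2.0 * math.pi * q)

m, q = 0.5, 1.0
print(gaussian_exp_quadratic(m, q), (1.0 - q * m) ** -0.5)
```

The function raises an error when $qm \geq 1$, the scalar counterpart of the operator condition under which such integrals are finite.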
Proof (of Proposition 6)
It suffices to prove for . We apply Theorem 6.4 as follows. Let , be the sequence of orthogonal projections in corresponding to the eigenvectors of . Consider the limit
[TABLE]
Let be fixed. We have
[TABLE]
Let , . Then for any ,
[TABLE]
which implies that , which is a rank-one operator with eigenvalue . If , then obviously . If , then . Also, . By Lemma 8, the operator is invertible, with
[TABLE]
It follows that
[TABLE]
By the assumption that , there exists such that , so that . Hence by Theorem 6.4, we have
[TABLE]
Taking limit as gives
[TABLE]
Hence it follows, by applying Hölder’s Inequality, that the sequence of functions is uniformly integrable. Thus we can apply Vitali’s Convergence Theorem to obtain
[TABLE]
∎
Proposition 7
Assume the hypothesis of Theorem 6.2. There exists such that . Define , where is defined by Eq. (69) in Theorem 6.2. Then for all satisfying , with
[TABLE]
In particular, for ,
[TABLE]
Furthermore, for , the sequence is uniformly integrable on for .
Proof
For each fixed , we recall that the function is given by
[TABLE]
We first claim that there exists such that . Since , there exists such that . Let be such that , so that . Then
[TABLE]
Similarly, we have for all satisfying . Recall that since , we have . For satisfying , we have when and for . It follows that for all satisfying . Hence
[TABLE]
For each , by Proposition 6, with ,
[TABLE]
For each , consider the nonnegative function . By the independence of the functions , we have
[TABLE]
Since , by Lemma 22 we have
[TABLE]
Since , such that . Then by Lemma 22,
[TABLE]
Thus it follows that
[TABLE]
It follows that the sequence is increasing towards the limit . Hence the sequence is increasing towards the limit
[TABLE]
By Hölder’s Inequality, for any , for any set , we have
[TABLE]
Combining with the limit for , this shows that the sequence is uniformly integrable on . By Vitali’s Convergence Theorem,
[TABLE]
Thus it follows that . In particular, for ,
[TABLE]
∎
Lemma 9
For any , we have . For any ,
[TABLE]
In particular, for , . For any two ,
[TABLE]
In particular, an orthonormal sequence in gives rise to an orthonormal sequence in (see also DaPrato:PDEHilbert, Proposition 1.2.6).
Proof
For , by Lemma 19, we have
[TABLE]
Let . Since is dense in , let be a Cauchy sequence in with and . Then in . The previous identity gives
[TABLE]
The hypothesis and the above identity show that . Thus is a Cauchy sequence in and hence converges to a unique element in , which must be . Thus .
Let with the corresponding Cauchy sequence , . Then
[TABLE]
This gives us the first and second identities. The third identity follows from the first by invoking the isometry . ∎
Lemma 10
Consider the functions
[TABLE]
Then .
Proof
By Lemma 9, the functions are orthonormal in . We rewrite as
[TABLE]
Consider the functions
[TABLE]
Since , there exists such that . By Lemma 9, we have for all ,
[TABLE]
Consider next the series
[TABLE]
By Lemma 21, we have , since ,
[TABLE]
It thus follows that for all ,
[TABLE]
as . Thus the series converges to a finite positive value. Together with , this implies that . Since is a probability measure, by Hölder’s Inequality, we have as . ∎
Lemma 11
Consider the functions
[TABLE]
Then , , and
[TABLE]
Proof
Since the functions are orthonormal in , we have
[TABLE]
Thus and
[TABLE]
Since is a probability measure, by Hölder’s Inequality, we have as . ∎
The following is a direct generalization of Claim 1 in Proposition 1.2.8 in DaPrato:PDEHilbert.
Lemma 12
Let be any orthonormal basis in . For any ,
[TABLE]
where the series converges in .
Proof
(of Theorem 6.2)
By Lemmas 10 and 11, the series converges in and . By Proposition 7, , with . Define
[TABLE]
Then is nonnegative and satisfies , with , i.e. is a probability measure on . To show that the two measures and coincide, we show that the corresponding characteristic functions are identical, that is
[TABLE]
For the measure , the characteristic function is given by
[TABLE]
To compute the characteristic function for , we first note that by Lemma 12,
[TABLE]
Let . The characteristic function for is given by
[TABLE]
For each , we have by Proposition 6, using the fact that ,
[TABLE]
For each , for the function , we have by the independence of the ’s that
[TABLE]
By Proposition 7, there exists such that . Then for , the sequence is uniformly integrable on for all . Thus the sequence is also uniformly integrable for . For , Vitali’s Convergence Theorem gives
[TABLE]
For the first exponent, we have for any ,
[TABLE]
For the second exponent, since is an orthonormal basis for , we have
[TABLE]
For the third exponent,
[TABLE]
Thus, taking the limit as , we obtain
[TABLE]
Combining this with Eq. (94), we obtain the desired equality, namely
[TABLE]
∎
Lemma 13
Assume that is trace class. Then and the following limit holds in the sense,
[TABLE]
Proof
We first note that, since is trace class, is also trace class and
[TABLE]
as . Furthermore,
[TABLE]
showing that . By Hölder’s Inequality, we have
[TABLE]
since . It follows that
[TABLE]
as . ∎
Lemma 14
Let be arbitrary. Then and the following limit holds in the sense
[TABLE]
Proof
Since the sequence is orthonormal in , we have
[TABLE]
Thus . Furthermore,
[TABLE]
This gives the desired convergence. ∎
Proof
(of Corollary 2) When is trace class, the Fredholm determinant is well-defined and for strictly positive, we have
[TABLE]
From the spectral decomposition , we have ,
[TABLE]
By Lemma 13, taking limit as gives, where the limit is in ,
[TABLE]
Similarly,
[TABLE]
By Lemma 14, taking limit as , we have
[TABLE]
Combining these, we obtain
[TABLE]
∎
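For a positive trace class operator $A$ with eigenvalues $\lambda_k$, the Fredholm determinant $\det(I + A)$ equals the convergent product $\prod_k (1 + \lambda_k)$. The following minimal sketch illustrates that convergence. It uses a hypothetical spectrum $\lambda_k = 1/k^2$, chosen only because the product has the classical closed form $\sinh(\pi)/\pi$. It is not the operator appearing in Corollary 2.

```python
import math

def log_fredholm_det(eigenvalues):
    """log det(I + A) computed from the eigenvalues of a trace class A."""
    return sum(math.log1p(lam) for lam in eigenvalues)

def truncated_eigenvalues(n):
    """Hypothetical summable spectrum: lambda_k = 1 / k^2 for k = 1..n."""
    return [1.0 / (k * k) for k in range(1, n + 1)]

# Classical value: prod_{k >= 1} (1 + 1/k^2) = sinh(pi) / pi.
exact = math.log(math.sinh(math.pi) / math.pi)
for n in (10, 1_000, 100_000):
    approx = log_fredholm_det(truncated_eigenvalues(n))
    print(n, approx, exact - approx)
```

The truncation error is bounded by the tail sum of the eigenvalues, which is why summability (the trace class condition) is what makes the determinant well defined.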
6.1 Exact Kullback-Leibler divergences
We now derive the explicit expression for the exact Kullback-Leibler divergence between two equivalent Gaussian measures on . In the following, let and be the white noise mapping induced by . Let , with and for some . Let be the eigenvalues of with corresponding orthonormal eigenvectors .
Theorem 6.5
Let and , with and , where . Then
[TABLE]
If, furthermore, is trace class, then
[TABLE]
For , we obtain the Kullback-Leibler divergence given in Michalek:1999, which also derived the Rényi divergences between two zero-mean Gaussian measures with different covariance operators.
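In finite dimensions, divergences of this kind reduce to the classical Kullback-Leibler divergence between Gaussian densities. The sketch below is a hypothetical diagonal example, with the names `m0`, `c0`, `m`, `c` and `alphas` introduced here for illustration. It assumes the covariance relation $C = C_0^{1/2}(I - S)C_0^{1/2}$, so that $S$ has eigenvalues $\alpha_k = 1 - c_k/c_{0,k}$. It also assumes the regularized determinant $\log\det_2(I - S) = \sum_k [\log(1 - \alpha_k) + \alpha_k]$. It then checks that the spectral expression agrees with the textbook formula.

```python
import math

def kl_classical(m, c, m0, c0):
    """KL(N(m, diag c) || N(m0, diag c0)) from the textbook formula."""
    total = 0.0
    for mk, ck, m0k, c0k in zip(m, c, m0, c0):
        total += ck / c0k - 1.0 + (mk - m0k) ** 2 / c0k + math.log(c0k / ck)
    return 0.5 * total

def kl_spectral(m, c, m0, c0):
    """Same quantity via alpha_k = 1 - c_k / c0_k (assumed eigenvalues of S)."""
    mean_term = sum((mk - m0k) ** 2 / c0k for mk, m0k, c0k in zip(m, m0, c0))
    alphas = [1.0 - ck / c0k for ck, c0k in zip(c, c0)]
    # log det_2(I - S) = sum_k [log(1 - alpha_k) + alpha_k]
    log_det2 = sum(math.log1p(-a) + a for a in alphas)
    return 0.5 * mean_term - 0.5 * log_det2

m0, c0 = [0.0, 1.0, -0.5], [1.0, 0.5, 0.25]   # hypothetical reference Gaussian
m, c = [0.2, 0.7, -0.4], [0.8, 0.6, 0.2]      # hypothetical second Gaussian
print(kl_classical(m, c, m0, c0), kl_spectral(m, c, m0, c0))
```

Each summand $\log(1 - \alpha_k) + \alpha_k$ is of order $\alpha_k^2$ for small $\alpha_k$, so the spectral form only needs square-summable eigenvalues. The determinant and trace can be separated only when the eigenvalues are summable.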
Lemma 15
For any ,
[TABLE]
In particular, for the orthonormal eigenvectors of ,
[TABLE]
Proof
For , which is dense in , we have
[TABLE]
By a limiting argument, we then have .
For any pair , we have
[TABLE]
[TABLE]
Since is dense in , by a limiting argument, we have ,
[TABLE]
For the orthonormal basis , we have , so that
[TABLE]
∎
Proposition 8
Consider the functions
[TABLE]
Then , , and
[TABLE]
Proof
Using the expression for from Lemma 15, we have
[TABLE]
Furthermore, the expression for shows that
[TABLE]
By Hölder’s Inequality, we obtain and . ∎
Lemma 16
For any pair ,
[TABLE]
In particular, for ,
[TABLE]
Proof
We have, by symmetry, for any , . Also, by Lemma 20, for any ,
[TABLE]
Thus for any pair , by Lemma 19,
[TABLE]
This completes the proof. ∎
Lemma 17
For any pair ,
[TABLE]
Proof
By assumption, there exist such that , . Thus
[TABLE]
∎
Lemma 18
For any ,
[TABLE]
In particular, for ,
[TABLE]
For two orthonormal eigenvectors of ,
[TABLE]
Proof
For , by Lemmas 16 and 17, we have
[TABLE]
The general case then follows by a limiting argument. ∎
Proposition 9
The following functions are orthonormal in
[TABLE]
Proof
We have by Lemma 15 that
[TABLE]
Thus the constant function is orthogonal to . By Lemma 18, for ,
[TABLE]
thus the sequence is orthogonal. By Lemma 18,
[TABLE]
This gives the normalization constant for each term in the sequence. ∎
Proposition 10
Consider the functions
[TABLE]
Then , , and
[TABLE]
Proof
Let , , then
[TABLE]
Consider the series of constants
[TABLE]
Consider the functions
[TABLE]
By Proposition 9 and the definition of above, we have
[TABLE]
Thus . Furthermore,
[TABLE]
as . Thus it follows that and . Since is a probability measure on , it also follows that and that . This completes the proof. ∎
Proof
(of Theorem 6.5) By Theorem 6.2,
[TABLE]
For each , by Lemma 15, we obtain
[TABLE]
For each , consider the function , , where
[TABLE]
By Propositions 10 and 8, we have , , and
[TABLE]
It follows that and that . Therefore
[TABLE]
Combining the last expression with the expression for , we obtain
[TABLE]
∎
Proof
(of Theorem 3.3)
Consider the formula
[TABLE]
where is given by . By Theorem 6.5,
[TABLE]
For the first term, since ,
[TABLE]
For the second and third terms,
[TABLE]
From the expression , we obtain
[TABLE]
Thus we have
[TABLE]
For the term , we have
[TABLE]
from which it follows that . Combining and with the first term gives the desired result. ∎
6.2 Exact Rényi divergences
In this section, we derive the exact formula for the Rényi divergences between two equivalent Gaussian measures and on . We recall that the Rényi divergence between and is defined by
[TABLE]
Theorem 6.6
Let , , with and , . The Rényi divergence of order , , between and is given by
[TABLE]
Furthermore,
[TABLE]
Proof
(of Theorem 6.6)
By Proposition 7, there exists such that . Proposition 7 then implies that for all satisfying . By definition of the Rényi divergence, we then have for ,
[TABLE]
By Proposition 7, we have for ,
[TABLE]
Thus it follows that
[TABLE]
Let , then we have
[TABLE]
Combining this with the previous expression, we obtain
[TABLE]
This completes the proof of the first part of the theorem.
We now compute . Let be the eigenvalues of , then
[TABLE]
By Lemma 23, we have . Thus by Lebesgue’s Monotone Convergence Theorem,
[TABLE]
Combining this limit with the expression for above, we obtain
[TABLE]
Also by Lemma 23 and Lebesgue’s Monotone Convergence Theorem,
[TABLE]
From the proof of Theorem 5.1, we have for any ,
[TABLE]
Combining the previous two limits with the expression for above, we obtain
[TABLE]
∎
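The limiting behaviour of Rényi divergences toward the Kullback-Leibler divergence can be observed numerically in one dimension. The sketch below uses the common textbook normalization $D_\alpha(P\|Q) = \frac{1}{\alpha - 1}\log\int p^\alpha q^{1-\alpha}\,dx$, which is an assumption here, since the normalization used in this paper may differ by a constant factor. The Gaussian parameters and function names are hypothetical.

```python
import math

def log_normal_pdf(x, mean, var):
    return -0.5 * math.log(2.0 * math.pi * var) - (x - mean) ** 2 / (2.0 * var)

def renyi(alpha, p, q, lo=-15.0, hi=15.0, n=20_000):
    """D_alpha(P || Q) = log( int p^alpha q^(1-alpha) dx ) / (alpha - 1)."""
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * h  # midpoint rule node
        total += math.exp(alpha * log_normal_pdf(x, *p)
                          + (1.0 - alpha) * log_normal_pdf(x, *q))
    return math.log(total * h) / (alpha - 1.0)

def kl(p, q):
    """Closed-form KL(P || Q) for one-dimensional Gaussians (mean, variance)."""
    (m1, v1), (m2, v2) = p, q
    return 0.5 * (v1 / v2 - 1.0 + (m1 - m2) ** 2 / v2 + math.log(v2 / v1))

P, Q = (0.3, 0.8), (0.0, 1.0)  # hypothetical (mean, variance) pairs
for alpha in (0.5, 0.9, 0.99, 0.999):
    print(alpha, renyi(alpha, P, Q))
print("KL", kl(P, Q))
```

Under this normalization the printed values increase with $\alpha$ and approach the Kullback-Leibler value as $\alpha \to 1$, in line with the monotone convergence argument used in the proof.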
6.3 Bhattacharyya and Hellinger distances
We now derive the explicit formulas for the Bhattacharyya and Hellinger distances between two equivalent Gaussian measures and on . Recall that the Bhattacharyya distance is defined by
[TABLE]
The Hellinger distance between and is defined by
[TABLE]
Corollary 3
Let and and be such that and . The Bhattacharyya distance between and is then given by
[TABLE]
The Hellinger distance between and is given by
[TABLE]
Proof
(of Corollary 3)
For the Bhattacharyya distance, we use the fact that and Theorem 6.6 to obtain
[TABLE]
The expression for then follows from . ∎
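A one-dimensional numerical check of the relation between the two distances is sketched below. It assumes the conventions $D_B = -\log\int\sqrt{pq}\,dx$ and $D_H^2 = \int(\sqrt{p} - \sqrt{q})^2\,dx = 2(1 - e^{-D_B})$. Other references include an extra factor $\tfrac{1}{2}$ in the Hellinger distance, so the constants should be read as assumptions. The closed form used for comparison is the classical scalar Gaussian expression, and all names are hypothetical.

```python
import math

def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def integrate(f, lo=-15.0, hi=15.0, n=20_000):
    """Midpoint rule on [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))

def bhattacharyya_closed_form(m1, v1, m2, v2):
    """Classical Bhattacharyya distance between N(m1, v1) and N(m2, v2)."""
    return ((m1 - m2) ** 2 / (4.0 * (v1 + v2))
            + 0.5 * math.log((v1 + v2) / (2.0 * math.sqrt(v1 * v2))))

m1, v1, m2, v2 = 0.3, 0.8, 0.0, 1.0  # hypothetical means and variances
bc = integrate(lambda x: math.sqrt(normal_pdf(x, m1, v1) * normal_pdf(x, m2, v2)))
d_b = -math.log(bc)
h2 = integrate(lambda x: (math.sqrt(normal_pdf(x, m1, v1))
                          - math.sqrt(normal_pdf(x, m2, v2))) ** 2)
print(d_b, bhattacharyya_closed_form(m1, v1, m2, v2))
print(h2, 2.0 * (1.0 - math.exp(-d_b)))
```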
Proof
(of Theorems 3.1 and 3.2 and Corollary 1)
Theorem 3.1 follows from Theorem 4.1 and Theorem 6.5. Theorem 3.2 follows from Theorem 5.1 and Theorem 6.6. Corollary 1 follows from Theorem 5.1 and Corollary 3. ∎
7 Miscellaneous technical results
Let denote a Gaussian measure on with mean and covariance operator . Let denote the set of eigenvalues of , with corresponding orthonormal eigenvectors .
Lemma 19
For any pair ,
[TABLE]
In particular, for , .
Proof
It suffices to prove for . We apply the following (Handbook:1972, Formula 7.4.4)
[TABLE]
Thus for any ,
[TABLE]
Write , . By symmetry, we have
[TABLE]
∎
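Gaussian moment integrals of the kind tabulated in the Handbook can be checked numerically. The sketch below takes the cited formula to be the even-moment integral $\int_0^\infty t^{2n} e^{-at^2}\,dt = \Gamma(n + \tfrac{1}{2}) / (2a^{n + 1/2})$ for $a > 0$. This is an assumption about the stripped display, although the identity itself is standard. The block then recovers the fourth moment $E[X^4] = 3\sigma^4$ of a centered scalar Gaussian. The function names and parameter values are hypothetical.

```python
import math

def even_moment_integral(n, a):
    """Closed form of int_0^inf t^(2n) exp(-a t^2) dt for a > 0."""
    return math.gamma(n + 0.5) / (2.0 * a ** (n + 0.5))

def midpoint(f, lo, hi, steps=20_000):
    """Midpoint rule on [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))

a = 0.7  # hypothetical decay rate
for n in (0, 1, 2):
    numeric = midpoint(lambda t: t ** (2 * n) * math.exp(-a * t * t), 0.0, 12.0)
    print(n, numeric, even_moment_integral(n, a))

# Fourth moment of X ~ N(0, sigma2): integrate over the half line and double.
sigma2 = 0.8
fourth = (2.0 * even_moment_integral(2, 1.0 / (2.0 * sigma2))
          / math.sqrt(2.0 * math.pi * sigma2))
print(fourth, 3.0 * sigma2 ** 2)
```

Odd moments vanish by symmetry, which is the same argument invoked in the proof of Lemma 20.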
Lemma 20
For any pair ,
[TABLE]
In particular, for , .
Proof
It suffices to prove for . Write , , then
[TABLE]
by symmetry, since each term in the integral contains either or . ∎
Lemma 21
In all inequalities below, equality happens if and only if .
[TABLE]
Lemma 22
Let be fixed. Then
[TABLE]
Lemma 23
Let be fixed. Then
[TABLE]
References
- [1] M. Abramowitz and I.A. Stegun. Handbook of Mathematical Functions With Formulas, Graphs, and Mathematical Tables. Applied Mathematics Series 55. National Bureau of Standards, 1972.
- [2] A. Alexanderian, P.J. Gloor, and O. Ghattas. On Bayesian A- and D-optimal experimental designs in infinite dimensions. Bayesian Analysis, 11(3):671–695, 2016.
- [3] V. Bogachev. Gaussian Measures. American Mathematical Society, 1998.
- [4] J. Capon. Radon-Nikodym derivatives of stationary Gaussian measures. The Annals of Mathematical Statistics, 35(2):517–531, 1964.
- [5] Z. Chebbi and M. Moakher. Means of Hermitian positive-definite matrices based on the log-determinant $\alpha$-divergence function. Linear Algebra and its Applications, 436(7):1872–1889, 2012.
- [6] G. Da Prato. An introduction to infinite-dimensional analysis. Springer Science & Business Media, 2006.
- [7] G. Da Prato and J. Zabczyk. Second order partial differential equations in Hilbert spaces, volume 293. Cambridge University Press, 2002.
- [8] J. Feldman. Equivalence and perpendicularity of Gaussian processes. Pacific Journal of Mathematics, 8(4):699–708, 1958.
