Divergence functions in Information Geometry
Domenico Felice, Nihat Ay

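Table 1. Classes of statistical manifolds on which the canonical divergence reduces to the divergence introduced in Ay15.
| Statistical manifold | Condition on the curvature tensor |
|---|---|
| Self-dual | |
| Dually flat | |
| Symmetric | |
Table 2. Relation between the canonical divergence and its dual function for classes of statistical manifolds.
| Statistical manifold | Relation between the canonical divergence and its dual function |
|---|---|
| Dually flat | |
| Symmetric | |
| General | |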
Domenico Felice
Max Planck Institute for Mathematics in the Sciences
Inselstrasse 22, 04103 Leipzig, Germany
Nihat Ay
Max Planck Institute for Mathematics in the Sciences
Inselstrasse 22, 04103 Leipzig, Germany
Santa Fe Institute, Santa Fe, NM 87501, USA
Faculty of Mathematics and Computer Science, University of Leipzig, PF 100920, 04009 Leipzig, Germany
Abstract
A recently introduced canonical divergence for a dual structure on a smooth manifold is discussed in connection to other divergence functions. Finally, open problems concerning symmetry properties are outlined.
pacs:
Classical differential geometry (02.40.Hw), Riemannian geometries (02.40.Ky), Information Geometry.
I Introduction
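The geometric structure induced by a divergence function (or contrast function) on a smooth manifold provides a unified approach to measuring notions such as information, energy, and entropy, and it plays an important role in the mathematical study of random phenomena Eguchi92. Formally, a divergence function $D: M \times M \to \mathbb{R}$ on a smooth manifold $M$ is defined by the first requirement for a distance:
$$D(p,q) \geq 0, \qquad D(p,q) = 0 \ \text{ if and only if } \ p = q. \tag{1}$$
An important example of a divergence function is the Kullback-Leibler divergence, where $p = (p_1, \ldots, p_{n+1})$ and $q = (q_1, \ldots, q_{n+1})$ are vectors of probabilities of disjoint events Eguchi85, namely
$$K(p,q) = \sum_{i=1}^{n+1} p_i \log \frac{p_i}{q_i}, \tag{2}$$
which is a function on the $n$-simplex. Given a smooth $n$-dimensional manifold $M$, we assume that $D$ is a $C^\infty$-differentiable function. Working with the local coordinates $\{\xi_p^i\}$ and $\{\xi_q^i\}$ at $p$ and $q$, respectively, it follows from Eq. (1) that
$$\partial_i D(\xi_p,\xi_q)\big|_{p=q} = \partial'_i D(\xi_p,\xi_q)\big|_{p=q} = 0, \tag{3}$$
$$\partial_i \partial_j D(\xi_p,\xi_q)\big|_{p=q} = \partial'_i \partial'_j D(\xi_p,\xi_q)\big|_{p=q} = -\partial_i \partial'_j D(\xi_p,\xi_q)\big|_{p=q}, \tag{4}$$
where $\partial_i = \partial/\partial \xi_p^i$ and $\partial'_i = \partial/\partial \xi_q^i$. Moreover, under the assumption that the matrix
$$g_{ij} = -\partial_i \partial'_j D(\xi_p,\xi_q)\big|_{p=q} \tag{5}$$
is positive definite, we can see that the manifold $M$ is endowed, through the divergence function $D$, with the Riemannian metric tensor given by $g = g_{ij}\, d\xi^i \otimes d\xi^j$, where the Einstein notation is adopted. The symmetry of $g_{ij}$ immediately follows from the requirement that $D$ is a $C^\infty$ function on $M \times M$.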
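As a numerical illustration of how a divergence induces a metric, the following minimal sketch evaluates the Kullback-Leibler divergence on the 2-simplex, checks that it is not symmetric, and approximates the mixed second derivative $-\partial_i \partial'_j D$ on the diagonal by central finite differences. The coordinate choice (the first two probabilities), the helper names and the step size `h` are assumptions made for this example and are not taken from the paper. In these coordinates the result should agree with the Fisher metric $\delta_{ij}/p_i + 1/p_3$.

```python
import math

def full(xi):
    """Extend coordinates (p_1, ..., p_n) to the full probability vector."""
    return list(xi) + [1.0 - sum(xi)]

def kl(xi_p, xi_q):
    """Kullback-Leibler divergence, with both points given in simplex coordinates."""
    return sum(a * math.log(a / b) for a, b in zip(full(xi_p), full(xi_q)))

def shifted(xi, i, h):
    """Copy of xi with coordinate i moved by h."""
    out = list(xi)
    out[i] += h
    return out

def induced_metric(div, xi, h=1e-4):
    """Approximate g_ij = -d^2 div / (d xi_p^i d xi_q^j) on the diagonal xi_p = xi_q = xi."""
    n = len(xi)
    g = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            mixed = (div(shifted(xi, i, h), shifted(xi, j, h))
                     - div(shifted(xi, i, h), shifted(xi, j, -h))
                     - div(shifted(xi, i, -h), shifted(xi, j, h))
                     + div(shifted(xi, i, -h), shifted(xi, j, -h))) / (4.0 * h * h)
            g[i][j] = -mixed
    return g

def fisher_metric(xi):
    """Closed-form Fisher metric in the same coordinates, used as a reference."""
    p = full(xi)
    n = len(xi)
    return [[(1.0 / p[i] if i == j else 0.0) + 1.0 / p[-1] for j in range(n)]
            for i in range(n)]

# Illustrative points on the 2-simplex (hypothetical values).
xi_p, xi_q = [0.2, 0.3], [0.4, 0.1]
print("K(p,q) =", kl(xi_p, xi_q), " K(q,p) =", kl(xi_q, xi_p))
g = induced_metric(kl, xi_p)
print("metric from divergence:", g)
print("Fisher metric:         ", fisher_metric(xi_p))
```

The two printed divergences differ, while the finite-difference matrix is symmetric and close to the closed-form Fisher metric up to discretization error.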
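From Eq. (2) we can see that, in general, a divergence function is not symmetric. The asymmetry of $D$ leads to two different affine connections, $\nabla$ and $\nabla^*$, on $M$ such that $\frac{1}{2}(\nabla + \nabla^*)$ is the Levi-Civita connection with respect to the metric tensor $g$ defined by Eq. (5). More precisely, working with the local coordinates $\{\xi_p^i\}$ and $\{\xi_q^i\}$, we can define the symbols $\Gamma_{ijk}$ and $\Gamma^*_{ijk}$ of the connections $\nabla$ and $\nabla^*$, i.e. $\Gamma_{ijk} = g(\nabla_{\partial_i} \partial_j, \partial_k)$ and $\Gamma^*_{ijk} = g(\nabla^*_{\partial_i} \partial_j, \partial_k)$, by means of the following relations
$$\Gamma_{ijk} = -\partial_i \partial_j \partial'_k D(\xi_p,\xi_q)\big|_{p=q}, \qquad \Gamma^*_{ijk} = -\partial'_i \partial'_j \partial_k D(\xi_p,\xi_q)\big|_{p=q}. \tag{6}$$
To sum up, a divergence function $D$ on a smooth manifold $M$ induces a metric tensor $g$ on $M$ by Eq. (5). In addition, the divergence $D$ yields two linear torsion-free connections, $\nabla$ and $\nabla^*$, on $M$ which are dual with respect to the metric tensor $g$ Eguchi85:
$$X\, g(Y,Z) = g(\nabla_X Y, Z) + g(Y, \nabla^*_X Z), \qquad \forall\, X, Y, Z \in \mathcal{T}(M), \tag{7}$$
where $\mathcal{T}(M)$ denotes the space of vector fields on $M$. Finally, we refer to the quadruple $(M, g, \nabla, \nabla^*)$ as a statistical manifold Ay17.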
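The duality of the two connections can be checked numerically on a small example. The sketch below assumes Eguchi's standard construction, in which the connection symbols are third derivatives of the divergence on the diagonal, $\Gamma_{ijk} = -\partial_i \partial_j \partial'_k D$ and $\Gamma^*_{ijk} = -\partial'_i \partial'_j \partial_k D$, and uses the Kullback-Leibler divergence on the 2-simplex in the coordinates $(p_1, p_2)$. The helper names, the step size and the test point are illustrative assumptions. In coordinates, duality reads $\partial_k g_{ij} = \Gamma_{kij} + \Gamma^*_{kji}$; for this particular divergence and coordinate system $\Gamma_{ijk}$ is expected to vanish.

```python
import math

def full(xi):
    """Extend coordinates (p_1, ..., p_n) to the full probability vector."""
    return list(xi) + [1.0 - sum(xi)]

def kl(xi_p, xi_q):
    """Kullback-Leibler divergence, with both points given in simplex coordinates."""
    return sum(a * math.log(a / b) for a, b in zip(full(xi_p), full(xi_q)))

def partial(f, slot, i, h=1e-3):
    """Central difference of f(xi_p, xi_q) in coordinate i of argument `slot` (0: xi_p, 1: xi_q)."""
    def df(xi_p, xi_q):
        plus = [list(xi_p), list(xi_q)]
        minus = [list(xi_p), list(xi_q)]
        plus[slot][i] += h
        minus[slot][i] -= h
        return (f(plus[0], plus[1]) - f(minus[0], minus[1])) / (2.0 * h)
    return df

def gamma(div, xi, i, j, k):
    """Gamma_ijk = -d_i d_j d'_k div on the diagonal (two derivatives in xi_p, one in xi_q)."""
    return -partial(partial(partial(div, 1, k), 0, j), 0, i)(xi, xi)

def gamma_dual(div, xi, i, j, k):
    """Gamma*_ijk = -d'_i d'_j d_k div on the diagonal (two derivatives in xi_q, one in xi_p)."""
    return -partial(partial(partial(div, 0, k), 1, j), 1, i)(xi, xi)

def d_fisher(xi, k, i, j):
    """Closed-form derivative d_k g_ij of the Fisher metric g_ij = delta_ij / p_i + 1 / p_last."""
    p = full(xi)
    return (-1.0 / p[i] ** 2 if i == j == k else 0.0) + 1.0 / p[-1] ** 2

xi = [0.2, 0.3]  # illustrative point (hypothetical values)
idx = range(len(xi))
max_gamma = max(abs(gamma(kl, xi, i, j, k)) for i in idx for j in idx for k in idx)
max_residual = max(
    abs(d_fisher(xi, k, i, j) - gamma(kl, xi, k, i, j) - gamma_dual(kl, xi, k, j, i))
    for i in idx for j in idx for k in idx
)
print("largest |Gamma_ijk|:", max_gamma)
print("largest duality residual:", max_residual)
```

Both printed numbers should be small, limited only by the finite-difference error. The vanishing of $\Gamma_{ijk}$ is specific to this divergence and this coordinate system and is not a general feature.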
I.1 The inverse problem within Information Geometry
The inverse problem is to find a divergence $D$ which generates a given geometrical structure $(M, g, \nabla, \nabla^*)$. For any such statistical manifold there exists a divergence $D$ such that Eq. (5) and Eq. (6) hold true Matumoto93. However, this divergence is not unique, and there are infinitely many divergences generating the same geometrical structure $(M, g, \nabla, \nabla^*)$. When this structure is dually flat, namely when the curvature tensors of $\nabla$ and $\nabla^*$ vanish ($R \equiv 0$ and $R^* \equiv 0$), Amari and Nagaoka introduced a canonical divergence which is a Bregman divergence Amari00. This canonical divergence has nice properties such as the generalized Pythagorean theorem and the geodesic projection theorem, and it is therefore of utmost importance to define a canonical divergence in the general case. A first attempt to address this fundamental issue is provided by Ay and Amari in Ay15, where a canonical divergence for a general statistical manifold is given by the geodesic integration of the inverse exponential map. This map is understood as a difference vector that translates to for all sufficiently close to each other.
To be more precise, the inverse exponential map supplies a generalization to of the concept of difference vector in . In detail, let , the difference between and is the vector pointing to (see side () of Fig. 1). Then, the difference between and in is provided by the inverse exponential map. In particular, given suitably close in , the difference vector from to is defined as (see () of Fig. 1)
[TABLE]
where is the -geodesic from to . Therefore, the divergence introduced in Ay15 is defined as the path integral
[TABLE]
where denotes the inner product induced by on . After elementary computation Eq. (9) reduces to,
[TABLE]
where is the -geodesic from to Ay15 . The divergence has nice properties such as the positivity and it reduces to the canonical divergence proposed by Amari and Nagaoka when the manifold is dually flat. However, if we consider definition (9) for a general path , then will be depending on . On the contrary, if the vector field is integrable, then turns out to be independent of the path from to .
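The reduction to the dually flat case can be illustrated numerically. The sketch below assumes that the geodesic integral takes the energy-type form $\int_0^1 t\, \|\dot{\sigma}(t)\|^2\, dt$ given in Ay15 (it is an assumption that this is the expression referred to as Eq. (10)). It uses the probability simplex with the Fisher metric and the mixture connection, whose geodesics are straight lines in the probability coordinates, and compares the integral with the Kullback-Leibler divergence, which is the Bregman divergence of the negative entropy in these coordinates. The function names, the test points and the Simpson rule are illustrative choices.

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence between two full probability vectors."""
    return sum(a * math.log(a / b) for a, b in zip(p, q))

def fisher_norm_sq(point, v):
    """Squared Fisher norm of a tangent vector v (components summing to zero) at `point`."""
    return sum(vi * vi / pi for vi, pi in zip(v, point))

def geodesic_integral(p, q, steps=200):
    """Simpson approximation of int_0^1 t |sigma'(t)|^2 dt along sigma(t) = p + t (q - p).

    `steps` must be even for the composite Simpson rule.
    """
    v = [b - a for a, b in zip(p, q)]

    def integrand(t):
        sigma_t = [a + t * vi for a, vi in zip(p, v)]
        return t * fisher_norm_sq(sigma_t, v)

    h = 1.0 / steps
    total = integrand(0.0) + integrand(1.0)
    for m in range(1, steps):
        total += (4.0 if m % 2 else 2.0) * integrand(m * h)
    return total * h / 3.0

# Illustrative points on the 2-simplex (hypothetical values).
p = [0.2, 0.3, 0.5]
q = [0.4, 0.1, 0.5]
integral = geodesic_integral(p, q)
print("geodesic integral:", integral)
print("K(p,q):           ", kl(p, q))
print("K(q,p):           ", kl(q, p))
```

With these conventions the integral matches $K(p,q)$ and not $K(q,p)$. Which of the two is recovered depends on the choice of connection and on the direction of the geodesic, so the match should be read as a consistency check under the stated assumptions.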
II A new canonical divergence
In this article, we discuss a novel divergence function recently introduced in Felice18. It turns out to be a generalization of the divergence introduced by Ay and Amari. The definition of the new divergence (see Eqs. (17) and (18) below) relies on an extended analysis of the intrinsic structure of the dual geometry of a general statistical manifold $(M, g, \nabla, \nabla^*)$. In particular, we introduced a vector at by modifying the definition (8) of the difference vector . Consider such that there exist both a unique -geodesic and a unique -geodesic , connecting to . Moreover, let , we then -parallel translate it along from to (see () of Fig. 1), and obtain
[TABLE]
(Note that corresponds to minus a difference vector.) Analogously, we introduce the dual vector of as the -parallel transport of along the -geodesic ,
[TABLE]
where denotes the exponential map of the -connection. A fundamental result obtained in Felice18 is that the sum of and is the gradient of a symmetric function that we call pseudo-norm:
[TABLE]
By letting be varying, we then obtain two vector fields whenever and are connected by a unique -geodesic and a unique -geodesic. Then, we can introduce two vector fields on an arbitrary path connecting and in the following way. Let us firstly assume that for each there exist a unique -geodesic and a unique -geodesic connecting with . Then, we define
[TABLE]
Therefore, from Eq. (13) we have that the sum
[TABLE]
is independent of the particular path from to .
At this point, we define the novel canonical divergence , and its dual function , from to by the geodesic integration of and , respectively. In particular, we have that
[TABLE]
where and are the -geodesic and the -geodesic from to , respectively.
In this manuscript we review the relation of the canonical divergence to other divergence functions in Section III. Finally, we outline in Section IV the open problems concerning the symmetry properties of .
III Comparison with previous divergence functions
Given a general statistical manifold $(M, g, \nabla, \nabla^*)$, the basic requirement for a smooth function $D: M \times M \to \mathbb{R}$ to be a divergence on $M$ is its consistency with the dual structure $(g, \nabla, \nabla^*)$ through Eqs. (5)-(6), together with the positivity $D(p,q) > 0$ for all $p, q$ sufficiently close to each other such that $p \neq q$. The novel canonical divergence (17) satisfies these properties (see Theorem 5 in Felice18).
In this section, we will show that the canonical divergence can be interpreted as a generalization of the divergence introduced by Ay and Amari. Indeed, we will see that these two divergences coincide on particular classes of statistical manifolds. In order to achieve this result, we investigate some geometric properties of the vector field given by Eq. (14) aiming to split such a vector field in terms of the difference vector given in Eq. (9). To be more precise, let us refer to Fig. 2 where the -geodesic connecting with is drawn. Then, for each we can consider the -geodesic connecting with and the -geodesic connecting with . The difference vector at pointing to is given in terms of the inverse exponential map by Therefore, the opposite of can be viewed as the -parallel translation of along the -geodesic , namely Consider now the loop based at and given by first traveling from to along the -geodesic and then back from to along the reverse of the -geodesic . If lies in a sufficiently small neighborhood of , then Felice18
[TABLE]
where
[TABLE]
with and being the -parallel transport of and , respectively, from to each point of along the unique -geodesic joining them. Here, is the curvature tensor of , denotes the disc defined by the curve and , are linearly independent. In addition, within the integral denotes the -parallel translation from each point in to along the unique -geodesic segment joining them. Finally, by means of the property of the parallel transport, we obtain the following geometric relation between the vector and the opposite of the difference vector Felice18 ,
[TABLE]
By noticing that and inserting Eq. (19) into the definition (17) of , we obtain
[TABLE]
where is the divergence introduced in Ay15 and given by Eq. (10).
It is clear from Eq. (20) that particular conditions on the curvature tensor lead to the required equivalence between the canonical divergence and the divergence of Ay15. Indeed, in Information Geometry, classes of statistical manifolds are characterized by conditions on the curvature tensors of $\nabla$ and $\nabla^*$ (see for instance Refs. Amari00, Lauritzen87, Zhang07). Table 1 lists the classes of statistical manifolds on which the canonical divergence reduces to the divergence introduced in Ay15. A statistical manifold is self-dual when $\nabla = \nabla^*$. In this case it becomes a Riemannian manifold endowed with the Levi-Civita connection. Hence, the vectors and coincide for all . Finally, the skew-symmetry of the curvature tensor yields the property for any . When a manifold is dually flat, it admits mutually dual affine coordinates and two potentials such that Ay15 . This shows that, on dually flat manifolds, the canonical divergence coincides with the canonical divergence of Bregman type introduced by Amari and Nagaoka in Amari00. The concept of a symmetric statistical manifold, which is the information-geometric analogue of a symmetric space in Riemannian geometry, was introduced in Henmi. There, the authors employed the following conditions on the curvature tensor, and , in order to prove that their divergence function is independent of the particular path connecting any two points sufficiently close to each other. The connection between the canonical divergence and the divergence introduced by Henmi and Kobayashi is discussed at length in Felice18.
To summarize, Tab. 1 lists, from top to bottom and in order of increasing generality, the statistical manifolds on which the equivalence between the canonical divergence and the divergence of Ay15 is achieved. In this view, we can regard the canonical divergence as an extension of the divergence of Ay15 to a general statistical manifold.
From Eq. (2) we know that, in general, a divergence function is not symmetric in its arguments. However, the symmetry property enjoyed by the canonical divergence of Bregman type on dually flat manifolds, namely $D(p,q) = D^*(q,p)$, points the way to further investigation of the symmetry properties of the canonical divergence in the general context of Information Geometry.
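The symmetry property of the Bregman-type canonical divergence can be checked on a small example. The sketch below assumes the standard dually flat setting: a convex potential $\psi$ in affine coordinates $\theta$, dual coordinates $\eta = \nabla \psi(\theta)$, and the Legendre conjugate potential $\varphi$. The specific potential $\psi(\theta) = \sum_i e^{\theta_i}$, the function names and the test points are illustrative choices that do not come from the paper. The check is that the Bregman divergence of $\psi$ from $\theta_p$ to $\theta_q$ equals the Bregman divergence of $\varphi$ with the two points swapped.

```python
import math

def psi(theta):
    """Convex potential in the affine coordinates theta (illustrative choice)."""
    return sum(math.exp(t) for t in theta)

def grad_psi(theta):
    """Gradient of psi, i.e. the dual coordinates eta."""
    return [math.exp(t) for t in theta]

def phi(eta):
    """Legendre conjugate of psi, expressed in the dual coordinates eta."""
    return sum(e * math.log(e) - e for e in eta)

def grad_phi(eta):
    """Gradient of phi, which maps eta back to theta."""
    return [math.log(e) for e in eta]

def bregman(f, grad_f, x, y):
    """Bregman divergence f(x) - f(y) - <grad f(y), x - y>."""
    return f(x) - f(y) - sum(g * (a - b) for g, a, b in zip(grad_f(y), x, y))

# Illustrative points in theta coordinates (hypothetical values).
theta_p = [0.1, -0.4]
theta_q = [0.5, 0.2]
eta_p, eta_q = grad_psi(theta_p), grad_psi(theta_q)

d = bregman(psi, grad_psi, theta_p, theta_q)   # divergence from p to q
d_dual = bregman(phi, grad_phi, eta_q, eta_p)  # dual divergence from q to p
print("D(p,q)  =", d)
print("D*(q,p) =", d_dual)
```

The two values agree up to rounding, which is the dually flat instance of the symmetry discussed above. Nothing in this check extends to a general statistical manifold.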
IV Future developments towards symmetry
The aim of this section is to describe the symmetry property $D(p,q) = D^*(q,p)$ for any statistical manifold $(M, g, \nabla, \nabla^*)$. To this end, we rely on the gradient-based approach to divergences which was introduced in Ay15 and further developed in Ay17. This approach yields the following decompositions of and in terms of the canonical divergence gradient and its dual Felice18,
[TABLE]
where are the and geodesics, respectively, from to for any arbitrary path connecting and .
On the other hand, by means of the theory of minimum contrast geometry by Eguchi Eguchi92 , we can show that is parallel to the tangent vector of the -geodesic starting from and is parallel to the tangent vector of the -geodesic starting from . This proves that Eqs. (21) and (22) supply orthogonal decompositions of and , respectively. To see this, let us consider the level sets of and :
[TABLE]
Then to each we can define the minimum contrast leaf of at Eguchi85 :
[TABLE]
Let us now fix . Since minimizes the set it follows that the derivative of at along any direction tangent to vanishes, namely
[TABLE]
where denotes the derivative at along the direction . Thus we have that for all and , or equivalently that the tangent space of coincides with the normal space of at (see Fig. 3 for a cross-reference). In addition, by taking derivatives at along directions normal to we have that Eguchi92
[TABLE]
where the first relation defines the second fundamental tensor with respect to the -connection. This implies that the second fundamental tensor with respect to for vanishes at . Therefore, according to the well-known Gauss formula Lee97
[TABLE]
we can see from Eq. (25) that the family of all curves which are orthogonal to the level set are all -geodesics ending at (with a suitable choice of the parameter).
Analogously, we have that the family of all curves which are orthogonal to the level set are all -geodesics ending at (with a suitable choice of the parameter).
In order to answer what the relation of and is, let us put . Then, by noticing that and repeating the same arguments as above, we can show that , where is a smooth function on . This proves that there exists a function and such that Felice18
[TABLE]
Although this relation holds for a general statistical manifold $(M, g, \nabla, \nabla^*)$, the result is still not satisfactory. However, Tab. 2 lists the classes of statistical manifolds where the relation $D(p,q) = D^*(q,p)$ holds. This occurs in dually flat manifolds, analogously to the canonical divergence of Bregman type introduced in Amari00. Moreover, the required symmetry also holds in symmetric statistical manifolds, which constitutes a new result in the setting of Information Geometry Felice18. Forthcoming investigations will address this symmetry in the general case.
References
- [Eguchi92] Eguchi, S.: Geometry of minimum contrast. Hiroshima Math. J. 22, 631–647 (1992)
- [Eguchi85] Eguchi, S.: A differential geometric approach to statistical inference on the basis of contrast functions. Hiroshima Math. J. 15, 341–391 (1985)
- [Ay17] Ay, N., Jost, J., Van Le, H., Schwachhöfer, L.: Information Geometry. 1st edn. Springer International Publishing (2017)
- [Matumoto93] Matumoto, T.: Any statistical manifold has a contrast function: On the $C^3$-functions taking the minimum at the diagonal of the product manifold. Hiroshima Math. J. 23, 327–332 (1993)
- [Amari00] Amari, S.-I., Nagaoka, H.: Methods of Information Geometry. Oxford University Press (2000)
- [Ay15] Ay, N., Amari, S.-I.: A Novel Approach to Canonical Divergences within Information Geometry. Entropy 17, 8111–8129 (2015)
- [Felice18] Felice, D., Ay, N.: Towards a canonical divergence within information geometry. arXiv:1806.11363 [math.DG] (2018)
- [Lauritzen87] Lauritzen, S.L.: Differential Geometry in Statistical Inference. Lecture Notes-Monograph Series 10, 163–218 (1987)
