Logarithmic divergences: geometry and interpretation of curvature
Ting-Kam Leonard Wong, Jiaowen Yang

TL;DR
This paper studies the logarithmic $L^{(\alpha)}$-divergence, links it to optimal transport and classical geometric structures, and interprets its curvature in the setting of statistical manifolds with constant sectional curvature.
Contribution
It establishes the geometric equivalence of the logarithmic divergence to conformal transformations and affine immersions, revealing its role as a canonical divergence in curved statistical manifolds.
Findings
Logarithmic divergence is equivalent to a conformal transformation of Bregman divergence.
The divergence corresponds to a statistical manifold with constant sectional curvature.
Provides a geometric interpretation of curvature in terms of primal and dual geodesics.
Abstract
We study the logarithmic $L^{(\alpha)}$-divergence, which extrapolates the Bregman divergence and corresponds to solutions to novel optimal transport problems. We show that this logarithmic divergence is equivalent to a conformal transformation of the Bregman divergence, and, via an explicit affine immersion, is equivalent to Kurose's geometric divergence. In particular, the $L^{(\alpha)}$-divergence is a canonical divergence of a statistical manifold with constant sectional curvature $-\alpha$. For such a manifold, we give a geometric interpretation of its sectional curvature in terms of how the divergence between a pair of primal and dual geodesics differs from the dually flat case. Further results can be found in our follow-up paper [27], which uncovers a novel relation between optimal transport and information geometry.
Ting-Kam Leonard Wong: Department of Statistical Sciences, University of Toronto
Jiaowen Yang: Department of Mathematics, University of Southern California
Keywords:
Logarithmic divergence · Bregman divergence · Conformal divergence · Affine immersion · Constant sectional curvature · Optimal transport
1 Introduction
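Let $\Omega \subset \mathbb{R}^n$ be an open convex set, $n \geq 2$. For $\alpha > 0$ fixed, we say that a function $\varphi : \Omega \to \mathbb{R}$ is $\alpha$-exponentially concave if $e^{\alpha\varphi}$ is concave on $\Omega$. All functions in this paper are assumed to be smooth. Given such a function $\varphi$, we define its $L^{(\alpha)}$-divergence by
$$L^{(\alpha)}_{\varphi}[\xi : \xi'] = \frac{1}{\alpha}\log\left(1 + \alpha D\varphi(\xi') \cdot (\xi - \xi')\right) - \left(\varphi(\xi) - \varphi(\xi')\right), \quad \xi, \xi' \in \Omega, \tag{1}$$
where $D\varphi$ is the Euclidean gradient and $\cdot$ is the dot product. We always assume the Hessian $D^2 e^{\alpha\varphi}$ is strictly negative definite on $\Omega$. Then $L^{(\alpha)}_{\varphi}$ is a divergence on $\Omega$, regarded as a manifold, in the sense of [1, Definition 1.1]. As $\alpha \to 0^+$, the $L^{(\alpha)}$-divergence (with $\varphi$ fixed) converges to the Bregman divergence defined by
$$B_{\phi}[\xi : \xi'] = \left(\phi(\xi) - \phi(\xi')\right) - D\phi(\xi') \cdot (\xi - \xi'), \tag{2}$$
where $\phi = -\varphi$ is convex with $D^2\phi > 0$. Thus the family of logarithmic divergences extrapolates the Bregman divergence $B_{\phi}$.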
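The limit $\alpha \to 0^+$ can be illustrated numerically. The following is a minimal sketch, not part of the paper: it assumes the hypothetical example $\varphi(\xi) = \frac{1}{n}\sum_i \log \xi^i$ on the positive orthant (for which $e^{\alpha\varphi}$ is a power of the geometric mean and is strictly concave when $0 < \alpha < 1$), and the function names are chosen here for illustration only.

```python
import math

def varphi(xi):
    # Assumed example: mean of the log-coordinates on the positive orthant.
    return sum(math.log(x) for x in xi) / len(xi)

def grad_varphi(xi):
    n = len(xi)
    return [1.0 / (n * x) for x in xi]

def linear_term(xi, xi_p):
    # D varphi(xi') . (xi - xi')
    g = grad_varphi(xi_p)
    return sum(gi * (a - b) for gi, a, b in zip(g, xi, xi_p))

def log_divergence(alpha, xi, xi_p):
    # (1/alpha) log(1 + alpha * D varphi(xi').(xi - xi')) - (varphi(xi) - varphi(xi'))
    return math.log1p(alpha * linear_term(xi, xi_p)) / alpha - (varphi(xi) - varphi(xi_p))

def bregman_limit(xi, xi_p):
    # Bregman divergence of the convex function -varphi (the alpha -> 0+ limit).
    return linear_term(xi, xi_p) - (varphi(xi) - varphi(xi_p))

xi, xi_p = [0.2, 0.5, 0.3], [0.4, 0.4, 0.2]
alphas = (0.5, 0.1, 0.001)
gaps = [abs(log_divergence(a, xi, xi_p) - bregman_limit(xi, xi_p)) for a in alphas]
for a, gap in zip(alphas, gaps):
    print(f"alpha={a}: |L - B| = {gap:.2e}")
```

The printed gap shrinks as $\alpha$ decreases, which is the extrapolation property described above for this particular choice of $\varphi$.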
Originally motivated by applications in stochastic portfolio theory [7], the $L^{(1)}$-divergence (and its extension to the $L^{(\alpha)}$-divergence) was introduced by Pal and the first author in [19] [26] and was studied further in [24] [20] [25]. Two main results are proved in these papers. First, the $L^{(\alpha)}$-divergence corresponds to the solution to an optimal transport problem with a logarithmic cost function; this is formulated using the general framework of $c$-divergence, see [20] [25] [27]. Also see [21] [9] [18] for recent results about this optimal transport problem, which have independent mathematical interest. Second, the induced statistical manifold (see [1, Section 6.2] for the definition) is dually projectively flat with constant sectional curvature $-\alpha$. In [26] we also defined an $L^{(-\alpha)}$-divergence corresponding to constant positive sectional curvature $\alpha$. For expositional simplicity we only consider the $L^{(\alpha)}$-divergence in this paper and [27], but similar results hold for the $L^{(-\alpha)}$-case as well.
In this paper we develop two geometric aspects of the logarithmic divergence. First, we connect the $L^{(\alpha)}$-divergence with classical topics in information geometry, namely conformal transformation and affine differential geometry. In particular, by using an explicit affine immersion, we show that the $L^{(\alpha)}$-divergence is equivalent to the canonical geometric divergence constructed by Kurose [11]. Second, we provide a geometric interpretation of the sectional curvature for a statistical manifold with constant negative sectional curvature. By analyzing a canonical divergence between a pair of primal and dual geodesics, we show that the sectional curvature can be quantified in terms of the deviation from the generalized Pythagorean relation of a dually flat manifold (see Theorem 4.1 below). This extends the geometric interpretation of sectional curvature in Riemannian geometry. In our follow-up work [27] we proved a more general result (see [27, Theorem 3.13]) that holds for any divergence (though it is not intrinsic in the information geometric sense). This was achieved by a novel relation between information geometry and the pseudo-Riemannian framework of Kim and McCann [10] concerning the Ma-Trudinger-Wang tensor in optimal transport.
2 Conformal divergence and its geometry
We refer the reader to [1] for general background in information geometry. Conformal transformations of divergence have been studied in the literature; see for example [17] [12] [2] [15] and the references therein. An important application is robust clustering [23] [14].
Definition 1
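Let $\phi$ be convex on $\Omega$ (with $D^2\phi > 0$) and let $\kappa > 0$ be a positive function on $\Omega$. We define the (left-sided) conformal transformation of the Bregman divergence $B_{\phi}$ by
$$D_{\kappa,\phi}[\xi : \xi'] = \kappa(\xi) B_{\phi}[\xi : \xi']. \tag{3}$$
To abbreviate we call $D_{\kappa,\phi}$ a conformal divergence.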
Note that a right-sided conformal transformation can be converted to a left-sided one by considering the convex conjugate of $\phi$ (see [1, p.17]).
Our first result is that the $L^{(\alpha)}$-divergence is, up to a monotone transformation, equal to a conformal transformation of a Bregman divergence. This shows that the geometry induced by the $L^{(\alpha)}$-divergence can be studied using results on Bregman divergences and conformal transformations.
Theorem 2.1
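Consider an $L^{(\alpha)}$-divergence $L^{(\alpha)}_{\varphi}$ on $\Omega$ as in (1). Let $\phi = -e^{\alpha\varphi}$, which is convex, and let $\kappa = -1/(\alpha\phi) > 0$. Then, with $T(x) = (e^{\alpha x} - 1)/\alpha$, we have
$$T\left(L^{(\alpha)}_{\varphi}[\xi : \xi']\right) = \kappa(\xi) B_{\phi}[\xi : \xi']. \tag{4}$$
In particular, the conformal divergence $\kappa(\xi) B_{\phi}[\xi : \xi']$ induces the same dualistic structure as that of $L^{(\alpha)}_{\varphi}$.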
Proof
The identity (4), once conceived, can be verified by a straightforward computation. The second statement is a consequence of the following lemma, which can again be proved by a computation. Similar reasoning is used in [25, Lemma 3] and [25, Theorem 17]. ∎
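The straightforward computation behind the identity in Theorem 2.1 can be checked numerically. The sketch below is illustrative only: it assumes the normalization $\phi = -e^{\alpha\varphi}$, $\kappa = -1/(\alpha\phi)$, $T(x) = (e^{\alpha x} - 1)/\alpha$, and the hypothetical example $\varphi(\xi) = \frac{1}{n}\sum_i \log\xi^i$ on the positive orthant; none of the function names come from the paper.

```python
import math

ALPHA = 0.5  # any value in (0, 1) keeps the assumed example strictly exponentially concave

def varphi(xi):
    # Assumed example of an alpha-exponentially concave function.
    return sum(math.log(x) for x in xi) / len(xi)

def grad_varphi(xi):
    n = len(xi)
    return [1.0 / (n * x) for x in xi]

def log_divergence(xi, xi_p):
    lin = sum(g * (a - b) for g, a, b in zip(grad_varphi(xi_p), xi, xi_p))
    return math.log1p(ALPHA * lin) / ALPHA - (varphi(xi) - varphi(xi_p))

def phi(xi):
    # Convex function phi = -exp(alpha * varphi).
    return -math.exp(ALPHA * varphi(xi))

def grad_phi(xi):
    # D phi = -alpha * exp(alpha * varphi) * D varphi = alpha * phi * D varphi
    c = ALPHA * phi(xi)
    return [c * g for g in grad_varphi(xi)]

def bregman(xi, xi_p):
    lin = sum(g * (a - b) for g, a, b in zip(grad_phi(xi_p), xi, xi_p))
    return phi(xi) - phi(xi_p) - lin

def kappa(xi):
    # Conformal factor evaluated at the first (left) argument.
    return -1.0 / (ALPHA * phi(xi))

xi, xi_p = [0.2, 0.5, 0.3], [0.4, 0.4, 0.2]
lhs = math.expm1(ALPHA * log_divergence(xi, xi_p)) / ALPHA  # T(L[xi : xi'])
rhs = kappa(xi) * bregman(xi, xi_p)                         # kappa(xi) * B_phi[xi : xi']
print(lhs, rhs)
```

Both sides agree up to floating-point error, because $e^{\alpha L} - 1$ equals $B_{\phi}[\xi:\xi']$ divided by $e^{\alpha\varphi(\xi)}$.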
Lemma 1
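Let $D$ and $\tilde{D}$ be divergences related by a monotone transformation: $\tilde{D} = f(D)$, where $f$ is strictly increasing with $f(0) = 0$ and $f'(0) > 0$. Let $(g, \nabla, \nabla^*)$ and $(\tilde{g}, \tilde{\nabla}, \tilde{\nabla}^*)$ be respectively the dualistic structures induced by $D$ and $\tilde{D}$. Then, in any local coordinate system, the coefficients of the dualistic structures are related by
$$\tilde{g}_{ij} = f'(0)\, g_{ij}, \quad \tilde{\Gamma}_{ijk} = f'(0)\, \Gamma_{ijk}, \quad \tilde{\Gamma}^*_{ijk} = f'(0)\, \Gamma^*_{ijk}. \tag{5}$$
In particular, we have $\tilde{\nabla} = \nabla$ and $\tilde{\nabla}^* = \nabla^*$, and the primal and dual curvature tensors are the same.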
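The computation behind Lemma 1 can be sketched as follows. This is a reconstruction under an assumed convention, not the authors' proof: a divergence $D$ induces $g_{ij} = -\partial_i\partial'_j D$ and $\Gamma_{ijk} = -\partial_i\partial_j\partial'_k D$ on the diagonal, where $\partial_i$ acts on the first argument and $\partial'_j$ on the second, and $\tilde{D} = f(D)$ with $f(0) = 0$.

```latex
\begin{align*}
\partial_i \partial'_j \tilde{D}
  &= f'(D)\,\partial_i\partial'_j D + f''(D)\,\partial_i D\,\partial'_j D, \\
\partial_i\partial_j\partial'_k \tilde{D}
  &= f'(D)\,\partial_i\partial_j\partial'_k D
   + f''(D)\bigl(\partial_i D\,\partial_j\partial'_k D
                + \partial_j D\,\partial_i\partial'_k D
                + \partial'_k D\,\partial_i\partial_j D\bigr) \\
  &\quad + f'''(D)\,\partial_i D\,\partial_j D\,\partial'_k D .
\end{align*}
```

On the diagonal, $D$ and all of its first derivatives vanish, so only the terms with $f'(0)$ survive. The metric and the lowered connection coefficients are therefore multiplied by the same constant $f'(0)$, which cancels when the index is raised with the inverse metric; the same argument with the roles of the two arguments exchanged handles the dual connection.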
Remark 1
In view of Lemma 1, we say that two divergences $D$ and $\tilde{D}$ are equivalent if there exists $f$ (as in Lemma 1, with $f'(0) = 1$) such that $\tilde{D} = f(D)$. Clearly this defines an equivalence relation among divergences on a manifold. Theorem 2.1 thus states that the $L^{(\alpha)}$-divergence is equivalent to a conformal divergence.
Theorem 2.1 motivates us to study conformal divergences in general. Recall that two torsion-free affine connections $\nabla$ and $\tilde{\nabla}$ are projectively equivalent if there exists a $1$-form $\tau$ such that
$$\tilde{\nabla}_X Y = \nabla_X Y + \tau(X) Y + \tau(Y) X$$
for any vector fields $X$ and $Y$. For its geometric interpretation see [16, p.17]. In particular, $\nabla$ and $\tilde{\nabla}$ have the same geodesics up to time reparameterizations. By definition, $\nabla$ is projectively flat if it is projectively equivalent to a flat connection. When considering the $L^{(\alpha)}$-divergence or a conformal divergence, we think of $M$ (equal to $\Omega$ as a set) as a manifold, and $\xi$ is the primal (global) coordinate system with values in the convex set $\Omega$.
Proposition 1
Let $(M, g, \nabla, \nabla^*)$ be the statistical manifold induced by a conformal divergence $\kappa(\xi) B_{\phi}[\xi : \xi']$.
- (i)
The primal connection $\nabla$ is projectively flat and the primal geodesics are, up to time reparameterization, straight lines in the $\xi$-coordinate system. (In fact, using the language of [3, Section 8.4], $\nabla$ is $(-1)$-conformally flat and $\nabla^*$ is $1$-conformally flat.)
- (ii)
$\nabla$ has constant sectional curvature $\lambda$ with respect to $g$ if and only if
$$\frac{1}{\kappa(\xi)} = \lambda \phi(\xi) + \sum_{i=1}^{n} a_i \xi^i + b \tag{6}$$
for some real constants $a_1, \ldots, a_n$ and $b$. In this case, the dual sectional curvature is also constant and is equal to $\lambda$.
Remark 2
Note that if (6) holds then one may absorb the linear terms in the definition of $\phi$. On the other hand, we observe that if the constant sectional curvature $\lambda$ is negative then $1/\kappa$ is concave. Since on $\mathbb{R}^n$ there are no non-trivial positive concave functions, from (6) we see that if the sectional curvature is constant and negative, the domain $\Omega$ must be a proper subset of $\mathbb{R}^n$.
Proof (of Proposition 1)
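Consider the dualistic structure $(g, \nabla, \nabla^*)$ induced by the conformal divergence. Consider the Euclidean coordinate $\xi$ on $\Omega$. By a direct computation, the coefficients of $g$ and $\nabla$ are given by
$$g_{ij}(\xi) = \kappa(\xi)\, \partial_i \partial_j \phi(\xi), \quad \Gamma_{ij}^{k}(\xi) = \partial_i \log\kappa(\xi)\, \delta_j^k + \partial_j \log\kappa(\xi)\, \delta_i^k. \tag{7}$$
Since $\kappa > 0$, the $1$-form $\tau = d\log\kappa$ is well-defined. From (7), we see that $\nabla$ is projectively equivalent, via $\tau$, to the Euclidean flat connection on $\Omega$. Thus $\nabla$ is projectively flat and we have (i). A further computation shows that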
[TABLE]
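Using (8), we see that $\nabla$ has constant sectional curvature $\lambda$ with respect to $g$ (see [25, Definition 12]) if and only if
$$\partial_i \partial_j \left(\frac{1}{\kappa}\right) = \lambda\, \partial_i \partial_j \phi, \quad 1 \leq i, j \leq n, \tag{9}$$
which is equivalent to (6) after integration. ∎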
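The direct computation of the metric and primal connection of a conformal divergence can be sketched as follows. This is a reconstruction under assumptions rather than the authors' text: it uses the convention $g_{ij} = -\partial_i\partial'_j D$, $\Gamma_{ijk} = -\partial_i\partial_j\partial'_k D$ on the diagonal, and writes the conformal divergence as $\kappa(\xi) B_{\phi}[\xi : \xi']$ with the conformal factor at the first argument.

```latex
\begin{align*}
\partial'_k B_{\phi}[\xi : \xi']
  &= -\sum_{l} \partial_k\partial_l\phi(\xi')\,(\xi^l - \xi'^l), \\
g_{ij}(\xi)
  &= -\partial_i\partial'_j\bigl(\kappa(\xi) B_{\phi}[\xi : \xi']\bigr)\Big|_{\xi' = \xi}
   = \kappa(\xi)\,\partial_i\partial_j\phi(\xi), \\
\Gamma_{ijk}(\xi)
  &= -\partial_i\partial_j\partial'_k\bigl(\kappa(\xi) B_{\phi}[\xi : \xi']\bigr)\Big|_{\xi' = \xi}
   = \partial_i\kappa\,\partial_j\partial_k\phi + \partial_j\kappa\,\partial_i\partial_k\phi, \\
\Gamma_{ij}^{k}(\xi)
  &= \sum_{l} g^{kl}\,\Gamma_{ijl}
   = \partial_i\log\kappa\,\delta_j^k + \partial_j\log\kappa\,\delta_i^k .
\end{align*}
```

The last line has the form of a projective change of the flat Euclidean connection with $1$-form $d\log\kappa$, which is why the primal geodesics are reparameterized straight lines. For the choice $\kappa = -1/(\alpha\phi)$ one gets $1/\kappa = -\alpha\phi$, which is consistent with the constant curvature $-\alpha$ stated for the logarithmic divergence.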
3 Realization by affine immersion
Consider a statistical manifold $(M, g, \nabla, \nabla^*)$. In [25, Theorem 18] we proved that if $\nabla$ and $\nabla^*$ are both projectively flat (dual projective flatness) and the sectional curvature is a constant $-\alpha < 0$, then one can define intrinsically a local divergence of $L^{(\alpha)}$-type which induces the given geometric structure. In this result, a key idea is that the primal and dual coordinates are related by an optimal transport map (this leads to the self-dual representation given by (21) below). In fact, by [3, Theorem 8.3], if a statistical manifold has constant sectional curvature, then we automatically have dual projective flatness. Thus the condition about projective flatness is redundant and we may modify the statement as follows:
Theorem 3.1
([25, Theorem 18]) The $L^{(\alpha)}$-divergence is a (local) intrinsic divergence for a statistical manifold with constant negative sectional curvature.
On the other hand, for a (simply connected) statistical manifold $(M, g, \nabla, \nabla^*)$ with constant sectional curvature, Kurose [11] defined globally a canonical, intrinsic divergence using affine differential geometry and proved that it satisfies a generalized Pythagorean theorem. In this section we show that if $(g, \nabla, \nabla^*)$ is induced by an $L^{(\alpha)}$-divergence $L^{(\alpha)}_{\varphi}$, then the geometric divergence is the conformal divergence in (4). While these canonical divergences are equivalent, our approach in [20] [25] gives an explicit construction in Kurose’s work, covers the Bregman and $L^{(\alpha)}$-divergences under the same framework, and suggests previously unknown connections with optimal transport maps.
To state the main result we recall some concepts of affine differential geometry; for details see [16] and [13]. Let $M$ be an $n$-dimensional manifold. An affine immersion of $M$ into $\mathbb{R}^{n+1}$ consists of an immersion $f : M \to \mathbb{R}^{n+1}$ and a transversal vector field $N$ with values in $\mathbb{R}^{n+1}$ on $M$. The last statement means that
$$T_{f(p)}\mathbb{R}^{n+1} = f_*(T_p M) \oplus \mathrm{span}\{N_p\}$$
for all $p \in M$. Let $\bar{\nabla}$ be the standard (flat) affine connection on $\mathbb{R}^{n+1}$. Then the covariant derivative decomposes as
$$\bar{\nabla}_X f_* Y = f_*(\nabla_X Y) + h(X, Y) N.$$
We call $\nabla$ and $h$ the induced connection and bilinear form respectively. If the induced connection and bilinear form coincide respectively with the primal connection and the Riemannian metric of a dualistic structure $(g, \nabla, \nabla^*)$, we say that the affine immersion realizes the given structure. By [13, Theorem 5.3], this is possible when the statistical manifold is simply connected and $1$-conformally flat. This is true in particular when the statistical manifold has constant sectional curvature.
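Let $\mathbb{R}_{n+1}$ be the dual space of $\mathbb{R}^{n+1}$, and let $\langle \cdot, \cdot \rangle$ be the dual pairing. Given an affine immersion $(f, N)$, the conormal vector field $N^* : M \to \mathbb{R}_{n+1}$ is defined by the conditions
$$\langle N^*_p, N_p \rangle = 1, \quad \langle N^*_p, f_* X \rangle = 0 \ \text{ for all } X \in T_p M. \tag{10}$$
Definition 2 (Kurose’s geometric divergence)
For an affine immersion $(f, N)$ with conormal field $N^*$, the geometric divergence is defined by
$$\rho(p, q) = \langle N^*_q, f(p) - f(q) \rangle, \quad p, q \in M. \tag{11}$$
In [11] it was shown that if $(M, g, \nabla)$ is $1$-conformally flat, then the geometric divergence does not depend on the choice of the immersion and recovers the given dualistic structure. (The dual connection $\nabla^*$ is uniquely determined given $g$ and $\nabla$.) Hence, it can be viewed as a canonical divergence (see the next section for more discussion).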
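To make the definition of the geometric divergence concrete, the sketch below evaluates it for the simplest affine immersion, the graph of a convex function with a constant vertical transversal field, where it is known to reduce to the Bregman divergence. This is an illustration of the definition in the dually flat case only; it is not the immersion used for the logarithmic divergence in this section, and the convex function $\phi(\xi) = \sum_i \xi^i \log \xi^i$ and all function names are assumptions made here.

```python
import math

def phi(xi):
    # Assumed convex example on the positive orthant.
    return sum(x * math.log(x) for x in xi)

def grad_phi(xi):
    return [math.log(x) + 1.0 for x in xi]

def immersion(xi):
    # Graph immersion f(xi) = (xi, phi(xi)) into R^{n+1}, with N = (0, ..., 0, 1).
    return list(xi) + [phi(xi)]

def conormal(xi):
    # nu = (-D phi, 1): pairs to 1 with N and to 0 with each tangent vector (e_i, d_i phi).
    return [-g for g in grad_phi(xi)] + [1.0]

def geometric_divergence(p, q):
    # rho(p, q) = < conormal at q, f(p) - f(q) >
    fp, fq, nu = immersion(p), immersion(q), conormal(q)
    return sum(n * (a - b) for n, a, b in zip(nu, fp, fq))

def bregman(p, q):
    lin = sum(g * (a - b) for g, a, b in zip(grad_phi(q), p, q))
    return phi(p) - phi(q) - lin

p, q = [0.2, 0.5, 0.3], [0.4, 0.4, 0.2]
rho = geometric_divergence(p, q)
print(rho, bregman(p, q))
```

The two printed values coincide, since pairing the conormal $(-D\phi(q), 1)$ with $f(p) - f(q)$ reproduces the Bregman expression term by term.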
The following result connects the $L^{(\alpha)}$-divergence with the geometric divergence. It shows that the geometric divergence, the $L^{(\alpha)}$-divergence and the conformal divergence are all equivalent. In particular, they are all intrinsically defined (at least locally) for the given dualistic structure.
Theorem 3.2
Consider a convex domain $\Omega$ equipped with an $L^{(\alpha)}$-divergence $L^{(\alpha)}_{\varphi}$ and its induced geometry $(g, \nabla, \nabla^*)$. Let $\phi$ and $\kappa$ be as in Theorem 2.1. Consider the affine immersion $(f, N)$ defined by
[TABLE]
where $\xi$ is the Euclidean coordinate system on $\Omega$. Then this affine immersion realizes $(g, \nabla)$. Moreover, the geometric divergence is given by
[TABLE]
Proof
The choice of our immersion (12) is motivated by the proof of [16, Proposition 2.7]. It is easy to see that is an immersion and is transversal. Let and . Then, it can be verified by a straightforward computation that
[TABLE]
We refer the reader to [25, Section 5] for expressions of the coefficients . Thus the affine immersion realizes the given dualistic structure.
Next we construct the conormal vector field. Using the relations in (10), we can show that the conormal field is given by
[TABLE]
We obtain (13) by plugging (15) into (11). ∎
4 Interpretation of sectional curvature
Consider a statistical manifold $(M, g, \nabla, \nabla^*)$. Given $p \in M$ and $v, w \in T_p M$ which are linearly independent, we can define the primal sectional curvature by
$$\mathrm{sec}(v, w) = \frac{\langle R(v, w) w, v \rangle}{\|v\|^2 \|w\|^2 - \langle v, w \rangle^2}, \tag{16}$$
where $\langle \cdot, \cdot \rangle$ is the Riemannian inner product and $R$ is the primal curvature tensor. Similarly, we can define the dual sectional curvature $\mathrm{sec}^*(v, w)$. What are the geometric interpretations of these sectional curvatures? Interestingly, to the best of our knowledge, this natural question has not been satisfactorily answered in the literature.
For motivation, let us consider a Riemannian manifold $(M, g)$. In this case, it is well-known that the sectional curvature (given by (16) using the Levi-Civita connection) can be interpreted in terms of the Riemannian distance, defined by
[TABLE]
between a pair of geodesics. For small, let and be geodesics starting at , where is the exponential map. Then, we have
[TABLE]
where the higher order terms are omitted (see [22]). This implies that
[TABLE]
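The Riemannian interpretation can be checked on the unit sphere, whose sectional curvature is $1$. The sketch below is an illustration under stated assumptions, not a reproduction of the paper's displays: it takes $p$ to be the north pole, $v$ and $w$ orthonormal, and uses the classical expansion $d^2(\exp_p(sv), \exp_p(tw)) \approx s^2 + t^2 - \tfrac{1}{3} K s^2 t^2$ for orthonormal directions to recover the curvature $K$ from distances. The function names are hypothetical.

```python
import math

def sphere_distance_sq(s, t):
    # Unit sphere in R^3, p = north pole, v = e_1, w = e_2 (orthonormal tangent vectors).
    # exp_p(s v) = (sin s, 0, cos s) and exp_p(t w) = (0, sin t, cos t).
    a = (math.sin(s), 0.0, math.cos(s))
    b = (0.0, math.sin(t), math.cos(t))
    dot = sum(x * y for x, y in zip(a, b))
    return math.acos(dot) ** 2  # squared great-circle distance

def curvature_estimate(s, t):
    # Solve d^2 = s^2 + t^2 - (K / 3) s^2 t^2 for K, ignoring higher order terms.
    return 3.0 * (s * s + t * t - sphere_distance_sq(s, t)) / (s * s * t * t)

for step in (0.4, 0.2, 0.1):
    print(step, curvature_estimate(step, step))
```

As the step decreases the estimate approaches $1$: positive curvature makes the two geodesics end up closer than the corresponding Euclidean rays, and the size of the deficit measures the curvature.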
We look for analogous geometric interpretations for a statistical manifold. Given a statistical manifold $(M, g, \nabla, \nabla^*)$, in order to formulate a statement in the form of (18) or (19), we need an intrinsically defined divergence corresponding to the given geometry. This is the problem of constructing a canonical divergence, which has been studied in several papers including [8] [4] [6] [5].
Using the $L^{(\alpha)}$-divergence, which is explicit, intrinsically defined and has special properties, in this section we study the geometric interpretation for a statistical manifold with constant sectional curvature $-\alpha \leq 0$. Let $p \in M$ and $v, w \in T_p M$. Motivated by the generalized Pythagorean theorem which holds for the Bregman and $L^{(\alpha)}$-divergences, let
[TABLE]
where and are respectively the exponential maps corresponding respectively to the primal and dual connections and . With being an intrinsic local -divergence (see Theorem 3.1), consider the expression defined by
[TABLE]
By the generalized Pythagorean theorem proved in [20, Theorem 1.2] and [25, Theorem 16], if then . This motivates the definition of and the comparison with (19). Note that if then the manifold is dually flat. In this case, there is a canonical divergence of Bregman type. With the Bregman divergence and with defined by (20), we have the identity .
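In the dually flat case the deviation from the Pythagorean relation can be computed in closed form, using the three-point identity of the Bregman divergence. The sketch below is illustrative and makes several assumptions that are not taken from the paper: the convex function $\phi(\xi) = \sum_i \xi^i \log \xi^i$, the ordering of the arguments in the defect $B[q:p] + B[p:r] - B[q:r]$, and the representation of $v$ in primal coordinates and of $w$ in dual coordinates $\eta = D\phi(\xi)$.

```python
import math

def phi(xi):
    # Assumed convex example on the positive orthant.
    return sum(x * math.log(x) for x in xi)

def grad_phi(xi):
    # Dual coordinate eta = D phi(xi).
    return [math.log(x) + 1.0 for x in xi]

def grad_phi_inverse(eta):
    # Recover the primal coordinate from the dual coordinate.
    return [math.exp(e - 1.0) for e in eta]

def bregman(a, b):
    lin = sum(g * (x - y) for g, x, y in zip(grad_phi(b), a, b))
    return phi(a) - phi(b) - lin

def pythagorean_defect(p, v, w, s, t):
    # q lies on the primal geodesic (straight line in xi) through p with direction v;
    # r lies on the dual geodesic (straight line in eta) through p with direction w.
    q = [x + s * vi for x, vi in zip(p, v)]
    r = grad_phi_inverse([e + t * wi for e, wi in zip(grad_phi(p), w)])
    return bregman(q, p) + bregman(p, r) - bregman(q, r)

p = [0.5, 1.0, 2.0]
v = [1.0, -0.5, 0.25]
w = [0.3, 0.2, -0.4]
w_orth = [0.5, 1.0, 0.0]  # pairs to zero with v
s, t = 0.1, 0.2

defect = pythagorean_defect(p, v, w, s, t)
pairing = s * t * sum(vi * wi for vi, wi in zip(v, w))
defect_orth = pythagorean_defect(p, v, w_orth, s, t)
print(defect, pairing, defect_orth)
```

Here the defect equals $st$ times the pairing of $v$ and $w$ exactly, with no higher order terms, and it vanishes when the two directions are orthogonal. The curved case differs from this flat baseline, which is what the result of this section quantifies.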
Now let and let be the canonical (local) -divergence. By [25, Theorem 18], there exists a local coordinate system and an -exponentially concave function such that . Here is the primal coordinate of . Moreover, letting
[TABLE]
be respectively the dual coordinate and -conjugate of , we have and the self-dual representation
[TABLE]
As , these identities reduce to well-known properties of the Bregman divergence [1, Chapter 1]. By analyzing carefully the primal and dual geodesics as well as the self-dual representation (21), we have the following result.
Theorem 4.1
For small, we have
[TABLE]
Proof
By [25, Corollary 2], the primal/dual geodesics of -divergence are straight lines in the primal/dual coordinate systems, up to time changes. Thus we can write and , where and are the coordinate representations of and , and and are time changes. For notational simplicity we suppress the parameters and . Using [25, (89)], we have
[TABLE]
Differentiating (21) and using (23), we expand in terms of and :
[TABLE]
where , , and .
On the other hand, the geodesic equations (see [25, (86)]) give us, after some simplifications, Taylor expansions of and :
[TABLE]
[TABLE]
where and . The proof is completed by combining (24), (25) and (26). ∎
This result gives a geometric interpretation of the negative sectional curvature $-\alpha$ in terms of the canonical local $L^{(\alpha)}$-divergence. Note that if we use another intrinsic divergence (such as the conformal divergence) we will get a different expression in (22). Analogous results can be derived for the $L^{(-\alpha)}$-divergence.
Note that Theorem 4.1 allows the sectional curvature to be interpreted in terms of a fourth order mixed derivative of the quantity expanded in (22). In [27, Theorem 3.13] we extended this result to any divergence. This is formulated using a novel connection between the information geometry of $c$-divergence (which covers all divergences) and the pseudo-Riemannian framework of Kim and McCann [10]. In particular, for any divergence, the corresponding mixed derivative is a constant multiple of an un-normalized cross curvature of the Kim-McCann metric induced by the cost function. The reader is referred to [27] for more details. To conclude this paper, let us remark that for a statistical manifold with non-constant sectional curvature, this cross sectional curvature is not intrinsic, as there are different divergences (and hence Kim-McCann metrics) which induce the same dualistic structure. A natural starting point is to analyze the canonical divergence of Ay and Amari constructed in [4]. We leave this as a problem for future research.
References
- [1] Shun-ichi Amari. Information Geometry and Its Applications. Springer, 2016.
- [2] Shun-ichi Amari and Andrzej Cichocki. Information geometry of divergence functions. Bulletin of the Polish Academy of Sciences: Technical Sciences, 58(1):183–195, 2010.
- [3] Shun-ichi Amari and Hiroshi Nagaoka. Methods of Information Geometry, volume 191. American Mathematical Society, 2000.
- [4] Nihat Ay and Shun-ichi Amari. A novel approach to canonical divergences within information geometry. Entropy, 17(12):8111–8129, 2015.
- [5] Domenico Felice and Nihat Ay. Dynamical systems induced by canonical divergence in dually flat manifolds. arXiv preprint arXiv:1812.04461, 2018.
- [6] Domenico Felice and Nihat Ay. Towards a canonical divergence within information geometry. arXiv preprint arXiv:1806.11363, 2018.
- [7] E Robert Fernholz. Stochastic portfolio theory. In Stochastic Portfolio Theory, pages 1–24. Springer, 2002.
- [8] Masayuki Henmi and Ryoichi Kobayashi. Hooke’s law in statistical manifolds and divergences. Nagoya Mathematical Journal, 159:1–24, 2000.
