Congruent families and invariant tensors
Lorenz Schwachhöfer, Nihat Ay, Jürgen Jost, Hông Vân Lê

TL;DR
This paper generalizes classical invariance results in information geometry, showing that invariant tensor families under congruent Markov morphisms are generated by canonical tensors for any degree n.
Contribution
It extends the characterization of invariant tensors from 2- and 3-tensors to arbitrary degree n, linking them to canonical tensor fields.
Findings
Invariant tensor families are algebraically generated by canonical tensors.
Classical invariance results are extended to higher-degree tensors.
The work unifies the understanding of invariant tensors in statistical models.
L. Schwachhöfer, TU Dortmund University, Dortmund, Germany
N. Ay, J. Jost, Max-Planck-Institute for Mathematics in the Sciences, Leipzig, Germany
H.V. Lê, Academy of Sciences of the Czech Republic, Prague
Abstract.
Classical results of Chentsov and Campbell state that, up to constant multiples, the only 2-tensor field of a statistical model which is invariant under congruent Markov morphisms is the Fisher metric and the only invariant 3-tensor field is the Amari-Chentsov tensor. We generalize this result to arbitrary degree $n$, showing that any family of $n$-tensors which is invariant under congruent Markov morphisms is algebraically generated by the canonical tensor fields defined in [5].
Key words and phrases:
Chentsov’s theorem, sufficient statistic, congruent Markov kernel, statistical model
2010 Mathematics Subject Classification:
primary: 62B05, 62B10, 62B86, secondary: 53C99
1. Introduction
The main task of *information geometry* is to use differential geometric methods in probability theory in order to gain insight into the structure of families of probability measures or, slightly more generally, finite measures on some (finite or infinite) sample space $\Omega$. In fact, one of the key themes of differential geometry is to identify quantities that do not depend on how we parametrize our objects, but that depend only on their intrinsic structure. And since in information geometry, we not only have the structure of the parameter space, the classical object of differential geometry, but also the sample space on which the probability measures live, we should also look at invariance properties with respect to the latter. That is what we shall systematically do in this contribution.
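When parametrizing such a family by a manifold $M$, there are two classically known symmetric tensor fields on the parameter space $M$. The first is a quadratic form (i.e., a Riemannian metric), called the Fisher metric $\mathfrak{g}^F$, and the second is a 3-tensor, called the Amari-Chentsov tensor $T^{AC}$. The Fisher metric was first suggested by Rao [19], followed by Jeffreys [15] and Efron [14], and then systematically developed by Chentsov and Morozova [10], [11] and [18]; the Amari-Chentsov tensor and its significance were discovered by Amari [1], [2] and Chentsov [12]. If the family is given by a positive density function $\mathbf{p}(\xi) = p(\cdot;\xi)\mu$ w.r.t. some fixed background measure $\mu$ on $\Omega$ and $p$ is differentiable in the $\xi$-direction, then the score
\[ \int_\Omega \partial_V \log p(\cdot;\xi)\; d\mathbf{p}(\xi) \tag{1.1} \]
vanishes, while the Fisher metric $\mathfrak{g}^F$ and the Amari-Chentsov tensor $T^{AC}$ associated to a parametrized measure model $(M,\Omega,\mathbf{p})$ are given by
\[ \begin{aligned} \mathfrak{g}^F(V,W) &:= \int_\Omega \partial_V \log p(\cdot;\xi)\, \partial_W \log p(\cdot;\xi)\; d\mathbf{p}(\xi), \\ T^{AC}(V,W,U) &:= \int_\Omega \partial_V \log p(\cdot;\xi)\, \partial_W \log p(\cdot;\xi)\, \partial_U \log p(\cdot;\xi)\; d\mathbf{p}(\xi). \end{aligned} \tag{1.2} \]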
Of course, this naturally suggests considering analogous tensors for arbitrary degree $n$. The tensor fields in (1.2) have some remarkable properties. On the one hand, they may be defined independently of the particular choice of a parametrization and thus are naturally defined from the differential geometric point of view. Their most important property from the point of view of statistics is that these tensors are invariant under sufficient statistics or, more generally, under congruent Markov morphisms. In fact, these tensor fields are characterized by this invariance property. This was shown in the case of finite sample spaces by Chentsov in [11] and for an arbitrary sample space by the authors of the present article in [4].
The question addressed in this article is to classify all tensor fields which are invariant under sufficient statistics and congruent Markov morphisms. In order to do this, we first have to make this invariance condition precise.
Observe that both [11] and [4] require the family to be of the form $\mathbf{p}(\xi) = p(\cdot;\xi)\mu$ with $p > 0$, which in particular implies that all these measures are equivalent, i.e., have the same null sets. Later, in [5] and [6], the authors of this article introduced a more general notion of a *parametrized measure model* as a map $\mathbf{p}: M \to \mathcal{M}(\Omega)$ from a (finite or infinite dimensional) manifold $M$ into the space $\mathcal{M}(\Omega)$ of finite measures which is continuously Fréchet-differentiable when regarded as a map into the Banach lattice $\mathcal{S}(\Omega)$ of *signed* finite measures. Such a model neither requires the existence of a measure dominating all measures $\mathbf{p}(\xi)$, nor does it require all these measures to be equivalent.
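Furthermore, for each $r \in (0,1]$ there is a well defined Banach lattice $\mathcal{S}^r(\Omega)$ of $r$-th powers of finite signed measures, whose nonnegative elements are denoted by $\mathcal{M}^r(\Omega) \subset \mathcal{S}^r(\Omega)$, and for each integer $n \in \mathbb{N}$, there is a *canonical $n$-tensor* on $\mathcal{S}^{1/n}(\Omega)$ given by
\[ L^n_\Omega(\nu_1,\dots,\nu_n) := n^n \int_\Omega d(\nu_1 \cdots \nu_n), \tag{1.3} \]
where $\nu_1 \cdots \nu_n \in \mathcal{S}(\Omega)$ is a signed measure. The multiplication on the right hand side of (1.3) refers to the multiplication of roots of measures, cf. [5, (2.11)], see also (2.2). A parametrized measure model $(M,\Omega,\mathbf{p})$ is called *$k$-integrable* for $k \ge 1$ if the map
\[ \mathbf{p}^{1/k}: M \longrightarrow \mathcal{M}^{1/k}(\Omega) \subset \mathcal{S}^{1/k}(\Omega), \qquad \xi \longmapsto \mathbf{p}(\xi)^{1/k} \]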
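is continuously Fréchet differentiable, cf. [5, Definition 4.4]. In this case, we define the *canonical $n$-tensor of the model* as the pull-back $\tau^n_{(M,\Omega,\mathbf{p})} := (\mathbf{p}^{1/n})^* L^n_\Omega$ for all $n \le k$. If the model is of the form $\mathbf{p}(\xi) = p(\cdot;\xi)\mu$ with a positive density function $p$, then
\[ \tau^n_{(M,\Omega,\mathbf{p})}(V_1,\dots,V_n) = \int_\Omega \partial_{V_1}\log p(\cdot;\xi) \cdots \partial_{V_n}\log p(\cdot;\xi)\; d\mathbf{p}(\xi), \tag{1.4} \]
so that $\mathfrak{g}^F = \tau^2_{(M,\Omega,\mathbf{p})}$ and $T^{AC} = \tau^3_{(M,\Omega,\mathbf{p})}$ by (1.2). The condition of $k$-integrability ensures that the integral in (1.4) exists for $n \le k$.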
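As a rough numerical illustration, the following Python sketch evaluates the integral in (1.4) on the two-point sample space $\{0,1\}$ for the Bernoulli family with parameter $\theta$, taking the counting measure as background measure and all $n$ tangent vectors equal to $\partial_\theta$. It assumes that (1.4) is the integral of the product of the $n$ logarithmic derivatives against the measure of the model; the function names are hypothetical and chosen only for this example.

```python
def bernoulli(theta):
    """Probabilities p(x; theta) on the two-point sample space {0, 1}."""
    return [1.0 - theta, theta]


def dlog_bernoulli(theta):
    """Derivative of log p(x; theta) with respect to theta, for x = 0 and x = 1."""
    return [-1.0 / (1.0 - theta), 1.0 / theta]


def canonical_tensor(n, theta):
    """Assumed form of (1.4): sum over x of (d log p)^n weighted by p(x; theta)."""
    p = bernoulli(theta)
    s = dlog_bernoulli(theta)
    return sum(s_x ** n * p_x for s_x, p_x in zip(s, p))


theta = 0.3
score = canonical_tensor(1, theta)    # n = 1: the score of a statistical model
fisher = canonical_tensor(2, theta)   # n = 2: Fisher metric component
amari = canonical_tensor(3, theta)    # n = 3: Amari-Chentsov tensor component
```

For $n = 1$ the value vanishes up to rounding, and for $n = 2$ and $n = 3$ the values agree with the closed forms $1/(\theta(1-\theta))$ and $(1-2\theta)/(\theta^2(1-\theta)^2)$ for the Bernoulli family.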
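A Markov kernel $K: \Omega \to \mathcal{P}(\Omega')$ induces a bounded linear map $K_*: \mathcal{S}(\Omega) \to \mathcal{S}(\Omega')$, called the Markov morphism associated to $K$. This Markov kernel is called congruent if there is a statistic $\kappa: \Omega' \to \Omega$ such that $\kappa_* K_* \mu = \mu$ for all $\mu \in \mathcal{S}(\Omega)$.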
We may associate to the map by , where . While is not Fréchet differentiable in general, we still can define in a natural way the formal differential and hence the pullback for any covariant -tensor on which yields a covariant -tensor on .
It is not hard to show that for the canonical tensor fields we have the identity for any congruent Markov kernel , whence we may say that the canonical -tensors on form a congruent family. Evidently, any tensor field which is given by linear combinations of tensor products of canonical tensors and permutations of the argument is also a congruent family, and the families of this type are said to be algebraically generated by .
Our main result is that these exhaust the possible invariant families of covariant tensor fields:
Theorem 1.1.
Let be a family of covariant -tensors on for each measurable space . Then this family is invariant under congruent Markov morphisms if and only if it is algebraically generated by the canonical tensors with .
In particular, on each -integrable parametrized measure model any tensor field which is invariant under congruent Markov morphisms is algebraically generated by the canonical tensor fields , .
We shall show that this conclusion already holds if the family is invariant under congruent Markov morphisms $K: I \to \mathcal{P}(\Omega)$ with finite $I$. Also, observe that this theorem yields another proof of the theorems of Chentsov [12, Theorem 11.1] and Campbell ([9] or [4]) which classify the invariant families of 2- and 3-tensors, respectively. Campbell’s theorem covers the case where the measures no longer need to be probability measures. In such a situation, the analogue of the score (1.1) no longer needs to vanish, and it furnishes a nontrivial 1-tensor.
Let us comment on the relation of our results to those of Bauer et al. [7] [8]. Assuming that the sample space is a manifold (with boundary or even with corners), the space of (smooth) densities on is defined as the set of all measures of the form , where is a smooth function with finite integral, being the volume form of some Riemannian metric on . Thus, is a Fréchet manifold, and regarding a diffeomorphism as a congruent statistic, the induced maps are diffeomorphisms of Fréchet manifolds. The main result in [7] states that for any -tensor field which is invariant under diffeomorphisms is a multiple of the Fisher metric. Likewise, the space of diffeomorphism invariant -tensors for arbitrary [8] is generated by the canonical tensors. Thus, when restricting to parametrized measure models whose image lies in the space of densities and which are differentiable w.r.t. the Fréchet manifold structure on , then the invariance of a tensor field under diffeomorphisms rather than under arbitrary congruent Markov morphisms already implies that the tensor field is algebraically generated by the canonical tensors. Considering invariance under diffeomorphisms is natural in the sense that they can be regarded as the natural analogues of permutations of a finite sample space. In our more general setting, however, the concept of a diffeomorphism is no longer meaningful, and we need to consider invariance under a larger class of transformations, the congruent Markov morphisms.
In a similar spirit, J. Dowty [13] has shown recently that when restricting to the space of exponential families, the Fisher metric is the only 2-tensor which is invariant under independent and identically distributed extensions and canonical sufficient statistics.
This paper is structured as follows. In Section 2 we recall from [5] the definition of a parametrized measure model, roots of measures and congruent Markov kernels, and furthermore we give an explicit description of the space of covariant families which are algebraically generated by the canonical tensors. In Section 3 we recall the notion of congruent families of tensor fields and show that the canonical tensors, and hence tensors which are algebraically generated by these, are congruent. Then we show in Section 4 that these exhaust all invariant families of tensor fields on *finite* sample spaces, and finally, in Section 5, by reducing the general case to the finite case through step function approximations, we obtain the classification result Theorem 5.1, which implies Theorem 1.1 as a simplified version.
Acknowledgements. This work was mainly carried out at the Max Planck Institute for Mathematics in the Sciences in Leipzig, and we are grateful for the excellent working conditions provided at that institution. H.V. Lê is partially supported by Grant RVO:67985840.
2. Preliminary results
2.1. The space of (signed) finite measures and their powers
Let be a measurable space, that is an arbitrary set together with a sigma algebra of subsets of . Regarding the sigma algebra on as fixed, we let
[TABLE]
Clearly, , and are real vector spaces, whereas is an affine space with linear part . In fact, both and are Banach spaces whose norm is given by the total variation of a signed measure, defined as
[TABLE]
where the supremum is taken over all finite partitions with disjoint sets . Here, the symbol stands for the disjoint union of sets. In particular,
[TABLE]
In [5], for each the space of -th powers of measures on is defined. We shall not repeat the formal definition here, but we recall the most important features of these spaces.
Each is a Banach lattice whose norm we denote by , and denotes the spaces of nonnegative elements. Moreover, in a canonical way. For there is a bilinear product
[TABLE]
and for $0 < k \le 1/r$ there is an exponentiating map $\pi^k: \mathcal{S}^r(\Omega) \to \mathcal{S}^{rk}(\Omega)$ which is continuous for $k < 1$ and a Fréchet-$C^1$-map for $k \ge 1$.
In order to understand these objects more concretely, let be a measure, so that . Then for all we have , and if and only if . The inclusion
[TABLE]
is an isometric inclusion of Banach spaces, and the elements of are said to be dominated by . We also define
[TABLE]
Moreover,
[TABLE]
where and . The Fréchet derivative of at is given by
[TABLE]
Furthermore, for an integer , we have the canonical -tensor on , given by
[TABLE]
which is a symmetric -multilinear form, where we regard the product as an element of . For instance, for the bilinear form
[TABLE]
equips with a Hilbert space structure with induced norm .
2.2. Parametrized measure models
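Recall from [5] that a parametrized measure model is a triple $(M,\Omega,\mathbf{p})$ consisting of a (finite or infinite dimensional) manifold $M$ and a map $\mathbf{p}: M \to \mathcal{M}(\Omega)$ which is Fréchet-differentiable when regarded as a map into $\mathcal{S}(\Omega)$ (cf. [5, Definition 4.1]). If $\mathbf{p}(\xi) \in \mathcal{P}(\Omega)$ for all $\xi \in M$, then $(M,\Omega,\mathbf{p})$ is called a statistical model. Moreover, $(M,\Omega,\mathbf{p})$ is called $k$-integrable if $\mathbf{p}^{1/k}: M \to \mathcal{M}^{1/k}(\Omega) \subset \mathcal{S}^{1/k}(\Omega)$ is also Fréchet-differentiable (cf. [16, Definition 2.6]). For a parametrized measure model, the differential $d_\xi\mathbf{p}(v) \in \mathcal{S}(\Omega)$ with $v \in T_\xi M$ is always dominated by $\mathbf{p}(\xi)$, and we define the logarithmic derivative (cf. [5, Definition 4.3]) as the Radon-Nikodym derivative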
[TABLE]
Then is -integrable if and only if for all , and the function on is continuous (cf. [16, Theorem 2.7]). In this case, the Fréchet derivative of is given as
[TABLE]
2.3. Congruent Markov morphisms
Definition 2.1.
A *Markov kernel* between two measurable spaces and is a map associating to each a probability measure on such that for each fixed measurable the map
[TABLE]
is measurable for all . The linear map
[TABLE]
is called the Markov morphism induced by .
Evidently, a Markov morphism maps to , and
[TABLE]
so that also maps to . For any , , whence is bounded.
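On a finite sample space these statements can be checked directly. The following sketch assumes that a Markov kernel between finite sets is represented by a row-stochastic matrix and that the total variation norm is the sum of the absolute values of the point masses; the kernel `K` and the helper names are hypothetical.

```python
def apply_kernel(K, mu):
    """Markov morphism: K_* mu = sum_i mu[i] * K[i] for a row-stochastic matrix K."""
    return [sum(mu[i] * K[i][j] for i in range(len(K))) for j in range(len(K[0]))]


def total_variation(mu):
    """Total variation norm of a signed measure on a finite set."""
    return sum(abs(mass) for mass in mu)


# Hypothetical kernel: every point is sent to the uniform distribution on two points.
K = [[0.5, 0.5],
     [0.5, 0.5]]

positive = [0.25, 0.75]   # a probability measure
signed = [1.0, -1.0]      # a signed measure with total mass 0 and norm 2

image_positive = apply_kernel(K, positive)   # the norm is preserved for nonnegative measures
image_signed = apply_kernel(K, signed)       # the norm may drop for signed measures
```

In this example the image of the probability measure is again a probability measure, while the signed measure is mapped to zero, so the inequality for the norm can be strict.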
Example 2.1.
A measurable map , called a statistic, induces a Markov kernel by setting . In this case,
[TABLE]
whence is the push-forward of (signed) measures on to (signed) measures on .
Definition 2.2.
A Markov kernel is called congruent w.r.t. the statistic if
[TABLE]
or, equivalently, if is a right inverse of , i.e., . It is called *congruent* if it is congruent w.r.t. some statistic .
This notion was introduced by Chentsov in the case of finite sample spaces [12], but the natural generalization in Definition 2.2 to arbitrary sample spaces has been treated in [4], [5] and [17].
Example 2.2.
A statistic between finite sets induces a partition
[TABLE]
In this case, a Markov kernel is $\kappa$-congruent if and only if
[TABLE]
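A small sketch may help to make the finite case concrete. It assumes that congruence with respect to a statistic $\kappa$ between finite sets means that the image of each point is supported on the corresponding fibre of $\kappa$, and it checks that pushing forward along $\kappa$ then undoes the Markov morphism. The dictionaries `kappa`, `K` and the helper names are hypothetical.

```python
def apply_kernel(K, mu):
    """Markov morphism on finite sets: K_* mu = sum_i mu(i) * K(i)."""
    out = {}
    for i, mass in mu.items():
        for j, prob in K[i].items():
            out[j] = out.get(j, 0.0) + mass * prob
    return out


def push_forward(kappa, nu):
    """Push the measure nu forward along the statistic kappa (a dict point -> image)."""
    out = {}
    for j, mass in nu.items():
        out[kappa[j]] = out.get(kappa[j], 0.0) + mass
    return out


def is_congruent(K, kappa):
    """Assumed criterion: each K(i) puts mass only on the fibre of kappa over i."""
    return all(kappa[j] == i
               for i, row in K.items()
               for j, prob in row.items() if prob > 0)


kappa = {"a": 0, "b": 0, "c": 1}                  # statistic from {a, b, c} to {0, 1}
K = {0: {"a": 0.25, "b": 0.75}, 1: {"c": 1.0}}    # kernel from {0, 1} to {a, b, c}
mu = {0: 2.0, 1: 3.0}

round_trip = push_forward(kappa, apply_kernel(K, mu))   # recovers mu when K is congruent
```

A kernel that moves mass from the point 0 onto the fibre over 1 fails the criterion, and the round trip then no longer returns the original measure.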
If is a parametrized measure model and a Markov kernel, then with is again a parametrized measure model. In this case, we have the following result.
Proposition 2.1.
([5, Theorem 3.3]) Let be a Markov morphism induced by the Markov kernel , let be a -integrable parametrized measure model and . Then is also -integrable, and
[TABLE]
2.4. Tensor algebras
In this section we shall provide the algebraic background on tensor algebras. Let be a vector space over a commutative field , and let be its dual. The tensor algebra of is defined as
[TABLE]
where
[TABLE]
In particular, and . Then is a graded associative unital algebra, where the product is defined as
[TABLE]
By convention, the multiplication with elements of is the scalar multiplication, so that is the unit of . Observe that is non-commutative.
There is a linear action of , the permutation group of elements, on given by
[TABLE]
for and . Indeed, the identity is easily verified. We call a tensor symmetric, if for all , and we let
[TABLE]
the -fold symmetric power of . Evidently, is a linear subspace.
A unital subalgebra of is a linear subspace containing which is closed under tensor products, i.e. such that implies that . We call such a subalgebra graded if
[TABLE]
and a graded subalgebra is called *permutation invariant* if is preserved by the action of on .
Definition 2.3.
Let be an arbitrary subset. The intersection of all permutation invariant unital subalgebras of containing is called the permutation invariant subalgebra generated by and is denoted by .
Observe that is the smallest permutation invariant unital subalgebra of which contains .
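Example 2.3.
Evidently, $\mathcal{A}_{perm}(\emptyset) = \mathbb{F}$.
To see another example, let $\tau^1 \in V^*$. If we let $\mathcal{A}_0 := \mathbb{F}$ and $\mathcal{A}_n := \mathbb{F}(\underbrace{\tau^1 \otimes \cdots \otimes \tau^1}_{n \text{ times}})$ for $n \ge 1$, then $\mathcal{A}_{perm}(\tau^1) = \bigoplus_{n \ge 0} \mathcal{A}_n$. In fact, $\mathcal{A}_{perm}(\tau^1)$ is even commutative and isomorphic to the algebra of polynomials over $\mathbb{F}$ in one variable.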
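For $n \in \mathbb{N}$, we denote by $\mathbf{Part}(n)$ the collection of partitions $\mathbf{P} = \{P_1, \dots, P_r\}$ of $\{1, \dots, n\}$, that is, $\bigcup_k P_k = \{1, \dots, n\}$, and these sets are pairwise disjoint. We denote the number $r$ of sets in the partition by $|\mathbf{P}|$.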
Given a partition , we associate to it a bijective map
[TABLE]
where , such that . This map is well defined, up to permutation of the elements in .
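$\mathbf{Part}(n)$ is partially ordered by the relation $\mathbf{P} \le \mathbf{P}'$ if $\mathbf{P}$ is a subdivision of $\mathbf{P}'$. This ordering has the partition $\{\{1\}, \dots, \{n\}\}$ into singleton sets as its minimum and $\{\{1, \dots, n\}\}$ as its maximum.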
Consider now a subset of of the form
[TABLE]
For a partition with the associated map from (2.13) we define as
[TABLE]
Observe that this definition is independent of the choice of the bijection , since is symmetric.
Example 2.4.
(1) If is the trivial partition, then
[TABLE]
(2) If is the partition into singletons, then
[TABLE]
(3) To give a concrete example, let and . Then
[TABLE]
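The combinatorics of this construction can be sketched in Python. The block below enumerates all partitions of $\{1,\dots,n\}$ and evaluates the tensor associated to a partition as the product, over the blocks, of a symmetric form applied to the arguments indexed by that block, which is how (2.15) is read here. The form `tau` is only an illustrative stand-in (a sum over coordinates of products), not one of the canonical tensors, and all names are hypothetical.

```python
from math import prod


def set_partitions(elements):
    """Yield all partitions of the list `elements` as lists of blocks."""
    if not elements:
        yield []
        return
    first, rest = elements[0], elements[1:]
    for partition in set_partitions(rest):
        # put `first` into each existing block in turn ...
        for k in range(len(partition)):
            yield partition[:k] + [[first] + partition[k]] + partition[k + 1:]
        # ... or into a new block of its own
        yield [[first]] + partition


def tau(vectors):
    """Illustrative symmetric k-form on R^d: sum over coordinates of the product."""
    return sum(prod(coords) for coords in zip(*vectors))


def tau_P(partition, vectors):
    """Product over the blocks of the form applied to the vectors indexed by each block."""
    return prod(tau([vectors[i - 1] for i in block]) for block in partition)


partitions_of_3 = list(set_partitions([1, 2, 3]))   # the five partitions of {1, 2, 3}
vectors = [[1, 2], [3, 4], [5, 6]]
value = tau_P([[1, 3], [2]], vectors)               # tau(v1, v3) * tau(v2)
```

Because `tau` is symmetric, the result does not depend on the order in which the elements of a block are listed, in line with the remark on the choice of the bijection.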
We can now present the main result of this section.
Proposition 2.2.
Let be given as in (2.14). Then the permutation invariant subalgebra generated by equals
[TABLE]
Proof.
Let us denote the right hand side of (2.16) by , so that we wish to show that .
By Example 2.4.1, for all , whence . Furthermore, by (2.15) we have
[TABLE]
where is the partition of obtained by regarding and as partitions of and , respectively. Moreover, if is a permutation and a partition, then the definition in (2.15) implies that
[TABLE]
That is, is a permutation invariant unital subalgebra of containing , whence .
For the converse, observe that for a partition , we may – after applying a permutation of – assume that
[TABLE]
with , and in this case, (2.11) and (2.15) imply that
[TABLE]
so that any permutation invariant subalgebra containing also must contain for all partitions, and this shows that . ∎
2.5. Tensor fields
Recall that a (covariant) $n$-tensor field (since we do not consider non-covariant $n$-tensor fields in this paper, we shall suppress the attribute covariant) on a manifold is a collection of $n$-multilinear forms on for all such that for continuous vector fields on the function
[TABLE]
is continuous. This notion can also be adapted to the case where has a weaker structure than that of a manifold. The examples we have in mind are the subsets of for an arbitrary measurable space and , which fail to be manifolds. Nevertheless, there is a natural notion of tangent cone at of these sets which is the collection of the derivatives of all curves in (in , respectively) through . These cones were determined in [5, Proposition 2.1] as
[TABLE]
with (, respectively). Then in analogy to the notion for general manifolds, we can now define the notion of -tensor field on and as follows.
Definition 2.4.
Let be a measurable space and . A *vector field* on is a continuous map such that for all . The notion of a vector field on is defined analogously.
A *(covariant) $n$-tensor field* on is a collection of $n$-multilinear forms on for all such that for continuous vector fields on the function
[TABLE]
is continuous. The notion of vector fields and -tensor fields on is defined analogously.
If are tensor fields of degree and , respectively, and is a permutation, then the pointwise tensor product and the permutation defined in (2.11) and (2.12) are tensor fields of degree and , respectively. Moreover, for a differentiable map the *pull-back* of under is the tensor field on defined by
[TABLE]
Evidently, we have
[TABLE]
For instance, if is a -integrable parametrized measure model, then by (2.7), , so that for any -tensor field on the pull-back
[TABLE]
is well defined. The same holds if is a statistical model and is an -tensor field on . Moreover, (2.18) holds in this context as well when replacing by .
Definition 2.5.
Let be a measurable space, an integer and . Then the canonical $n$-tensor field on is defined as the pull-back
[TABLE]
with the symmetric -tensor on defined in (2.5). The definition of the pullback in (2.17) and the formula for the Fréchet-derivative of in (2.4) now imply by a straightforward calculation that
[TABLE]
where and .
Furthermore, if is a -integrable parametrized measure model, , then we define the canonical -tensor field of as the pull-back
[TABLE]
In this case, (2.7) implies that for
[TABLE]
Example 2.5.
(1) The canonical 1-tensor of $(M,\Omega,\mathbf{p})$ is given as
[TABLE]
Thus, on a statistical model (i.e., if $\mathbf{p}(\xi) \in \mathcal{P}(\Omega)$ for all $\xi \in M$) the canonical 1-tensor vanishes.
(2) The canonical 2-tensor is called the *Fisher metric* of the model and is often simply denoted by $\mathfrak{g}^F$. It is defined only if the model is 2-integrable.
(3) The canonical 3-tensor is called the *Amari-Chentsov tensor* of the model. It is often simply denoted by $T^{AC}$ and is defined only if the model is 3-integrable.
3. Congruent families of tensor fields
The question we wish to address in this section is to characterize families of -tensor fields on (on , respectively) for measurable spaces which are unchanged under congruent Markov morphisms.
First of all, we need to clarify what is meant by this. The problem we have is that, while a given Markov kernel induces the bounded linear Markov morphism which maps and to and , respectively, there is no induced differentiable map from and to and , respectively, if . The best we can do is to make the following definition.
Definition 3.1.
Let be a Markov kernel with the associated Markov morphism from (2.8). For we define
[TABLE]
which maps and to and , respectively.
Since , it follows that is a Fréchet-$C^1$-map and is linear. However, is continuous but not differentiable for , whence the same holds for .
Nevertheless, let us for the moment pretend that were differentiable. Then, when rewriting (3.1) as , the chain rule and (2.7) would imply that
[TABLE]
for all .
On the other hand, as maps to , its differential at for would restrict to a linear map
[TABLE]
where . This together with (3.2) implies that the restriction of to must be given as
[TABLE]
Indeed, by [5, Theorem 3.3], (3.3) defines a bounded linear map . In fact, it is shown in that reference that
[TABLE]
Definition 3.2.
For , the bounded linear map (3.3) is called the formal derivative of at .
If is a -integrable parametrized measure model, then so is with by Proposition 2.1. In this case, we may also write
[TABLE]
Proposition 3.1.
The formal derivative of defined in (3.3) satisfies the identity
[TABLE]
for all which may be regarded as the chain rule applied to the derivative of (3.4).
Proof.
For , we calculate
[TABLE]
which shows the assertion. ∎
Our definition of formal derivatives is just strong enough to define the pullback of tensor fields on the space of probability measures in analogy to (2.17).
Definition 3.3 (Pullback of tensors by a Markov morphism).
Let be a Markov kernel, and let be an -tensor field on (on , respectively), cf. Definition 2.4. Then the pull-back tensor under is defined as the covariant -tensor on (on , respectively) given as
[TABLE]
with the formal derivative from (3.3).
Evidently, is again a covariant -tensor on and , respectively, since is continuous. Moreover, Proposition 3.1 implies that for a parametrized measure model and the induced model with we have the identity
[TABLE]
for any covariant -tensor field on or , respectively.
With this, we can now give a definition of congruent families of tensor fields.
Definition 3.4 (Congruent families of tensors).
Let $r \in (0,1]$, and let be a collection of covariant $n$-tensors on (on , respectively) for each measurable space .
This collection is said to be a *congruent family of $n$-tensors of regularity $r$* if for any congruent Markov kernel we have
[TABLE]
The following gives an important example of such families.
Proposition 3.2.
The restriction of the canonical $n$-tensors (2.5) to and , respectively, yields a congruent family of $n$-tensors. Likewise, the canonical $n$-tensors on and , respectively, with yield congruent families of $n$-tensors.
Proof.
Let be a Markov kernel which is congruent w.r.t. the statistic (cf. Definition 2.2). For let , so that . Let , with , , and define by
[TABLE]
By the -congruency of , this implies that
[TABLE]
where , so that
[TABLE]
Then
[TABLE]
This shows that is a congruent family of -tensors. For , observe that by (3.1) we have
[TABLE]
and hence,
[TABLE]
showing the congruency of the family as well. ∎
By (2.18) and Definition 3.4, it follows that tensor products and permutations of congruent families of tensors yield again such families. Moreover, since
[TABLE]
multiplying a congruent family with a continuous function depending only on yields again a congruent family of tensors. Therefore, defining for a partition with the associated map from (2.13) the tensor as
[TABLE]
this together with Proposition 2.2 yields the following.
Proposition 3.3.
For ,
[TABLE]
is a congruent family of -tensor fields on , where the sum is taken over all partitions with for all , and where are continuous functions. Furthermore,
[TABLE]
is a congruent family of -tensor fields on , where the sum is taken over all partitions with for all , and where the are constants.
In the light of Proposition 2.2, it is reasonable to use the following terminology.
Definition 3.5.
The congruent families of -tensors on and given in (3.7) and (3.8), respectively, are called the families which are algebraically generated by the canonical tensors.
4. Congruent families on finite sample spaces
In this section, we wish to apply our discussion of the previous sections to the case where the sample space is assumed to be a finite set, in which case it is denoted by $I$ rather than $\Omega$.
The simplification of this case is due to the fact that in this case the spaces are finite dimensional. Indeed, we have
[TABLE]
where denotes the Dirac measure supported at . The norm on is then
[TABLE]
The space is then given as
[TABLE]
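The sets $\mathcal{M}_+(I)$ and $\mathcal{P}_+(I)$ are manifolds of dimension $|I|$ and $|I| - 1$, respectively. Indeed, $\mathcal{M}_+(I) \subset \mathcal{S}(I)$ is an open subset, whereas $\mathcal{P}_+(I)$ is an open subset of the affine hyperplane $\mathcal{S}_1(I)$, cf. (2.1). In particular, we have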
[TABLE]
The norm on is given as
[TABLE]
and the product and the exponentiating map from above are given as
[TABLE]
Evidently, maps and to and , respectively, and the restriction of to these sets is differentiable even if .
A Markov kernel between the finite sets and is determined by the -matrix by
[TABLE]
where and for all . Therefore, by linearity,
[TABLE]
In particular, and .
If is a statistic between finite sets (cf. Example 2.2) and if we denote the induced partition by , then a Markov kernel given by the matrix as above is -congruent if and only if
[TABLE]
Since is a basis of , we can describe any -tensor on by defining for all multiindices the component functions
[TABLE]
which are real valued functions depending continuously on . Thus, by (2.20), the component functions of the canonical tensor from (4.4) are given as
[TABLE]
Remark 4.1.
Observe that is continuous on and hence is well-defined on even if , as on this set . This reflects the fact that the restriction is differentiable for any by (4.3).
In particular, for , when restricting to or , the canonical tensor fields
[TABLE]
yield a congruent family of -tensors on and , respectively, as is verified as in the proof of Proposition 3.2. Therefore, the families of -tensor fields
[TABLE]
on and
[TABLE]
on are congruent, where in contrast to (3.7) and (3.8) we need not restrict the sum to partitions with for all . In analogy to Definition 3.5 we call these the families of congruent tensors algebraically generated by the canonical -tensors .
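The congruence of the canonical tensors on finite sets can be checked numerically. The sketch below assumes the coordinate formula $\tau^n_\mu(v_1,\dots,v_n) = \sum_i v_1^i \cdots v_n^i / m_i^{n-1}$ for a measure $\mu = \sum_i m_i \delta^i$ with positive weights, and it assumes that on measures themselves the formal derivative of the Markov morphism is the morphism again. The kernel, the measure and the vectors are hypothetical test data, and exact rational arithmetic is used so that both sides can be compared for equality.

```python
from fractions import Fraction as F
from math import prod


def canonical_tensor(mu, vectors):
    """Assumed finite-set formula: sum_i prod_k v_k[i] / mu[i]**(n - 1)."""
    n = len(vectors)
    return sum(prod(v[i] for v in vectors) / mu[i] ** (n - 1) for i in range(len(mu)))


def apply_kernel(K, v):
    """K_* v for a kernel given as a row-stochastic matrix K[i][j]."""
    return [sum(v[i] * K[i][j] for i in range(len(K))) for j in range(len(K[0]))]


# Congruent kernel from {0, 1} to {0, 1, 2}: point 0 is spread over {0, 1}, point 1 goes to 2.
K = [[F(1, 4), F(3, 4), F(0)],
     [F(0), F(0), F(1)]]
mu = [F(2), F(3)]
vs = [[F(1), F(-2)], [F(5), F(7)], [F(-1), F(4)]]   # three tangent vectors, so n = 3

lhs = canonical_tensor(apply_kernel(K, mu), [apply_kernel(K, v) for v in vs])
rhs = canonical_tensor(mu, vs)
```

Both sides agree exactly for this data; the factors of the kernel entries cancel on each fibre because the rows of the kernel sum to one.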
The main result of this section (Theorem 4.1) will be that (4.6) and (4.7) are the only families of congruent -tensor fields which are defined on and , respectively, for all *finite* sets . In order to do this, we first deal with congruent families on only.
A multiindex induces a partition of the set into the equivalence classes of the relation . For instance, for and pairwise distinct elements , the partition induced by is
[TABLE]
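The partition induced by a multiindex can be computed by grouping the positions that carry equal entries. A minimal sketch, with positions numbered from 1 and a hypothetical helper name; the multiindex in the usage line is an illustration chosen here, not necessarily the one displayed above.

```python
def induced_partition(multiindex):
    """Partition of {1, ..., n} into the classes of the relation r ~ s iff i_r == i_s."""
    blocks = {}
    for position, value in enumerate(multiindex, start=1):
        blocks.setdefault(value, []).append(position)
    return sorted(blocks.values())


# Entries "i", "j", "k" stand for pairwise distinct elements of the finite set.
example = induced_partition(("i", "j", "i", "k"))
```

Here positions 1 and 3 carry the same entry and therefore end up in the same block, while positions 2 and 4 form singleton blocks.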
Since the canonical -tensors are symmetric by definition, it follows that for any partition we have by (3.6)
[TABLE]
Lemma 4.1.
In (4.6) and (4.7) above, and are uniquely determined.
Proof.
To show the first statement, let us assume that there are functions such that
[TABLE]
for all finite sets and , but there is a partition with . In fact, we pick to be minimal with this property, and choose a multiindex with . Then
[TABLE]
where the last equation follows since for by the minimality assumption on .
But again by (4.8), since , so that for all , contradicting .
Thus, (4.9) occurs only if for all , showing the uniqueness of the functions in (4.6).
The uniqueness of the constants in (4.7) follows similarly, but we have to account for the fact that . In order to get around this, let be a finite set and . For , we define
[TABLE]
and for a multiindex we let
[TABLE]
Multiplying this term out, we see that is a linear combination of terms of the form , where . Thus, from (4.8) we conclude that
[TABLE]
Moreover, if with , and , then
[TABLE]
Thus, by (2.15) we have
[TABLE]
In particular, since for all we conclude that
[TABLE]
as long as does not contain a singleton set.
With this, we can now proceed as in the previous case: assume that
[TABLE]
for constants which do not all vanish, and we let be minimal with . Let be a multiindex with , and let be as above. Then
[TABLE]
where the last equality follows by the assumption that is minimal. But by (4.11), whence , contradicting the choice of .
This shows that (4.12) can happen only if all , and this completes the proof. ∎
The main result of this section is the following.
Theorem 4.1.
(Classification of congruent families of $n$-tensors)
The class of congruent families of $n$-tensors on $\mathcal{M}_+(I)$ and $\mathcal{P}_+(I)$, respectively, for finite sets $I$ is the class algebraically generated by the canonical $n$-tensors . That is, these families are the ones given in (4.6) and (4.7), respectively.
The rest of this section will be devoted to its proof which is split up into several lemmas.
Lemma 4.2.
Let be the canonical -tensor from (3.6), and define the center
[TABLE]
Then for any we have
[TABLE]
Proof.
For , , the components of all equal , whence in this case we have for all multiindices with
[TABLE]
showing (4.14). If , the claim follows from (4.8). ∎
Now let us suppose that is a congruent family of -tensors on , and define as in (4.4) and as in (4.13).
Lemma 4.3.
Let and be as before, and let . If are multiindices with , then
[TABLE]
Proof.
If , then there is a permutation such that for . We define the congruent Markov kernel by . Then evidently, , and Definition 3.4 implies
[TABLE]
which shows the claim. ∎
By virtue of this lemma, we may define
[TABLE]
Lemma 4.4.
Let and be as before, and suppose that is a partition such that
[TABLE]
Then there is a continuous function such that
[TABLE]
Proof.
Let be finite sets, and let . We define the Markov kernel
[TABLE]
which is congruent w.r.t. the canonical projection . Then is easily verified. Moreover, if is a multiindex with , then
[TABLE]
Observe that . If , then by (4.15).
Moreover, there are multiindices for which , and since for all of these , we obtain
[TABLE]
and since , it follows that
[TABLE]
Interchanging the roles of and in the previous arguments, we also get
[TABLE]
whence is indeed independent of the choice of the finite set . ∎
Lemma 4.5.
Let and be as before. Then there is a congruent family of the form (4.6) such that
[TABLE]
Proof.
For a congruent family of -tensors , we define
[TABLE]
If , then let
[TABLE]
be a minimal element, i.e., such that for all . In particular, for this partition (4.15) and hence (4.16) holds. Let
[TABLE]
with the function from (4.16). Then is again a family of -tensors.
Let and be a multiindex with . If , then by Lemma 4.2 we would have which would imply that , contradicting the choice of .
Thus, and hence whenever , showing that .
Thus, what we have shown is that . On the other hand, if , then again by Lemma 4.2
[TABLE]
and since , it follows that
[TABLE]
That is, whenever . If is a multiindex with , then by the minimality of , so that . Moreover, by Lemma 4.2, whence
[TABLE]
showing that . Therefore,
[TABLE]
What we have shown is that given a congruent family of -tensors with , we can enlarge by subtracting a multiple of the canonical tensor of some partition. Repeating this finitely many times, we conclude that for some congruent family of the form (4.6)
[TABLE]
and this implies by definition that for all and all . ∎
Lemma 4.6.
Let be a congruent family of -tensors such that for all and . Then for all .
Proof.
Consider such that has rational coefficients, i.e.
[TABLE]
for some and . Let
[TABLE]
so that , and consider the congruent Markov kernel
[TABLE]
Then
[TABLE]
Thus, Definition 3.4 implies
[TABLE]
so that whenever has rational coefficients. But these form a dense subset of , whence for all , which completes the proof. ∎
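The refinement step used in this proof can be checked numerically. The following is a minimal Python sketch under stated assumptions: a vector of positive rational weights k_i/n on a finite set is pushed forward along the kernel that spreads each point i uniformly over its own block of k_i new points. The blocks are disjoint, so the kernel is congruent with respect to the map sending a refined point to the index of its block, and the image has the same mass 1/n at every refined point. The function name `refine_to_uniform` and the list encoding of measures are illustrative choices, not notation from the paper.

```python
from fractions import Fraction
from math import gcd


def refine_to_uniform(weights):
    """Push positive rational weights forward along a block-uniform kernel.

    weights: list of positive Fractions (one entry per point of the finite set).
    Returns (image, blocks): the image weights on the refined set and, for each
    original point i, the list of refined indices that form its block.
    """
    if any(w <= 0 for w in weights):
        raise ValueError("all weights must be strictly positive")

    # Common denominator n, so that weights[i] = k_i / n with integer k_i.
    n = 1
    for w in weights:
        n = n * w.denominator // gcd(n, w.denominator)
    ks = [int(w * n) for w in weights]

    blocks, image, start = [], [], 0
    for w, k in zip(weights, ks):
        blocks.append(list(range(start, start + k)))
        # Point i is sent to the uniform distribution on its block, so each
        # refined point in the block receives mass w / k, which equals 1 / n.
        image.extend([w / k] * k)
        start += k
    return image, blocks


# Example: weights 1/2, 1/3, 1/6 are refined into blocks of sizes 3, 2, 1.
image, blocks = refine_to_uniform([Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)])
```

Exact rational arithmetic is used so that the equality of all image weights can be tested without floating-point tolerance.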
We are now ready to prove the main result in this section.
Proof of Theorem 4.1.
Let be a congruent family of -tensors. By Lemma 4.5 there is a congruent family of the form (4.6) such that for all finite and all .
Since is again a congruent family, Lemma 4.6 implies that and hence is of the form (4.6), showing the statement of Theorem 4.1 for -tensors on .
To show the second part, let us consider for a finite set the inclusion and projection
[TABLE]
Evidently, is a left inverse of , i.e., , and by (2.9) it follows that commutes both with and .
Thus, if is a congruent family of -tensors on , then
[TABLE]
yields a congruent family of -tensors on and by the first part of the theorem must be of the form (4.6). But then,
[TABLE]
where . Since if contains a singleton set, it follows that is of the form (4.7). ∎
5. Congruent families on arbitrary sample spaces
In this section, we wish to generalize the classification result for congruent families on finite sample spaces (Theorem 4.1) to the case of arbitrary sample spaces. As it turns out, even in this case congruent families of tensor fields are algebraically generated by the canonical tensor fields. More precisely, we have the following result.
Theorem 5.1 (Classification of congruent families).
For , let be a family of covariant -tensors on (on , respectively) for each measurable space . Then the following are equivalent:
- (1) is a congruent family of covariant -tensors of regularity .
- (2) For each congruent Markov morphism for a finite set , we have .
- (3) is of the form (3.7) (of the form (3.8), respectively) for uniquely determined continuous functions (constants , respectively).
In the light of Definition 3.5, we may reformulate the equivalence of the first and the third statement as follows:
Corollary 5.1.
The space of congruent families of covariant -tensors on and , respectively, is algebraically generated by the canonical -tensors for .
Proof of Theorem 5.1.
We already showed in Proposition 3.3 that the tensors (3.7) and (3.8), respectively, are congruent families, hence the third statement implies the first. The first immediately implies the second by the definition of the congruency of tensors. Thus, it remains to show that the second statement implies the third.
We shall give the proof only for the families of covariant -tensors on , as the proof for families on is analogous.
Observe that for finite sets , the space is an open subset and hence a manifold, and the restrictions are diffeomorphisms not only for but for all . Thus, given the congruent family , we define for each finite set the tensor
[TABLE]
Then for each congruent Markov kernel with , finite we have
[TABLE]
Thus, the family on is a congruent family of covariant -tensors on finite sets, whence by Theorem 4.1
[TABLE]
for uniquely determined functions , whence on ,
[TABLE]
By our assumption, must be a covariant -tensor on , whence it must extend continuously to the boundary of .
But by (4.5) it follows that has a singularity at the boundary of , unless . From this it follows that extends to all of if and only if for all partitions where for some .
Thus, must be of the form (3.7) for all finite sets . Let
[TABLE]
for the previously determined functions , so that is a congruent family of covariant -tensors, and for every finite .
We assert that this implies that for all ; this shows that is of the form (3.7) for all and completes the proof.
To see this, let and . Moreover, let , , such that the are step functions. That is, there is a finite partition such that
[TABLE]
for and .
Let be the statistic , and , . Then clearly, is -congruent, and with . Thus, by (3.3)
[TABLE]
whence if we let , then
[TABLE]
since by the congruence of the family we must have , and by assumption as is finite.
That is, whenever with step functions . But the elements of this form are dense in , hence the continuity of implies that for all as claimed. ∎
As two special cases of this result, we obtain the following.
Corollary 5.2 (Generalization of Chentsov’s theorem).
- (1) Let be a congruent family of -tensors on . Then up to a constant, this family is the Fisher metric. That is, there is a constant such that for all ,
[TABLE]
In particular, if is a -integrable statistical model, then
[TABLE]
is – up to a constant – the Fisher metric of the model.
- (2) Let be a congruent family of -tensors on . Then up to a constant, this family is the Amari–Chentsov tensor. That is, there is a constant such that for all ,
[TABLE]
In particular, if is a -integrable statistical model, then
[TABLE]
is – up to a constant – the Amari–Chentsov tensor of the model.
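On a finite sample space, the invariance behind both statements can be verified directly. The sketch below is illustrative only and rests on two assumptions: it uses the standard finite-set expressions for the Fisher metric, the sum over i of u_i v_i / p_i, and for the Amari–Chentsov tensor, the sum over i of u_i v_i w_i / p_i^2, and it encodes a congruent Markov kernel as a matrix whose rows have pairwise disjoint supports. The names `canonical_tensor` and `push_forward` are hypothetical helpers, not taken from the paper.

```python
from fractions import Fraction as F


def canonical_tensor(p, vectors):
    """Sum over i of (product of the vectors' i-th entries) / p[i]**(n - 1),
    where n is the number of vectors (n = 2: Fisher, n = 3: Amari-Chentsov)."""
    n = len(vectors)
    total = F(0)
    for i, p_i in enumerate(p):
        term = F(1)
        for v in vectors:
            term *= v[i]
        total += term / p_i ** (n - 1)
    return total


def push_forward(kernel, x):
    """Apply a Markov kernel, stored as rows kernel[i][j] = K(j | i), to x."""
    size = len(kernel[0])
    return [sum(x[i] * kernel[i][j] for i in range(len(x))) for j in range(size)]


# Congruent kernel from 3 points to 5 points: row i is supported on its own
# block ({0, 1}, {2}, {3, 4}), and the three blocks are pairwise disjoint.
K = [
    [F(1, 4), F(3, 4), F(0), F(0), F(0)],
    [F(0), F(0), F(1), F(0), F(0)],
    [F(0), F(0), F(0), F(2, 5), F(3, 5)],
]
p = [F(1, 2), F(1, 3), F(1, 6)]  # positive probability vector
u = [F(1), F(-2), F(1)]          # tangent vectors: entries sum to zero
v = [F(0), F(1), F(-1)]
w = [F(2), F(-1), F(-1)]

Kp, Ku, Kv, Kw = (push_forward(K, x) for x in (p, u, v, w))

fisher_before = canonical_tensor(p, [u, v])
fisher_after = canonical_tensor(Kp, [Ku, Kv])
ac_before = canonical_tensor(p, [u, v, w])
ac_after = canonical_tensor(Kp, [Ku, Kv, Kw])

print(fisher_before == fisher_after, ac_before == ac_after)
```

The check confirms invariance for one kernel and one base point; the uniqueness up to a constant stated in the corollary is the substantive part and is not something a numerical experiment of this kind can establish.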
Corollary 5.3 (Generalization of Campbell’s theorem).
Let be a congruent family of -tensors on . Then there are continuous functions such that
[TABLE]
In particular, if is a -integrable parametrized measure model, then
[TABLE]
While the above results show that for small there is a unique family of congruent -tensors, this is no longer true for larger . For instance, for Theorem 5.1 implies that any restricted congruent family of invariant -tensors on , , is of the form
[TABLE]
so that the space of congruent families on is already -dimensional in this case. Evidently, this dimension rapidly increases with .
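To get a feeling for this growth, one can count the partitions that index the generators. The following Python sketch assumes, in line with the end of the proof of Theorem 4.1 where partitions containing a singleton drop out, that the congruent families of n-tensors on probability measures are spanned by one canonical product per partition of {1, ..., n} without singleton blocks. The function name `singleton_free_partitions` is a hypothetical helper.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def singleton_free_partitions(n):
    """Number of partitions of an n-element set with every block of size >= 2."""
    if n == 0:
        return 1  # the empty partition
    # The block containing a fixed element has size k >= 2; choose its other
    # k - 1 members, then partition the remaining n - k elements recursively.
    return sum(
        comb(n - 1, k - 1) * singleton_free_partitions(n - k)
        for k in range(2, n + 1)
    )


counts = {n: singleton_free_partitions(n) for n in range(2, 7)}
print(counts)
```

Under this assumption the count is 1 for n = 2 and n = 3, consistent with the uniqueness statements of Corollary 5.2, and it is 4, 11 and 41 for n = 4, 5 and 6.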
- [1] S. Amari, Theory of information spaces. A geometrical foundation of statistics. POST RAAG Report 106 (1980).
- [2] S. Amari, Differential geometry of curved exponential families curvature and information loss. The Annals of Statistics, 10, 357-385 (1982).
- [3] S. Amari and H. Nagaoka, Methods of information geometry, Translations of mathematical monographs; v. 191, American Mathematical Society, Providence, RI; Oxford University Press, Oxford (2000).
- [4] N. Ay, J. Jost, H.V. Lê, L. Schwachhöfer, Information geometry and sufficient statistics, Probability Theory and Related Fields 162, 327–364 (2015).
- [5] N. Ay, J. Jost, H.V. Lê, L. Schwachhöfer, Parametrized measure models, Bernoulli (to appear), arXiv:1510.07305, (2015).
- [6] N. Ay, J. Jost, H.V. Lê, L. Schwachhöfer, Information geometry, Ergebnisse der Mathematik und ihrer Grenzgebiete, Springer (to appear).
- [7] M. Bauer, M. Bruveris, P. Michor, Uniqueness of the Fisher-Rao metric on the space of smooth densities, Bull. Lond. Math. Soc. 48, no. 3, 499–506 (2016).
- [8] M. Bauer, M. Bruveris, P. Michor, Presentation at the fourth Conference on Information Geometry and Its Applications (IGAIA IV, 2016), Liblice, Czech Republic.
