Congruent families and invariant tensors

Lorenz Schwachhöfer; Nihat Ay; Jürgen Jost; Hông Vân Lê

arXiv:1705.11014·math.ST·June 1, 2017


TL;DR

This paper generalizes classical invariance results in information geometry, showing that invariant tensor families under congruent Markov morphisms are generated by canonical tensors for any degree n.

Contribution

It extends the characterization of invariant tensors from 2- and 3-tensors to arbitrary degree n, linking them to canonical tensor fields.

Findings

01

Invariant tensor families are algebraically generated by canonical tensors.

02

Classical invariance results are extended to higher-degree tensors.

03

The work unifies the understanding of invariant tensors in statistical models.

Abstract

Classical results of Chentsov and Campbell state that, up to constant multiples, the only $2$-tensor field of a statistical model which is invariant under congruent Markov morphisms is the Fisher metric and the only invariant $3$-tensor field is the Amari-Chentsov tensor. We generalize this result for arbitrary degree $n$, showing that any family of $n$-tensors which is invariant under congruent Markov morphisms is algebraically generated by the canonical tensor fields defined in an earlier paper.

Equations

\int_{\Omega}{\partial}_{V}\log p(\cdot;\xi)\;d{\mathbf{p}}(\xi)=0

\begin{array}[]{lll}{\mathfrak{g}}^{F}(V,W)&:=&\displaystyle{\int_{\Omega}{\partial}_{V}\log p(\cdot;\xi)\;{\partial}_{W}\log p(\cdot;\xi)\;d{\mathbf{p}}(\xi)}\\ {\mathbf{T}}^{AC}(V,W,U)&:=&\displaystyle{\int_{\Omega}{\partial}_{V}\log p(\cdot;\xi)\;{\partial}_{W}\log p(\cdot;\xi)\;{\partial}_{U}\log p(\cdot;\xi)\;d{\mathbf{p}}(\xi)}.\end{array}

L_{n}^{\Omega}(\nu_{1},\ldots,\nu_{n}):=n^{n}(\nu_{1}\cdots\nu_{n})({\Omega}),

{\mathbf{p}}^{1/k}:M\longrightarrow{\mathcal{M}}^{1/k}({\Omega})\subset{\mathcal{S}}^{1/k}({\Omega}),\qquad\xi\longmapsto{\mathbf{p}}(\xi)^{1/k}

\tau^{n}_{(M,{\Omega},{\mathbf{p}})}(V_{1},\ldots,V_{n}):=\int_{\Omega}{\partial}_{V_{1}}\log p(\cdot;\xi)\cdots{\partial}_{V_{n}}\log p(\cdot;\xi)\;d{\mathbf{p}}(\xi),

\begin{array}[]{lll}{\mathcal{P}}({\Omega})&:=&\{\mu\;:\;\mu\;\mbox{a probability measure on ${\Omega}$}\}\\[5.69054pt] {\mathcal{M}}({\Omega})&:=&\{\mu\;:\;\mu\;\mbox{a finite measure on ${\Omega}$}\}\\[5.69054pt] {\mathcal{S}}({\Omega})&:=&\{\mu\;:\;\mu\;\mbox{a signed finite measure on ${\Omega}$}\}\\[5.69054pt] {\mathcal{S}}_{a}({\Omega})&:=&\{\mu\in{\mathcal{S}}({\Omega})\;:\;\int_{\Omega}d\mu=a\}.\end{array}

\|\mu\|:=\sup\sum_{i=1}^{n}|\mu(A_{i})|

\|\mu\|\;=\;\mu({\Omega})\qquad\mbox{for $\mu\in{\mathcal{M}}({\Omega})$}.

\cdot:{\mathcal{S}}^{r}({\Omega})\times{\mathcal{S}}^{s}({\Omega})\longrightarrow{\mathcal{S}}^{r+s}({\Omega})\qquad\mbox{such that}\qquad\|\mu_{r}\cdot\mu_{s}\|_{{\mathcal{S}}^{r+s}({\Omega})}\leq\|\mu_{r}\|_{{\mathcal{S}}^{r}({\Omega})}\|\mu_{s}\|_{{\mathcal{S}}^{s}({\Omega})},

{\mathcal{S}}^{r}({\Omega};\mu):=\{\phi\mu^{r}\mid\phi\in L^{1/r}({\Omega},\mu)\}\hookrightarrow{\mathcal{S}}^{r}({\Omega})

{\mathcal{S}}^{r}_{0}({\Omega};\mu):=\{\phi\mu^{r}\mid\phi\in L^{1/r}({\Omega},\mu),\;{\mathbb{E}}_{\mu}(\phi)=0\}\subset{\mathcal{S}}^{r}({\Omega};\mu).

(\phi\mu^{r})\cdot(\psi\mu^{s})=(\phi\psi)\mu^{r+s},\qquad\pi^{k}(\phi\mu^{r}):={\rm sign}(\phi)|\phi|^{k}\mu^{rk},

d_{\mu_{r}}\pi^{k}(\nu_{r})=k\,|\mu_{r}|^{k-1}\cdot\nu_{r}.

L^{n}_{\Omega}(\mu_{1},\ldots,\mu_{n}):=n^{n}\int_{\Omega}d(\mu_{1}\cdots\mu_{n})\qquad\mbox{for $\mu_{i}\in{\mathcal{S}}^{1/n}({\Omega})$},

\langle\cdot;\cdot\rangle:=\frac{1}{4}L^{2}_{\Omega}(\cdot,\cdot)

{\partial}_{v}\log{\mathbf{p}}(\xi):=\frac{d\{d_{\xi}{\mathbf{p}}(v)\}}{d{\mathbf{p}}(\xi)}\in L^{1}({\Omega},{\mathbf{p}}(\xi)).

d_{\xi}{\mathbf{p}}^{1/k}(v)=\frac{1}{k}\,{\partial}_{v}\log{\mathbf{p}}(\xi)\;{\mathbf{p}}^{1/k}.

{\Omega}\longrightarrow[0,1],\qquad{\omega}\longmapsto K({\omega})(A^{\prime})=:K({\omega};A^{\prime})

K_{\ast}:{\mathcal{S}}({\Omega})\longrightarrow{\mathcal{S}}({\Omega}^{\prime}),\qquad K_{\ast}\mu(A^{\prime}):=\int_{\Omega}K({\omega};A^{\prime})\;d\mu({\omega})

\|K_{\ast}\mu\|=\|\mu\|\qquad\mbox{ for all $\mu\in{\mathcal{M}}({\Omega})$,}

K^{\kappa}_{\ast}\mu(A^{\prime})=\int_{\Omega}K^{\kappa}({\omega};A^{\prime})\;d\mu({\omega})=\int_{\kappa^{-1}(A^{\prime})}d\mu=\mu(\kappa^{-1}A^{\prime})=\kappa_{\ast}\mu(A^{\prime}),

\kappa_{*}K({\omega})=\delta_{{\omega}}\qquad\mbox{for all ${\omega}\in{\Omega}$},

{\Omega}=\dot{\bigcup}_{i\in I}{\Omega}_{i},\qquad\mbox{where }{\Omega}_{i}=\kappa^{-1}(i).

K(i)({\Omega}_{j})=K(i;{\Omega}_{j})=0\qquad\mbox{for all $i\neq j\in I$.}

\|{\partial}_{v}\log{\mathbf{p}}^{\prime}(\xi)\|_{L^{k}({\Omega}^{\prime},{\mathbf{p}}^{\prime}(\xi))}\leq\|{\partial}_{v}\log{\mathbf{p}}(\xi)\|_{L^{k}({\Omega},{\mathbf{p}}(\xi))}.

{\mathbf{T}}(V^{\ast}):=\bigoplus_{n=0}^{\infty}\otimes^{n}V^{\ast},

\otimes^{n}V^{\ast}=\{\tau^{n}:\underbrace{V\times\cdots\times V}_{\text{$n$ times}}\longrightarrow{\mathbb{F}}\mid\mbox{$\tau^{n}$ is $n$-multilinear}\}.

(\tau_{1}^{n}\otimes\tau_{2}^{m})(v_{1},\ldots,v_{n+m}):=\tau_{1}^{n}(v_{1},\ldots,v_{n})\cdot\tau_{2}^{m}(v_{n+1},\ldots,v_{n+m}).

(P_{\sigma}\tau^{n})(v_{1},\ldots,v_{n}):=\tau^{n}(v_{\sigma^{-1}(1)},\ldots,v_{\sigma^{-1}(n)})

\odot^{n}V^{\ast}:=\{\tau^{n}\in\otimes^{n}V^{\ast}\mid\mbox{$\tau^{n}$ is symmetric}\}


Full text

Congruent families and invariant tensors

Lorenz Schwachhöfer, Nihat Ay, Jürgen Jost, Hông Vân Lê

L. Schwachhöfer, TU Dortmund University, Dortmund, Germany

N. Ay, J. Jost, Max-Planck-Institute for Mathematics in the Sciences, Leipzig, Germany

H.V. Lê, Academy of Sciences of the Czech Republic, Prague

Abstract.

Classical results of Chentsov and Campbell state that, up to constant multiples, the only $2$-tensor field of a statistical model which is invariant under congruent Markov morphisms is the Fisher metric and the only invariant $3$-tensor field is the Amari-Chentsov tensor. We generalize this result for arbitrary degree $n$, showing that any family of $n$-tensors which is invariant under congruent Markov morphisms is algebraically generated by the canonical tensor fields defined in [5].

Key words and phrases:

Chentsov’s theorem, sufficient statistic, congruent Markov kernel, statistical model

2010 Mathematics Subject Classification:

primary: 62B05, 62B10, 62B86, secondary: 53C99

1. Introduction

The main task of *information geometry* is to use differential geometric methods in probability theory in order to gain insight into the structure of families of probability measures or, slightly more generally, finite measures on some (finite or infinite) sample space ${\Omega}$. In fact, one of the key themes of differential geometry is to identify quantities that do not depend on how we parametrize our objects, but that depend only on their intrinsic structure. And since in information geometry we have not only the structure of the parameter space, the classical object of differential geometry, but also the sample space on which the probability measures live, we should also look at invariance properties with respect to the latter. That is what we shall systematically do in this contribution.

When parametrizing such a family by a manifold $M$, there are two classically known symmetric tensor fields on the parameter space $M$. The first is a quadratic form (i.e., a Riemannian metric), called the Fisher metric ${\mathfrak{g}}^{F}$, and the second is a $3$-tensor, called the Amari-Chentsov tensor ${\mathbf{T}}^{AC}$. The Fisher metric was first suggested by Rao [19], followed by Jeffreys [15], Efron [14] and then systematically developed by Chentsov and Morozova [10], [11] and [18]; the Amari-Chentsov tensor and its significance were discovered by Amari [1], [2] and Chentsov [12]. If the family is given by a positive density function ${\mathbf{p}}(\xi)=p(\cdot;\xi)\mu$ w.r.t. some fixed background measure $\mu$ on ${\Omega}$ and $p:{\Omega}\times M\to(0,\infty)$ differentiable in the $\xi$-direction, then the score satisfies

$$\int_{\Omega}{\partial}_{V}\log p(\cdot;\xi)\;d{\mathbf{p}}(\xi)=0,\tag{1.1}$$

while the Fisher metric ${\mathfrak{g}}^{F}$ and the Amari-Chentsov tensor ${\mathbf{T}}^{AC}$ associated to a parametrized measure model are given by

$$\begin{array}[]{lll}{\mathfrak{g}}^{F}(V,W)&:=&\displaystyle{\int_{\Omega}{\partial}_{V}\log p(\cdot;\xi)\;{\partial}_{W}\log p(\cdot;\xi)\;d{\mathbf{p}}(\xi)}\\ {\mathbf{T}}^{AC}(V,W,U)&:=&\displaystyle{\int_{\Omega}{\partial}_{V}\log p(\cdot;\xi)\;{\partial}_{W}\log p(\cdot;\xi)\;{\partial}_{U}\log p(\cdot;\xi)\;d{\mathbf{p}}(\xi)}.\end{array}\tag{1.2}$$

Of course, this naturally suggests considering analogous tensors for arbitrary degree $n$. The tensor fields in (1.2) have some remarkable properties. On the one hand, they may be defined independently of the particular choice of a parametrization and thus are naturally defined from the differential geometric point of view. On the other hand, their most important property from the point of view of statistics is that these tensors are invariant under sufficient statistics or, more generally, under congruent Markov morphisms. In fact, these tensor fields are characterized by this invariance property. This was shown in the case of finite sample spaces by Chentsov in [11] and for an arbitrary sample space by the authors of the present article in [4].

The question addressed in this article is to classify all tensor fields which are invariant under sufficient statistics and congruent Markov morphisms. In order to do this, we first have to make this invariance condition precise.

Observe that both [11] and [4] require the family to be of the form ${\mathbf{p}}(\xi)=p(\cdot;\xi)\mu$ with $p>0$, which in particular implies that all these measures are equivalent, i.e., have the same null sets. Later, in [5] and [6], the authors of this article introduced a more general notion of a *parametrized measure model* as a map ${\mathbf{p}}:M\to{\mathcal{M}}({\Omega})$ from a (finite or infinite dimensional) manifold $M$ into the space ${\mathcal{M}}({\Omega})$ of finite measures which is continuously Fréchet-differentiable when regarded as a map into the Banach lattice ${\mathcal{S}}({\Omega})\supset{\mathcal{M}}({\Omega})$ of *signed* finite measures. Such a model neither requires the existence of a measure dominating all measures ${\mathbf{p}}(\xi)$, nor does it require all these measures to be equivalent.

Furthermore, for each $r\in(0,1]$ there is a well defined Banach lattice ${\mathcal{S}}^{r}({\Omega})$ of $r$-th powers of finite signed measures, whose nonnegative elements are denoted by ${\mathcal{M}}^{r}({\Omega})\subset{\mathcal{S}}^{r}({\Omega})$, and for each integer $n\in{\mathbb{N}}$, there is a *canonical $n$-tensor* on ${\mathcal{S}}^{1/n}({\Omega})$ given by

$$L_{n}^{\Omega}(\nu_{1},\ldots,\nu_{n}):=n^{n}(\nu_{1}\cdots\nu_{n})({\Omega}),\tag{1.3}$$

where $\nu_{1}\cdots\nu_{n}\in{\mathcal{S}}({\Omega})$ is a signed measure. The multiplication on the right hand side of (1.3) refers to the multiplication of roots of measures, cf. [5, (2.11)], see also (2.2). A parametrized measure model ${\mathbf{p}}:M\to{\mathcal{M}}({\Omega})$ is called *$k$-integrable* for $k\geq 1$ if the map

$${\mathbf{p}}^{1/k}:M\longrightarrow{\mathcal{M}}^{1/k}({\Omega})\subset{\mathcal{S}}^{1/k}({\Omega}),\qquad\xi\longmapsto{\mathbf{p}}(\xi)^{1/k}$$

is continuously Fréchet differentiable, cf. [5, Definition 4.4]. In this case, we define the *canonical $n$-tensor of the model* as the pull-back $\tau^{n}_{(M,{\Omega},{\mathbf{p}})}:=({\mathbf{p}}^{1/n})^{\ast}L_{n}^{\Omega}$ for all $n\leq k$. If the model is of the form ${\mathbf{p}}(\xi)=p(\cdot;\xi)\mu$ with a positive density function $p>0$, then

$$\tau^{n}_{(M,{\Omega},{\mathbf{p}})}(V_{1},\ldots,V_{n}):=\int_{\Omega}{\partial}_{V_{1}}\log p(\cdot;\xi)\cdots{\partial}_{V_{n}}\log p(\cdot;\xi)\;d{\mathbf{p}}(\xi),\tag{1.4}$$

so that ${\mathfrak{g}}^{F}=\tau^{2}_{(M,{\Omega},{\mathbf{p}})}$ and ${\mathbf{T}}^{AC}=\tau^{3}_{(M,{\Omega},{\mathbf{p}})}$ by (1.2). The condition of $k$ -integrability ensures that the integral in (1.4) exists for $n\leq k$ .
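On a finite sample space the integral in (1.4) reduces to a finite sum, so the canonical tensors can be evaluated directly. The following is a minimal sketch under that assumption, using a Bernoulli family on a two-point space; the helper name `canonical_tensor` and the choice of model are illustrative and do not come from the paper.

```python
import numpy as np

def canonical_tensor(p, *dps):
    """Evaluate tau^n(V_1, ..., V_n) from (1.4) on a finite sample space.

    p   : point masses p(omega; xi) > 0, one entry per omega
    dps : directional derivatives of p, one array per argument V_j
    """
    scores = [dp / p for dp in dps]            # logarithmic derivatives
    return float(np.sum(np.prod(scores, axis=0) * p))

theta = 0.3
p = np.array([theta, 1.0 - theta])   # Bernoulli(theta) on a two-point space
dp = np.array([1.0, -1.0])           # derivative of p in the theta-direction

tau1 = canonical_tensor(p, dp)            # the score integral (1.1)
tau2 = canonical_tensor(p, dp, dp)        # Fisher metric
tau3 = canonical_tensor(p, dp, dp, dp)    # Amari-Chentsov tensor
```

For this family `tau2` agrees with the familiar value $1/(\theta(1-\theta))$, and `tau1` vanishes because the model consists of probability measures.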

A Markov kernel $K:{\Omega}\to{\mathcal{P}}({\Omega}^{\prime})$ induces a bounded linear map $K_{\ast}:{\mathcal{S}}({\Omega})\to{\mathcal{S}}({\Omega}^{\prime})$ , called the Markov morphism associated to $K$ . This Markov kernel is called congruent, if there is a statistic $\kappa:{\Omega}^{\prime}\to{\Omega}$ such that $\kappa_{\ast}K_{\ast}\mu=\mu$ for all $\mu\in{\mathcal{S}}({\Omega})$ .

We may associate to $K$ the map $K_{r}:{\mathcal{S}}^{r}({\Omega})\to{\mathcal{S}}^{r}({\Omega}^{\prime})$ by $K_{r}(\mu_{r})=(K_{\ast}(\mu_{r}^{1/r}))^{r}$, where $\mu_{r}^{1/r}\in{\mathcal{S}}({\Omega})$. While $K_{r}$ is not Fréchet differentiable in general, we can still define in a natural way the formal differential $dK_{r}$ and hence the pullback $K_{r}^{\ast}\Theta^{n}_{{\Omega}^{\prime};r}$ of any covariant $n$-tensor $\Theta^{n}_{{\Omega}^{\prime};r}$ on ${\mathcal{S}}^{r}({\Omega}^{\prime})$, which yields a covariant $n$-tensor on ${\mathcal{S}}^{r}({\Omega})$.

It is not hard to show that for the canonical tensor fields we have the identity $K_{1/n}^{\ast}L_{n}^{{\Omega}^{\prime}}=L_{n}^{\Omega}$ for any congruent Markov kernel $K:{\Omega}\to{\mathcal{P}}({\Omega}^{\prime})$, whence we may say that the canonical $n$-tensors $L_{n}^{\Omega}$ on ${\mathcal{S}}^{1/n}({\Omega})$ form a congruent family. Evidently, any tensor field which is given by linear combinations of tensor products of canonical tensors and permutations of the arguments is also a congruent family, and the families of this type are said to be algebraically generated by $L^{n}_{\Omega}$.
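The invariance can be checked numerically at the level of a model on finite sets. The sketch below assumes a two-point space $I$, a five-point space ${\Omega}$ split into two blocks, and a kernel stored as a row-stochastic matrix whose row $i$ is supported on block $i$; all names and numbers are illustrative. It only tests the pulled-back tensors $\tau^{n}$ of one model, not the full statement on ${\mathcal{S}}^{1/n}$.

```python
import numpy as np

def tau(p, dp, n):
    """tau^n(V, ..., V) on a finite sample space (finite-sum form of (1.4))."""
    return float(np.sum((dp / p) ** n * p))

# Congruent kernel K : I -> P(Omega); row i lives on its own block of Omega.
K = np.array([[0.25, 0.75, 0.0, 0.0, 0.0],
              [0.0,  0.0,  0.5, 0.2, 0.3]])

theta = 0.3
p = np.array([theta, 1.0 - theta])    # model on I
dp = np.array([1.0, -1.0])

p_prime = p @ K                       # K_* p, a model on Omega
dp_prime = dp @ K                     # K_* is linear, so it commutes with d

gaps = [abs(tau(p, dp, n) - tau(p_prime, dp_prime, n)) for n in range(1, 6)]
```

All entries of `gaps` are zero up to rounding, since on each block the logarithmic derivative of the image model coincides with that of the original one.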

Our main result is that these exhaust the possible invariant families of covariant tensor fields:

Theorem 1.1.

Let $(\Theta^{n}_{{\Omega};r})$ be a family of covariant $n$ -tensors on ${\mathcal{S}}^{r}({\Omega})$ for each measurable space ${\Omega}$ . Then this family is invariant under congruent Markov morphisms if and only if it is algebraically generated by the canonical tensors $L_{m}^{\Omega}$ with $m\leq 1/r$ .

In particular, on each $k$ -integrable parametrized measure model $(M,{\Omega},{\mathbf{p}})$ any tensor field which is invariant under congruent Markov morphisms is algebraically generated by the canonical tensor fields $\tau_{(M,{\Omega},{\mathbf{p}})}^{m}$ , $m\leq k$ .

We shall show that this conclusion already holds if the family is invariant under congruent Markov morphisms $K:I\to{\mathcal{P}}({\Omega})$ with finite $I$. Also, observe that this theorem yields another proof of the theorems of Chentsov [12, Theorem 11.1] and Campbell ([9] or [4]) which classify the invariant families of $2$- and $3$-tensors, respectively. Campbell’s theorem covers the case where the measures no longer need to be probability measures. In such a situation, the analogue of the score (1.1) no longer needs to vanish, and it furnishes a nontrivial $1$-tensor.

Let us comment on the relation of our results to those of Bauer et al. [7], [8]. Assuming that the sample space ${\Omega}$ is a manifold (with boundary or even with corners), the space $\text{Dens}_{+}({\Omega})$ of (smooth) densities on ${\Omega}$ is defined as the set of all measures of the form $\mu=f{\rm vol}_{g}$, where $f>0$ is a smooth function with finite integral, ${\rm vol}_{g}$ being the volume form of some Riemannian metric $g$ on ${\Omega}$. Thus, $\text{Dens}_{+}({\Omega})$ is a Fréchet manifold, and regarding a diffeomorphism $K:{\Omega}\to{\Omega}$ as a congruent statistic, the induced maps $K_{r}:\text{Dens}_{+}({\Omega})^{r}\to\text{Dens}_{+}({\Omega})^{r}$ are diffeomorphisms of Fréchet manifolds. The main result in [7] states that for $\dim{\Omega}\geq 2$ any $2$-tensor field which is invariant under diffeomorphisms is a multiple of the Fisher metric. Likewise, the space of diffeomorphism invariant $n$-tensors for arbitrary $n$ [8] is generated by the canonical tensors. Thus, when restricting to parametrized measure models ${\mathbf{p}}:M\to\text{Dens}_{+}({\Omega})\subset{\mathcal{M}}({\Omega})$ whose image lies in the space of densities and which are differentiable w.r.t. the Fréchet manifold structure on $\text{Dens}_{+}({\Omega})$, the invariance of a tensor field under diffeomorphisms rather than under arbitrary congruent Markov morphisms already implies that the tensor field is algebraically generated by the canonical tensors. Considering invariance under diffeomorphisms is natural in the sense that they can be regarded as the natural analogues of permutations of a finite sample space. In our more general setting, however, the concept of a diffeomorphism is no longer meaningful, and we need to consider invariance under a larger class of transformations, the congruent Markov morphisms.

In a similar spirit, J. Dowty [13] has shown recently that when restricting to the space of exponential families, the Fisher metric is the only $2$ -tensor which is invariant under independent and identically distributed extensions and canonical sufficient statistics.

This paper is structured as follows. In Section 2 we recall from [5] the definition of a parametrized measure model, roots of measures and congruent Markov kernels, and furthermore we give an explicit description of the space of covariant families which are algebraically generated by the canonical tensors. In Section 3 we recall the notion of congruent families of tensor fields and show that the canonical tensors, and hence tensors which are algebraically generated by these, are congruent. Then we show in Section 4 that these exhaust all invariant families of tensor fields on *finite* sample spaces ${\Omega}$, and finally, in Section 5, by reducing the general case to the finite case through step function approximations, we obtain the classification result Theorem 5.1, which implies Theorem 1.1 as a simplified version.

Acknowledgements. This work was mainly carried out at the Max Planck Institute for Mathematics in the Sciences in Leipzig, and we are grateful for the excellent working conditions provided at that institution. H.V. Lê is partially supported by Grant RVO:67985840.

2. Preliminary results

2.1. The space of (signed) finite measures and their powers

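Let $({\Omega},\Sigma)$ be a measurable space, that is, an arbitrary set ${\Omega}$ together with a sigma algebra $\Sigma$ of subsets of ${\Omega}$. Regarding the sigma algebra $\Sigma$ on ${\Omega}$ as fixed, we let

$$\begin{array}[]{lll}{\mathcal{P}}({\Omega})&:=&\{\mu\;:\;\mu\;\mbox{a probability measure on ${\Omega}$}\}\\[5.69054pt] {\mathcal{M}}({\Omega})&:=&\{\mu\;:\;\mu\;\mbox{a finite measure on ${\Omega}$}\}\\[5.69054pt] {\mathcal{S}}({\Omega})&:=&\{\mu\;:\;\mu\;\mbox{a signed finite measure on ${\Omega}$}\}\\[5.69054pt] {\mathcal{S}}_{a}({\Omega})&:=&\{\mu\in{\mathcal{S}}({\Omega})\;:\;\int_{\Omega}d\mu=a\}.\end{array}$$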

Clearly, ${\mathcal{P}}({\Omega})\subset{\mathcal{M}}({\Omega})\subset{\mathcal{S}}({\Omega})$, and ${\mathcal{S}}_{0}({\Omega}),{\mathcal{S}}({\Omega})$ are real vector spaces, whereas ${\mathcal{S}}_{a}({\Omega})$ is an affine space with linear part ${\mathcal{S}}_{0}({\Omega})$. In fact, both ${\mathcal{S}}_{0}({\Omega})$ and ${\mathcal{S}}({\Omega})$ are Banach spaces whose norm is given by the total variation of a signed measure, defined as

$$\|\mu\|:=\sup\sum_{i=1}^{n}|\mu(A_{i})|,$$

where the supremum is taken over all finite partitions $\Omega=A_{1}\dot{\cup}\dots\dot{\cup}A_{n}$ with disjoint sets $A_{i}\in\Sigma$. Here, the symbol $\dot{\cup}$ stands for the disjoint union of sets. In particular,

$$\|\mu\|\;=\;\mu({\Omega})\qquad\mbox{for $\mu\in{\mathcal{M}}({\Omega})$}.$$
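For a finite sample space with the full power set as sigma algebra, the supremum over partitions can be computed by brute force. The sketch below is only an illustration under that assumption (the helper names are hypothetical); it confirms on a small example that the supremum equals the sum of the absolute point masses.

```python
import numpy as np

def set_partitions(items):
    """Yield all partitions of the list `items` into nonempty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        for i in range(len(part)):                 # join an existing block
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        yield [[first]] + part                     # or open a new block

def total_variation(mu):
    """Total variation of a signed measure given by its point masses."""
    return float(np.sum(np.abs(mu)))

mu = np.array([0.2, -0.5, 0.3, -0.1])
partitions = list(set_partitions(list(range(len(mu)))))
brute_force = max(sum(abs(mu[block].sum()) for block in part)
                  for part in partitions)
```

Here `brute_force` is attained at the partition into singletons, which is why `total_variation` can simply sum absolute values.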

In [5], for each $r\in(0,1]$ the space ${\mathcal{S}}^{r}({\Omega})$ of $r$ -th powers of measures on ${\Omega}$ is defined. We shall not repeat the formal definition here, but we recall the most important features of these spaces.

Each ${\mathcal{S}}^{r}({\Omega})$ is a Banach lattice whose norm we denote by $\|\cdot\|_{{\mathcal{S}}^{r}({\Omega})}$, and ${\mathcal{M}}^{r}({\Omega})\subset{\mathcal{S}}^{r}({\Omega})$ denotes the subset of nonnegative elements. Moreover, ${\mathcal{S}}^{1}({\Omega})={\mathcal{S}}({\Omega})$ in a canonical way. For $r,s,r+s\in(0,1]$ there is a bilinear product

$$\cdot:{\mathcal{S}}^{r}({\Omega})\times{\mathcal{S}}^{s}({\Omega})\longrightarrow{\mathcal{S}}^{r+s}({\Omega})\qquad\mbox{such that}\qquad\|\mu_{r}\cdot\mu_{s}\|_{{\mathcal{S}}^{r+s}({\Omega})}\leq\|\mu_{r}\|_{{\mathcal{S}}^{r}({\Omega})}\|\mu_{s}\|_{{\mathcal{S}}^{s}({\Omega})},$$

and for $0<k<1/r$ there is an exponentiating map $\pi^{k}:{\mathcal{S}}^{r}({\Omega})\to{\mathcal{S}}^{kr}({\Omega})$ which is continuous for $k<1$ and a Fréchet-$C^{1}$-map for $k\geq 1$.

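In order to understand these objects more concretely, let $\mu\in{\mathcal{M}}({\Omega})$ be a measure, so that $\mu^{r}:=\pi^{r}(\mu)\in{\mathcal{S}}^{r}({\Omega})$. Then for all $\phi\in L^{1/r}({\Omega},\mu)$ we have $\phi\mu^{r}\in{\mathcal{S}}^{r}({\Omega})$, and $\phi\mu^{r}\in{\mathcal{M}}^{r}({\Omega})$ if and only if $\phi\geq 0$. The inclusion

$${\mathcal{S}}^{r}({\Omega};\mu):=\{\phi\mu^{r}\mid\phi\in L^{1/r}({\Omega},\mu)\}\hookrightarrow{\mathcal{S}}^{r}({\Omega})$$

is an isometric inclusion of Banach spaces, and the elements of ${\mathcal{S}}^{r}({\Omega};\mu)$ are said to be dominated by $\mu$. We also define

$${\mathcal{S}}^{r}_{0}({\Omega};\mu):=\{\phi\mu^{r}\mid\phi\in L^{1/r}({\Omega},\mu),\;{\mathbb{E}}_{\mu}(\phi)=0\}\subset{\mathcal{S}}^{r}({\Omega};\mu).$$

Moreover,

$$(\phi\mu^{r})\cdot(\psi\mu^{s})=(\phi\psi)\mu^{r+s},\qquad\pi^{k}(\phi\mu^{r}):={\rm sign}(\phi)|\phi|^{k}\mu^{rk},$$

where $\phi\in L^{1/r}({\Omega},\mu)$ and $\psi\in L^{1/s}({\Omega},\mu)$. The Fréchet derivative of $\pi^{k}$ at $\mu_{r}\in{\mathcal{S}}^{r}({\Omega})$ is given by

$$d_{\mu_{r}}\pi^{k}(\nu_{r})=k\,|\mu_{r}|^{k-1}\cdot\nu_{r}.$$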

Furthermore, for an integer $n\in{\mathbb{N}}$, we have the canonical $n$-tensor on ${\mathcal{S}}^{1/n}({\Omega})$, given by

$$L^{n}_{\Omega}(\mu_{1},\ldots,\mu_{n}):=n^{n}\int_{\Omega}d(\mu_{1}\cdots\mu_{n})\qquad\mbox{for $\mu_{i}\in{\mathcal{S}}^{1/n}({\Omega})$},$$

which is a symmetric $n$-multilinear form, where we regard the product $\mu_{1}\cdots\mu_{n}$ as an element of ${\mathcal{S}}^{1}({\Omega})={\mathcal{S}}({\Omega})$. For instance, for $n=2$ the bilinear form

$$\langle\cdot;\cdot\rangle:=\frac{1}{4}L^{2}_{\Omega}(\cdot,\cdot)$$

equips ${\mathcal{S}}^{1/2}({\Omega})$ with a Hilbert space structure with induced norm $\|\cdot\|_{{\mathcal{S}}^{1/2}({\Omega})}$.
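To make the canonical form concrete, one can restrict to elements $\phi_{i}\mu^{1/n}$ dominated by a fixed measure $\mu$ on a finite set, where the product of roots reduces to the pointwise product of the coefficient functions. The sketch below works under exactly that assumption; `canonical_form` and `inner` are hypothetical helper names.

```python
import numpy as np

def canonical_form(mu, *phis):
    """L^n(phi_1 mu^{1/n}, ..., phi_n mu^{1/n}) for point masses `mu`.

    The product of the n roots is (phi_1 ... phi_n) mu, whose total mass
    is a finite sum; the factor n^n is the normalisation in the text.
    """
    n = len(phis)
    return float(n**n * np.sum(np.prod(phis, axis=0) * mu))

def inner(mu, phi, psi):
    """The Hilbert pairing <.;.> = (1/4) L^2 on S^{1/2}(Omega; mu)."""
    return 0.25 * canonical_form(mu, phi, psi)

mu = np.array([0.5, 1.5, 2.0])
phi = np.array([1.0, -2.0, 0.5])
psi = np.array([3.0, 1.0, -1.0])
one = np.ones(3)

pairing = inner(mu, phi, psi)             # sum of phi * psi * mu
squared_norm = inner(mu, phi, phi)        # squared L^2(mu)-norm of phi
cubic = canonical_form(mu, one, one, one) # 3^3 times the total mass of mu
```

The factor $1/4$ cancels the $n^{n}=4$ in $L^{2}_{\Omega}$, so `squared_norm` is the squared $L^{2}(\mu)$-norm of `phi`, in line with the isometric inclusion described above.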

2.2. Parametrized measure models

Recall from [5] that a parametrized measure model is a triple $(M,{\Omega},{\mathbf{p}})$ consisting of a (finite or infinite dimensional) manifold $M$ and a map ${\mathbf{p}}:M\to{\mathcal{M}}({\Omega})$ which is Fréchet-differentiable when regarded as a map into ${\mathcal{S}}({\Omega})$ (cf. [5, Definition 4.1]). If ${\mathbf{p}}(\xi)\in{\mathcal{P}}({\Omega})$ for all $\xi\in M$, then $(M,{\Omega},{\mathbf{p}})$ is called a statistical model. Moreover, $(M,{\Omega},{\mathbf{p}})$ is called $k$-integrable if ${\mathbf{p}}^{1/k}:M\to{\mathcal{S}}^{1/k}({\Omega})$ is also Fréchet-differentiable (cf. [16, Definition 2.6]). For a parametrized measure model, the differential $d_{\xi}{\mathbf{p}}(v)\in{\mathcal{S}}({\Omega})$ with $v\in T_{\xi}M$ is always dominated by ${\mathbf{p}}(\xi)\in{\mathcal{M}}({\Omega})$, and we define the logarithmic derivative (cf. [5, Definition 4.3]) as the Radon-Nikodym derivative

$${\partial}_{v}\log{\mathbf{p}}(\xi):=\frac{d\{d_{\xi}{\mathbf{p}}(v)\}}{d{\mathbf{p}}(\xi)}\in L^{1}({\Omega},{\mathbf{p}}(\xi)).$$

Then ${\mathbf{p}}$ is $k$-integrable if and only if ${\partial}_{v}\log{\mathbf{p}}\in L^{k}({\Omega},{\mathbf{p}}(\xi))$ for all $v\in T_{\xi}M$, and the function $v\mapsto\|{\partial}_{v}\log{\mathbf{p}}\|_{L^{k}({\Omega},{\mathbf{p}}(\xi))}$ on $TM$ is continuous (cf. [16, Theorem 2.7]). In this case, the Fréchet derivative of ${\mathbf{p}}^{1/k}$ is given as

$$d_{\xi}{\mathbf{p}}^{1/k}(v)=\frac{1}{k}\,{\partial}_{v}\log{\mathbf{p}}(\xi)\;{\mathbf{p}}^{1/k}.$$
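The derivative formula for ${\mathbf{p}}^{1/k}$ can be sanity-checked by finite differences on a finite sample space, where ${\mathbf{p}}^{1/k}$ is represented by the $k$-th roots of the point masses. The one-parameter model below is a made-up example of a family of finite (not necessarily probability) measures, chosen only for illustration.

```python
import numpy as np

def p(xi):
    """Hypothetical model: point masses of a finite measure on three points."""
    return np.array([xi, xi**2, 1.0])

def dp(xi):
    """Derivative of the point masses with respect to xi."""
    return np.array([1.0, 2.0 * xi, 0.0])

k, xi, h = 3, 0.7, 1e-6

# Central finite difference of xi -> p(xi)^{1/k}.
numeric = (p(xi + h) ** (1.0 / k) - p(xi - h) ** (1.0 / k)) / (2.0 * h)

# Closed form: (1/k) * (logarithmic derivative) * p^{1/k}.
log_derivative = dp(xi) / p(xi)
formula = (1.0 / k) * log_derivative * p(xi) ** (1.0 / k)
```

The two arrays agree up to discretisation error, which reflects the chain rule applied pointwise to $p^{1/k}$.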

2.3. Congruent Markov morphisms

Definition 2.1.

A *Markov kernel* between two measurable spaces $(\Omega,\mathfrak{B})$ and $(\Omega^{\prime},\mathfrak{B}^{\prime})$ is a map $K:{\Omega}\to{\mathcal{P}}({\Omega}^{\prime})$ associating to each ${\omega}\in{\Omega}$ a probability measure on ${\Omega}^{\prime}$ such that for each fixed measurable set $A^{\prime}\in\mathfrak{B}^{\prime}$ the map

$${\Omega}\longrightarrow[0,1],\qquad{\omega}\longmapsto K({\omega})(A^{\prime})=:K({\omega};A^{\prime})$$

is measurable. The linear map

$$K_{\ast}:{\mathcal{S}}({\Omega})\longrightarrow{\mathcal{S}}({\Omega}^{\prime}),\qquad K_{\ast}\mu(A^{\prime}):=\int_{\Omega}K({\omega};A^{\prime})\;d\mu({\omega})$$

is called the Markov morphism induced by $K$.

Evidently, a Markov morphism maps ${\mathcal{M}}({\Omega})$ to ${\mathcal{M}}({\Omega}^{\prime})$, and

$$\|K_{\ast}\mu\|=\|\mu\|\qquad\mbox{ for all $\mu\in{\mathcal{M}}({\Omega})$,}$$

so that $K_{\ast}$ also maps ${\mathcal{P}}({\Omega})$ to ${\mathcal{P}}({\Omega}^{\prime})$. For any $\mu\in{\mathcal{S}}({\Omega})$, $\|K_{\ast}\mu\|\leq\|\mu\|$, whence $K_{\ast}$ is bounded.

Example 2.1.

A measurable map $\kappa:{\Omega}\to{\Omega}^{\prime}$, called a statistic, induces a Markov kernel by setting $K^{\kappa}({\omega}):=\delta_{\kappa{\omega}}\in{\mathcal{P}}({\Omega}^{\prime})$. In this case,

$$K^{\kappa}_{\ast}\mu(A^{\prime})=\int_{\Omega}K^{\kappa}({\omega};A^{\prime})\;d\mu({\omega})=\int_{\kappa^{-1}(A^{\prime})}d\mu=\mu(\kappa^{-1}A^{\prime})=\kappa_{\ast}\mu(A^{\prime}),$$

whence $K^{\kappa}_{\ast}\mu=\kappa_{\ast}\mu$ is the push-forward of (signed) measures on ${\Omega}$ to (signed) measures on ${\Omega}^{\prime}$.
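On finite sample spaces a Markov kernel is a row-stochastic matrix and the Markov morphism is a vector-matrix product. The following sketch assumes this matrix representation (rows indexed by ${\Omega}$, columns by ${\Omega}^{\prime}$); the numbers are arbitrary and the variable names are illustrative.

```python
import numpy as np

def push(K, mu):
    """Markov morphism K_* mu for a kernel stored as a row-stochastic matrix."""
    return mu @ K

def total_variation(mu):
    return float(np.sum(np.abs(mu)))

K = np.array([[0.5, 0.5],
              [0.1, 0.9],
              [1.0, 0.0]])                 # kernel from 3 points to 2 points

mu_pos = np.array([0.2, 0.3, 0.5])         # a probability measure
mu_signed = np.array([1.0, -1.0, 0.5])     # a signed measure

# A statistic kappa induces the kernel whose row omega is delta_{kappa(omega)}.
kappa = np.array([0, 1, 1])
K_kappa = np.eye(2)[kappa]

# Push-forward computed directly as mu(kappa^{-1}(A')).
direct = np.bincount(kappa, weights=mu_signed, minlength=2)
```

Applying `push` to `mu_pos` preserves the total mass, while for `mu_signed` the total variation can only shrink, because positive and negative parts may cancel in the image.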

Definition 2.2.

A Markov kernel $K:{\Omega}\to{\mathcal{P}}({\Omega}^{\prime})$ is called congruent w.r.t. the statistic $\kappa:{\Omega}^{\prime}\to{\Omega}$ if

$$\kappa_{*}K({\omega})=\delta_{{\omega}}\qquad\mbox{for all ${\omega}\in{\Omega}$},$$

or, equivalently, if $K_{*}$ is a right inverse of $\kappa_{*}$, i.e., $\kappa_{*}K_{*}={\rm Id}_{{\mathcal{S}}({\Omega})}$. It is called *congruent* if it is congruent w.r.t. some statistic $\kappa:{\Omega}^{\prime}\to{\Omega}$.

This notion was introduced by Chentsov in the case of finite sample spaces [12], but the natural generalization in Definition 2.2 to arbitrary sample spaces has been treated in [4], [5] and [17].

Example 2.2.

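A statistic $\kappa:{\Omega}\to I$ between finite sets induces a partition

$${\Omega}=\dot{\bigcup}_{i\in I}{\Omega}_{i},\qquad\mbox{where }{\Omega}_{i}=\kappa^{-1}(i).$$

In this case, a Markov kernel $K:I\to{\mathcal{P}}({\Omega})$ is $\kappa$-congruent if and only if

$$K(i)({\Omega}_{j})=K(i;{\Omega}_{j})=0\qquad\mbox{for all $i\neq j\in I$.}$$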
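The block condition of Example 2.2 is easy to test in the finite setting. In the sketch below the kernel is stored as a matrix with rows indexed by $I$ and columns by ${\Omega}$, and the statistic as an array of labels; this representation and the function names are assumptions made for illustration.

```python
import numpy as np

def block_mass(K, kappa, n_labels):
    """Matrix whose (i, j) entry is K(i; Omega_j), with Omega_j = kappa^{-1}(j)."""
    return K @ np.eye(n_labels)[kappa]

def is_congruent(K, kappa, n_labels):
    """kappa_* K(i) = delta_i for all i, i.e. the block masses form the identity."""
    return bool(np.allclose(block_mass(K, kappa, n_labels), np.eye(n_labels)))

kappa = np.array([0, 0, 1, 1, 1])          # Omega_0 = {0, 1}, Omega_1 = {2, 3, 4}

K_good = np.array([[0.25, 0.75, 0.0, 0.0, 0.0],
                   [0.0,  0.0,  0.5, 0.2, 0.3]])

K_bad = np.array([[0.25, 0.65, 0.1, 0.0, 0.0],   # row 0 leaks mass into Omega_1
                  [0.0,  0.0,  0.5, 0.2, 0.3]])
```

Since each row of a kernel is a probability measure, vanishing off-diagonal block masses force the diagonal ones to equal one, which is why the single identity test captures both formulations.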

If $(M,{\Omega},{\mathbf{p}})$ is a parametrized measure model and $K:{\Omega}\to{\mathcal{P}}({\Omega}^{\prime})$ a Markov kernel, then $(M,{\Omega}^{\prime},{\mathbf{p}}^{\prime})$ with ${\mathbf{p}}^{\prime}:=K_{\ast}{\mathbf{p}}:M\to{\mathcal{M}}({\Omega}^{\prime})\subset{\mathcal{S}}({\Omega}^{\prime})$ is again a parametrized measure model. In this case, we have the following result.

Proposition 2.1.

([5, Theorem 3.3]) Let $K_{\ast}:{\mathcal{S}}({\Omega})\to{\mathcal{S}}({\Omega}^{\prime})$ be a Markov morphism induced by the Markov kernel $K:{\Omega}\to{\mathcal{P}}({\Omega}^{\prime})$, let ${\mathbf{p}}:M\to{\mathcal{M}}({\Omega})$ be a $k$-integrable parametrized measure model and ${\mathbf{p}}^{\prime}:=K_{\ast}{\mathbf{p}}:M\to{\mathcal{M}}({\Omega}^{\prime})$. Then ${\mathbf{p}}^{\prime}$ is also $k$-integrable, and

$$\|{\partial}_{v}\log{\mathbf{p}}^{\prime}(\xi)\|_{L^{k}({\Omega}^{\prime},{\mathbf{p}}^{\prime}(\xi))}\leq\|{\partial}_{v}\log{\mathbf{p}}(\xi)\|_{L^{k}({\Omega},{\mathbf{p}}(\xi))}.$$
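The monotonicity in Proposition 2.1 can be observed numerically for a kernel that is not congruent. The sketch below assumes a binomial model with two trials on a three-point space and an arbitrary row-stochastic matrix as kernel; both are illustrative choices, and `score_norm` is a hypothetical helper.

```python
import numpy as np

def score_norm(p, dp, k):
    """L^k(p)-norm of the logarithmic derivative dp / p on a finite space."""
    return float(np.sum(np.abs(dp / p) ** k * p) ** (1.0 / k))

theta = 0.3
p = np.array([theta**2, 2 * theta * (1 - theta), (1 - theta) ** 2])
dp = np.array([2 * theta, 2 - 4 * theta, -2 * (1 - theta)])

K = np.array([[0.5, 0.5],
              [0.1, 0.9],
              [1.0, 0.0]])                 # a Markov kernel that loses information

p_prime, dp_prime = p @ K, dp @ K          # image model and its derivative

pairs = [(score_norm(p_prime, dp_prime, k), score_norm(p, dp, k))
         for k in (1, 2, 3, 4)]
```

In each pair the first entry (image model) does not exceed the second (original model); for $k=2$ this is the familiar statement that the Fisher information cannot increase under a Markov morphism.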

2.4. Tensor algebras

In this section we shall provide the algebraic background on tensor algebras. Let $V$ be a vector space over a commutative field ${\mathbb{F}}$, and let $V^{\ast}$ be its dual. The tensor algebra of $V^{\ast}$ is defined as

$${\mathbf{T}}(V^{\ast}):=\bigoplus_{n=0}^{\infty}\otimes^{n}V^{\ast},$$

where

$$\otimes^{n}V^{\ast}=\{\tau^{n}:\underbrace{V\times\cdots\times V}_{\text{$n$ times}}\longrightarrow{\mathbb{F}}\mid\mbox{$\tau^{n}$ is $n$-multilinear}\}.$$

In particular, $\otimes^{0}V^{\ast}:={\mathbb{F}}$ and $\otimes^{1}V^{\ast}:=V^{\ast}$. Then ${\mathbf{T}}(V^{\ast})$ is a graded associative unital algebra, where the product $\otimes:\otimes^{n}V^{\ast}\times\otimes^{m}V^{\ast}\to\otimes^{n+m}V^{\ast}$ is defined as

$$(\tau_{1}^{n}\otimes\tau_{2}^{m})(v_{1},\ldots,v_{n+m}):=\tau_{1}^{n}(v_{1},\ldots,v_{n})\cdot\tau_{2}^{m}(v_{n+1},\ldots,v_{n+m}).\tag{2.11}$$

By convention, the multiplication with elements of $\otimes^{0}V^{\ast}={\mathbb{F}}$ is the scalar multiplication, so that $1\in{\mathbb{F}}$ is the unit of ${\mathbf{T}}(V^{\ast})$. Observe that ${\mathbf{T}}(V^{\ast})$ is non-commutative.

There is a linear action of $S_{n}$, the permutation group of $n$ elements, on $\otimes^{n}V^{\ast}$ given by

$$(P_{\sigma}\tau^{n})(v_{1},\ldots,v_{n}):=\tau^{n}(v_{\sigma^{-1}(1)},\ldots,v_{\sigma^{-1}(n)})$$

for $\sigma\in S_{n}$ and $\tau^{n}\in\otimes^{n}V^{\ast}$. Indeed, the identity $P_{\sigma_{1}}(P_{\sigma_{2}}\tau^{n})=P_{\sigma_{1}\sigma_{2}}\tau^{n}$ is easily verified. We call a tensor $\tau^{n}\in\otimes^{n}V^{\ast}$ symmetric if $P_{\sigma}\tau^{n}=\tau^{n}$ for all $\sigma\in S_{n}$, and we let

$$\odot^{n}V^{\ast}:=\{\tau^{n}\in\otimes^{n}V^{\ast}\mid\mbox{$\tau^{n}$ is symmetric}\}$$

be the $n$-fold symmetric power of $V^{\ast}$. Evidently, $\odot^{n}V^{\ast}\subset\otimes^{n}V^{\ast}$ is a linear subspace.
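For $V={\mathbb{R}}^{d}$ these operations have a direct array form: an $n$-tensor is a $d\times\cdots\times d$ array of its values on basis vectors, the tensor product is an outer product, and $P_{\sigma}$ is an axis permutation. The sketch below is one possible encoding with 0-based permutations given as tuples; the helper names are hypothetical. In `compose` the product of two permutations is read as "apply the first factor first", which is the convention under which the composition identity holds in this encoding.

```python
import numpy as np
from itertools import permutations

def tensor_product(a, b):
    """(a ⊗ b)[i..., j...] = a[i...] * b[j...]."""
    return np.multiply.outer(a, b)

def permute(sigma, tau):
    """P_sigma tau: entry [i_0, ..., i_{n-1}] is tau at [i_{sigma^{-1}(0)}, ...]."""
    return np.transpose(tau, axes=sigma)

def compose(s1, s2):
    """Permutation that applies s1 first and then s2."""
    return tuple(s2[s1[k]] for k in range(len(s1)))

def is_symmetric(tau):
    return all(np.array_equal(permute(s, tau), tau)
               for s in permutations(range(tau.ndim)))

t = np.arange(27.0).reshape(3, 3, 3)       # a generic 3-tensor on R^3
s1, s2 = (1, 0, 2), (0, 2, 1)
lhs = permute(s1, permute(s2, t))
rhs = permute(compose(s1, s2), t)

v = np.array([1.0, 2.0, -1.0])
cube = tensor_product(tensor_product(v, v), v)   # v ⊗ v ⊗ v is symmetric
```

The arrays `lhs` and `rhs` coincide, the generic tensor `t` is not symmetric, and the cube of a single covector is.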

A unital subalgebra of ${\mathbf{T}}(V^{\ast})$ is a linear subspace ${\mathcal{A}}\subset{\mathbf{T}}(V^{\ast})$ containing ${\mathbb{F}}=\odot^{0}V^{\ast}$ which is closed under tensor products, i.e. such that $\tau_{1},\tau_{2}\in{\mathcal{A}}$ implies that $\tau_{1}\otimes\tau_{2}\in{\mathcal{A}}$. We call such a subalgebra graded if

$${\mathcal{A}}=\bigoplus_{n=0}^{\infty}{\mathcal{A}}_{n}\qquad\mbox{with }{\mathcal{A}}_{n}:={\mathcal{A}}\cap\otimes^{n}V^{\ast},$$

and a graded subalgebra ${\mathcal{A}}\subset{\mathbf{T}}(V^{\ast})$ is called *permutation invariant* if ${\mathcal{A}}_{n}$ is preserved by the action of $S_{n}$ on ${\mathcal{A}}_{n}\subset\otimes^{n}V^{\ast}$.

Definition 2.3.

Let ${\mathcal{S}}\subset{\mathbf{T}}(V^{\ast})$ be an arbitrary subset. The intersection of all permutation invariant unital subalgebras of ${\mathbf{T}}(V^{\ast})$ containing ${\mathcal{S}}$ is called the permutation invariant subalgebra generated by ${\mathcal{S}}$ and is denoted by ${\mathcal{A}}_{\text{perm}}({\mathcal{S}})$ .

Observe that ${\mathcal{A}}_{\text{perm}}({\mathcal{S}})$ is the smallest permutation invariant unital subalgebra of ${\mathbf{T}}(V^{\ast})$ which contains ${\mathcal{S}}$ .

Example 2.3.

Evidently, ${\mathcal{A}}_{\text{perm}}(\emptyset)={\mathbb{F}}$ .

To see another example, let $\tau^{1}\in V^{\ast}$. If we let ${\mathcal{A}}_{0}:={\mathbb{F}}$ and ${\mathcal{A}}_{n}:={\mathbb{F}}(\underbrace{\tau^{1}\otimes\cdots\otimes\tau^{1}}_{\text{$n$ times}})$ for $n\geq 1$, then ${\mathcal{A}}_{\text{perm}}(\tau^{1})=\bigoplus_{n=0}^{\infty}{\mathcal{A}}_{n}$. In fact, ${\mathcal{A}}_{\text{perm}}(\tau^{1})$ is even commutative and isomorphic to the algebra of polynomials over ${\mathbb{F}}$ in one variable.

For $n\in{\mathbb{N}}$ , we denote by $\mbox{\bf Part}(n)$ the collection of partitions ${\bf P}=\{P_{1},\ldots,P_{r}\}$ of $\{1,\ldots,n\}$ , that is, $\bigcup_{k}P_{k}=\{1,\ldots,n\}$ , and these sets are pairwise disjoint. We denote the number $r$ of sets in the partition by $|{\bf P}|$ .

Given a partition ${\bf P}=\{P_{1},\ldots,P_{r}\}\in\mbox{\bf Part}(n)$, we associate to it a bijective map

$$\pi_{\bf P}:\biguplus_{i\in\{1,\ldots,r\}}\big(\{i\}\times\{1,\ldots,n_{i}\}\big)\longrightarrow\{1,\ldots,n\},\tag{2.13}$$

where $n_{i}:=|P_{i}|$, such that $\pi_{\bf P}(\{i\}\times\{1,\dots,n_{i}\})=P_{i}$. This map is well defined up to permutation of the elements in $P_{i}$.

$\mbox{\bf Part}(n)$ is partially ordered by the relation ${\bf P}\leq{\bf P}^{\prime}$ if ${\bf P}$ is a subdivision of ${\bf P}^{\prime}$ . This ordering has the partition $\{\{1\},\ldots,\{n\}\}$ into singleton sets as its minimum and $\{\{1,\ldots,n\}\}$ as its maximum.

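Consider now a subset of ${\mathbf{T}}(V^{\ast})$ of the form

$${\mathcal{S}}:=\{\tau^{1},\tau^{2},\tau^{3},\ldots\}\qquad\mbox{with $\tau^{n}\in\odot^{n}V^{\ast}$ for all $n\in{\mathbb{N}}$.}\tag{2.14}$$

For a partition ${\bf P}\in\mbox{\bf Part}(n)$ with the associated map $\pi_{\bf P}$ from (2.13) we define $\tau^{\bf P}\in\otimes^{n}V^{\ast}$ as

$$\tau^{\bf P}(v_{1},\ldots,v_{n}):=\prod_{i=1}^{r}\tau^{n_{i}}(v_{\pi_{\bf P}(i,1)},\ldots,v_{\pi_{\bf P}(i,n_{i})}).\tag{2.15}$$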

Observe that this definition is independent of the choice of the bijection $\pi_{\bf P}$ , since $\tau^{n_{i}}$ is symmetric.

Example 2.4.

(1)

If ${\bf P}=\{\{1,\ldots,n\}\}$ is the trivial partition, then

[TABLE] 2. (2)

If ${\bf P}=\{\{1\},\ldots,\{n\}\}$ is the partition into singletons, then

[TABLE] 3. (3)

To give a concrete example, let $n=5$ and ${\bf P}=\{\{1,3\},\{2,5\},\{4\}\}$ . Then

[TABLE]
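The construction of $\tau^{\bf P}$ can be sketched numerically. The sketch below assumes that (2.15) evaluates $\tau^{\bf P}$ as the product, over the blocks $P_{i}$, of $\tau^{n_{i}}$ applied to the vectors indexed by $P_{i}$; the symmetric forms used here (`tau`, a sum of componentwise products on ${\mathbb{R}}^{4}$) are a hypothetical choice made only for illustration.

```python
import numpy as np


def tau(vectors):
    """Hypothetical symmetric k-form: sum_i v_1[i] * ... * v_k[i]."""
    return float(np.sum(np.prod(np.array(vectors), axis=0)))


def tau_partition(partition, vectors):
    """Product over the blocks of tau applied to the vectors indexed by each block.

    Blocks contain 1-based positions, so position k refers to vectors[k - 1].
    """
    result = 1.0
    for block in partition:
        result *= tau([vectors[k - 1] for k in sorted(block)])
    return result


rng = np.random.default_rng(0)
v = [rng.normal(size=4) for _ in range(5)]  # v[0] plays the role of v_1, etc.
P = [{1, 3}, {2, 5}, {4}]

lhs = tau_partition(P, v)
rhs = tau([v[0], v[2]]) * tau([v[1], v[4]]) * tau([v[3]])
print(np.isclose(lhs, rhs))
```

Because each factor is symmetric, the order in which the elements of a block are listed does not affect the value, which is the independence of the choice of $\pi_{\bf P}$ noted above.
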

We can now present the main result of this section.

Proposition 2.2.

Let ${\mathcal{S}}\subset{\mathbf{T}}(V^{\ast})$ be given as in (2.14). Then the permutation invariant subalgebra generated by ${\mathcal{S}}$ equals

[TABLE]

Proof.

Let us denote the right hand side of (2.16) by ${\mathcal{A}}_{\text{perm}}^{\prime}({\mathcal{S}})$ , so that we wish to show that ${\mathcal{A}}_{\text{perm}}({\mathcal{S}})={\mathcal{A}}_{\text{perm}}^{\prime}({\mathcal{S}})$ .

By Example 2.4.1, $\tau^{n}\in{\mathcal{A}}_{\text{perm}}^{\prime}({\mathcal{S}})$ for all $n\in{\mathbb{N}}$ , whence ${\mathcal{S}}\subset{\mathcal{A}}_{\text{perm}}^{\prime}({\mathcal{S}})$ . Furthermore, by (2.15) we have

[TABLE]

where ${\bf P}\cup{\bf P}^{\prime}\in{\bf Part}(n+m)$ is the partition of $\{1,\ldots,n+m\}$ obtained by regarding ${\bf P}\in{\bf Part}(n)$ and ${\bf P}^{\prime}\in{\bf Part}(m)$ as partitions of $\{1,\ldots,n\}$ and $\{n+1,\ldots,n+m\}$ , respectively. Moreover, if $\sigma\in S_{n}$ is a permutation and ${\bf P}=\{P_{1},\ldots,P_{r}\}$ a partition, then the definition in (2.15) implies that

[TABLE]

That is, ${\mathcal{A}}_{\text{perm}}^{\prime}({\mathcal{S}})\subset{\mathbf{T}}(V^{\ast})$ is a permutation invariant unital subalgebra of ${\mathbf{T}}(V^{\ast})$ containing ${\mathcal{S}}$ , whence ${\mathcal{A}}_{\text{perm}}({\mathcal{S}})\subset{\mathcal{A}}_{\text{perm}}^{\prime}({\mathcal{S}})$ .

For the converse, observe that for a partition ${\bf P}=\{P_{1},\ldots,P_{r}\}\in{\bf Part}(n)$ , we may – after applying a permutation of $\{1,\ldots,n\}$ – assume that

[TABLE]

with $k_{i}=|P_{i}|$ , and in this case, (2.11) and (2.15) imply that

[TABLE]

so that any permutation invariant subalgebra containing ${\mathcal{S}}$ also must contain $\tau^{\bf P}$ for all partitions, and this shows that ${\mathcal{A}}_{\text{perm}}^{\prime}({\mathcal{S}})\subset{\mathcal{A}}_{\text{perm}}({\mathcal{S}})$ . ∎

2.5. Tensor fields

Recall that a (covariant) $n$ -tensor field $\Psi$ on a manifold $M$ (since we do not consider non-covariant $n$ -tensor fields in this paper, we shall suppress the attribute covariant) is a collection of $n$ -multilinear forms $\Psi_{p}$ on $T_{p}M$ for all $p\in M$ such that for continuous vector fields $X^{1},\ldots,X^{n}$ on $M$ the function

[TABLE]

is continuous. This notion can also be adapted to the case where $M$ has a weaker structure than that of a manifold. The examples we have in mind are the subsets ${\mathcal{P}}^{r}({\Omega})\subset{\mathcal{M}}^{r}({\Omega})$ of ${\mathcal{S}}^{r}({\Omega})$ for an arbitrary measurable space ${\Omega}$ and $r\in(0,1]$ , which fail to be manifolds. Nevertheless, there is a natural notion of tangent cone at $\mu_{r}$ of these sets which is the collection of the derivatives of all curves in ${\mathcal{M}}^{r}({\Omega})$ (in ${\mathcal{P}}^{r}({\Omega})$ , respectively) through $\mu_{r}$ . These cones were determined in [5, Proposition 2.1] as

[TABLE]

with $\mu\in{\mathcal{M}}({\Omega})$ ( $\mu\in{\mathcal{P}}({\Omega})$ , respectively). Then in analogy to the notion for general manifolds, we can now define the notion of $n$ -tensor field on ${\mathcal{M}}^{r}({\Omega})$ and ${\mathcal{P}}^{r}({\Omega})$ as follows.

Definition 2.4.

Let ${\Omega}$ be a measurable space and $r\in(0,1]$ . A *vector field on ${\mathcal{M}}^{r}({\Omega})$* is a continuous map $X:{\mathcal{M}}^{r}({\Omega})\to{\mathcal{S}}^{r}({\Omega})$ such that $X_{\mu^{r}}\in T_{\mu^{r}}{\mathcal{M}}^{r}({\Omega})$ for all $\mu^{r}\in{\mathcal{M}}^{r}({\Omega})$ . The notion of a vector field on ${\mathcal{P}}^{r}({\Omega})$ is defined analogously.

A *(covariant) $n$ -tensor field on ${\mathcal{M}}^{r}({\Omega})$* is a collection of $n$ -multilinear forms $\Psi_{\mu^{r}}$ on $T_{\mu^{r}}{\mathcal{M}}^{r}({\Omega})$ for all $\mu^{r}\in{\mathcal{M}}^{r}({\Omega})$ such that for continuous vector fields $X^{1},\ldots,X^{n}$ on ${\mathcal{M}}^{r}({\Omega})$ the function

[TABLE]

is continuous. The notion of vector fields and $n$ -tensor fields on ${\mathcal{P}}^{r}({\Omega})$ is defined analogously.

If $\Psi,\Psi^{\prime}$ are tensor fields of degree $n$ and $m$ , respectively, and $\sigma\in S_{n}$ is a permutation, then the pointwise tensor product $\Psi\otimes\Psi^{\prime}$ and the permutation $P_{\sigma}\Psi$ defined in (2.11) and (2.12) are tensor fields of degree $n+m$ and $n$ , respectively. Moreover, for a differentiable map $f:N\to M$ the *pull-back of $\Psi$ under $f$* is the tensor field on $N$ defined by

[TABLE]

Evidently, we have

[TABLE]

For instance, if $(M,{\Omega},{\mathbf{p}})$ is a $k$ -integrable parametrized measure model, then by (2.7), $d_{\xi}{\mathbf{p}}^{1/k}(v)\in{\mathcal{S}}^{1/k}({\Omega};\mu)=T_{{\mathbf{p}}^{1/k}(\xi)}{\mathcal{M}}^{1/k}({\Omega})$ , so that for any $n$ -tensor field $\Psi$ on ${\mathcal{M}}^{1/k}({\Omega})$ the pull-back

[TABLE]

is well defined. The same holds if ${\mathbf{p}}:M\to{\mathcal{P}}({\Omega})$ is a statistical model and $\Psi$ is an $n$ -tensor field on ${\mathcal{P}}^{1/k}({\Omega})$ . Moreover, (2.18) holds in this context as well when replacing $f$ by ${\mathbf{p}}^{1/k}$ .

Definition 2.5.

Let ${\Omega}$ be a measurable space, $n\in{\mathbb{N}}$ an integer and $0<r\leq 1/n$ . Then the canonical $n$ -tensor field on ${\mathcal{S}}^{r}({\Omega})$ is defined as the pull-back

[TABLE]

with the symmetric $n$ -tensor $L^{n}_{\Omega}$ on ${\mathcal{S}}^{1/n}({\Omega})$ defined in (2.5). The definition of the pullback in (2.17) and the formula for the Fréchet-derivative of $\pi^{1/nr}$ in (2.4) now imply by a straightforward calculation that

[TABLE]

where $\mu_{r}\in{\mathcal{S}}^{r}({\Omega})$ and $\nu_{i}\in{\mathcal{S}}^{r}({\Omega})=T_{\mu_{r}}{\mathcal{S}}^{r}({\Omega})$ .

Furthermore, if $(M,{\Omega},{\mathbf{p}})$ is a $k$ -integrable parametrized measure model, $k:=1/r\geq n$ , then we define the canonical $n$ -tensor field of $(M,{\Omega},{\mathbf{p}})$ as the pull-back

[TABLE]

In this case, (2.7) implies that for $v_{1},\ldots,v_{n}\in T_{\xi}M$

[TABLE]

Example 2.5.

(1)

The canonical $1$ -tensor of $(M,{\Omega},{\mathbf{p}})$ is given as

[TABLE]

Thus, on a statistical model (i.e., if ${\mathbf{p}}(\xi)\in{\mathcal{P}}({\Omega})$ for all $\xi$ ) $\tau^{1}_{(M,{\Omega},{\mathbf{p}})}\equiv 0$ .

(2)

The canonical $2$ -tensor $\tau^{2}_{(M,{\Omega},{\mathbf{p}})}$ is called the *Fisher metric* of the model and is often simply denoted by ${\mathfrak{g}}$ . It is defined only if the model is $2$ -integrable.

(3)

The canonical $3$ -tensor $\tau^{3}_{(M,{\Omega},{\mathbf{p}})}$ is called the *Amari-Chentsov tensor* of the model. It is often simply denoted by ${\mathbf{T}}$ and is defined only if the model is $3$ -integrable.
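On a finite sample space these three tensors can be evaluated directly. The sketch below assumes the integral formula $\int\partial_{v_{1}}\log p\cdots\partial_{v_{n}}\log p\;d{\mathbf{p}}(\xi)$ for the canonical $n$ -tensor, which on a finite set with strictly positive weights $p_{i}$ becomes $\sum_{i}\prod_{k}(\partial_{v_{k}}p_{i})/p_{i}^{\,n-1}$. The Bernoulli model ${\mathbf{p}}(\xi)=(\xi,1-\xi)$ and the function name `canonical_tensor` are illustrative choices, not taken from the paper.

```python
import numpy as np


def canonical_tensor(p, dp_list):
    """Evaluate sum_i prod_k dp_k[i] / p[i]**(n - 1) for strictly positive p."""
    n = len(dp_list)
    product = np.prod(np.array(dp_list), axis=0)
    return float(np.sum(product / p ** (n - 1)))


xi = 0.3
p = np.array([xi, 1.0 - xi])   # Bernoulli model p(xi) = (xi, 1 - xi)
dp = np.array([1.0, -1.0])     # derivative of p with respect to xi

tau1 = canonical_tensor(p, [dp])                    # canonical 1-tensor
fisher = canonical_tensor(p, [dp, dp])              # Fisher metric
amari_chentsov = canonical_tensor(p, [dp, dp, dp])  # Amari-Chentsov tensor

print(tau1)
print(np.isclose(fisher, 1.0 / (xi * (1.0 - xi))))
print(np.isclose(amari_chentsov, (1.0 - 2.0 * xi) / (xi ** 2 * (1.0 - xi) ** 2)))
```

The vanishing of `tau1` reflects item (1): the derivative of a curve of probability measures has total mass zero.
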

3. Congruent families of tensor fields

The question we wish to address in this section is to characterize families of $n$ -tensor fields on ${\mathcal{M}}^{r}({\Omega})$ (on ${\mathcal{P}}^{r}({\Omega})$ , respectively) for measurable spaces ${\Omega}$ which are unchanged under congruent Markov morphisms.

First of all, we need to clarify what is meant by this. The problem is that, although a given Markov kernel $K:{\Omega}\to{\mathcal{P}}({\Omega}^{\prime})$ induces the bounded linear Markov morphism $K_{\ast}:{\mathcal{S}}({\Omega})\to{\mathcal{S}}({\Omega}^{\prime})$ which maps ${\mathcal{P}}({\Omega})$ and ${\mathcal{M}}({\Omega})$ to ${\mathcal{P}}({\Omega}^{\prime})$ and ${\mathcal{M}}({\Omega}^{\prime})$ , respectively, there is no induced differentiable map from ${\mathcal{P}}^{r}({\Omega})$ and ${\mathcal{M}}^{r}({\Omega})$ to ${\mathcal{P}}^{r}({\Omega}^{\prime})$ and ${\mathcal{M}}^{r}({\Omega}^{\prime})$ , respectively, if $r<1$ . The best we can do is to make the following definition.

Definition 3.1.

Let $K:{\Omega}\to{\mathcal{P}}({\Omega}^{\prime})$ be a Markov kernel with the associated Markov morphism $K_{\ast}:{\mathcal{S}}({\Omega})\to{\mathcal{S}}({\Omega}^{\prime})$ from (2.8). For $r\in(0,1]$ we define

[TABLE]

which maps ${\mathcal{P}}^{r}({\Omega})$ and ${\mathcal{M}}^{r}({\Omega})$ to ${\mathcal{P}}^{r}({\Omega}^{\prime})$ and ${\mathcal{M}}^{r}({\Omega}^{\prime})$ , respectively.

Since $r\leq 1$ , it follows that $\pi^{1/r}$ is a Fréchet- $C^{1}$ -map and $K_{\ast}$ is linear. However, $\pi^{r}$ is continuous but not differentiable for $r<1$ , whence the same holds for $K_{r}$ .
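For finite sample spaces the map $K_{r}=\pi^{r}K_{\ast}\pi^{1/r}$ can be written down explicitly. The following sketch makes two assumptions that are not part of the definition above: the sample spaces are finite, so that $K_{\ast}$ is a matrix whose columns sum to one, and only nonnegative coefficient vectors are used, so that $\pi^{k}$ acts as a componentwise power. The function name `K_r` and the sample kernel are illustrative.

```python
import numpy as np


def K_r(K, coeffs, r):
    """Apply pi^r K_* pi^{1/r} to the nonnegative coefficients of an r-th root measure."""
    mu = coeffs ** (1.0 / r)  # pi^{1/r}: from S^r(I) to S(I)
    mu_prime = K @ mu         # K_*: the linear Markov morphism
    return mu_prime ** r      # pi^r: from S(I') to S^r(I')


# Columns are indexed by I = {0, 1}, rows by I' = {0, 1, 2}; each column sums to 1.
K = np.array([[0.5, 0.0],
              [0.5, 0.25],
              [0.0, 0.75]])
mu = np.array([0.2, 0.8])  # a probability measure on I
r = 0.5

image = K_r(K, mu ** r, r)
print(np.sum(image ** (1.0 / r)))  # total mass of the image measure

# For r < 1 the map is not additive for this kernel.
a, b = np.array([1.0, 0.0]), np.array([0.0, 1.0])
print(np.allclose(K_r(K, a + b, r), K_r(K, a, r) + K_r(K, b, r)))
```

For $r=1$ the same function reduces to the matrix product `K @ mu`, that is, to the linear map $K_{\ast}$.
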

Nevertheless, let us for the moment pretend that $K_{r}$ were differentiable. Then, when rewriting (3.1) as $\pi^{1/r}K_{r}=K_{\ast}\pi^{1/r}$ , the chain rule and (2.7) would imply that

[TABLE]

for all $\mu_{r},\nu_{r}\in{\mathcal{S}}^{r}({\Omega})$ .

On the other hand, as $K_{r}$ maps ${\mathcal{M}}^{r}({\Omega})$ to ${\mathcal{M}}^{r}({\Omega}^{\prime})$ , its differential at $\mu^{r}\in{\mathcal{M}}^{r}({\Omega})$ for $\mu\in{\mathcal{M}}({\Omega})$ would restrict to a linear map

[TABLE]

where $\mu^{\prime}:=K_{\ast}\mu\in{\mathcal{M}}({\Omega}^{\prime})$ . This together with (3.2) implies that the restriction of $d_{\mu_{r}}K_{r}$ to ${\mathcal{S}}^{r}({\Omega},\mu)$ must be given as

[TABLE]

Indeed, by [5, Theorem 3.3], (3.3) defines a bounded linear map $d_{\mu^{r}}K_{r}$ . In fact, it is shown in that reference that

[TABLE]

Definition 3.2.

For $\mu\in{\mathcal{M}}({\Omega})$ , the bounded linear map (3.3) is called the formal derivative of $K_{r}$ at $\mu$ .

If $(M,{\Omega},{\mathbf{p}})$ is a $k$ -integrable parametrized measure model, then so is $(M,{\Omega}^{\prime},{\mathbf{p}}^{\prime})$ with ${\mathbf{p}}^{\prime}:=K_{\ast}{\mathbf{p}}$ by Proposition 2.1. In this case, we may also write

[TABLE]

Proposition 3.1.

The formal derivative of $K_{r}$ defined in (3.3) satisfies the identity

[TABLE]

for all $\xi\in M$ which may be regarded as the chain rule applied to the derivative of (3.4).

Proof.

For $v\in T_{\xi}M$ , $\xi\in M$ we calculate

[TABLE]

which shows the assertion. ∎

Our definition of formal derivatives is just strong enough to define the pullback of tensor fields on the space of probability measures in analogy to (2.17).

Definition 3.3 (Pullback of tensors by a Markov morphism).

Let $K:{\Omega}\to{\mathcal{P}}({\Omega}^{\prime})$ be a Markov kernel, and let $\Psi^{n}$ be an $n$ -tensor field on ${\mathcal{M}}^{r}({\Omega}^{\prime})$ (on ${\mathcal{P}}^{r}({\Omega}^{\prime})$ , respectively), cf. Definition 2.4. Then the pull-back tensor under $K$ is defined as the covariant $n$ -tensor $K_{r}^{\ast}\Psi^{n}$ on ${\mathcal{M}}^{r}({\Omega})$ (on ${\mathcal{P}}^{r}({\Omega})$ , respectively) given as

[TABLE]

with the formal derivative $dK_{r}$ from (3.3).

Evidently, $K_{r}^{*}\Psi^{n}$ is again a covariant $n$ -tensor on ${\mathcal{P}}^{r}({\Omega})$ and ${\mathcal{M}}^{r}({\Omega})$ , respectively, since $dK_{r}$ is continuous. Moreover, Proposition 3.1 implies that for a parametrized measure model $(M,{\Omega},{\mathbf{p}})$ and the induced model $(M,{\Omega}^{\prime},{\mathbf{p}}^{\prime})$ with ${\mathbf{p}}^{\prime}=K_{*}{\mathbf{p}}$ we have the identity

[TABLE]

for any covariant $n$ -tensor field $\Psi^{n}$ on ${\mathcal{P}}^{r}({\Omega})$ or ${\mathcal{M}}^{r}({\Omega})$ , respectively.

With this, we can now give a definition of congruent families of tensor fields.

Definition 3.4 (Congruent families of tensors).

Let $r\in(0,1]$ , and let $(\Theta_{{\Omega};r}^{n})$ be a collection of covariant $n$ -tensors on ${\mathcal{P}}^{r}({\Omega})$ (on ${\mathcal{M}}^{r}({\Omega})$ , respectively) for each measurable space ${\Omega}$ .

This collection is said to be a *congruent family of $n$ -tensors of regularity $r$* if for any congruent Markov kernel $K:{\Omega}\to{\Omega}^{\prime}$ we have

[TABLE]

The following gives an important example of such families.

Proposition 3.2.

The restriction of the canonical $n$ -tensors $L^{n}_{\Omega}$ (2.5) to ${\mathcal{P}}^{1/n}({\Omega})$ and ${\mathcal{M}}^{1/n}({\Omega})$ , respectively, yields a congruent family of $n$ -tensors. Likewise, the canonical $n$ -tensors $(\tau^{n}_{{\Omega};r})$ on ${\mathcal{P}}^{r}({\Omega})$ and ${\mathcal{M}}^{r}({\Omega})$ , respectively, with $r\leq 1/n$ yield congruent families of $n$ -tensors.

Proof.

Let $K:{\Omega}\to{\mathcal{P}}({\Omega}^{\prime})$ be a Markov kernel which is congruent w.r.t. the statistic $\kappa:{\Omega}^{\prime}\to{\Omega}$ (cf. Definition 2.2). For $\mu\in{\mathcal{M}}({\Omega})$ let $\mu^{\prime}:=K_{\ast}\mu\in{\mathcal{M}}({\Omega}^{\prime})$ , so that $\kappa_{\ast}\mu^{\prime}=\kappa_{\ast}K_{\ast}\mu=\mu$ . Let $\nu_{1/n}^{i}=\phi_{i}{\mu}^{1/n}\in T_{\mu^{1/n}}{\mathcal{M}}^{1/n}({\Omega})={\mathcal{S}}^{1/n}({\Omega},\mu)$ , with $\phi_{i}\in L^{1/n}({\Omega},\mu)$ , $i=1,\ldots,n$ , and define $\phi_{i}^{\prime}\in L^{1/n}({\Omega}^{\prime},\mu^{\prime})$ by

[TABLE]

By the $\kappa$ -congruency of $K$ , this implies that

[TABLE]

where $\kappa^{\ast}\phi(\cdot):=\phi(\kappa(\cdot))$ , so that

[TABLE]

Then

[TABLE]

This shows that $(L^{n}_{\Omega})$ is a congruent family of $n$ -tensors. For $r\leq 1/n$ , observe that by (3.1) we have

[TABLE]

and hence,

[TABLE]

showing the congruency of the family $\tau^{n}_{{\Omega};r}$ as well. ∎

By (2.18) and Definition 3.4, it follows that tensor products and permutations of congruent families of tensors yield again such families. Moreover, since

[TABLE]

multiplying a congruent family with a continuous function depending only on $\|\mu_{r}^{1/r}\|_{{\mathcal{S}}({\Omega})}=\|\mu_{r}^{1/r}\|$ yields again a congruent family of tensors. Therefore, defining for a partition ${\bf P}\in{\bf Part}(n)$ with the associated map $\pi_{\bf P}$ from (2.13) the tensor $\tau^{\bf P}\in\otimes^{n}V^{\ast}$ as

[TABLE]

this together with Proposition 2.2 yields the following.

Proposition 3.3.

For $r\in(0,1]$ ,

[TABLE]

is a congruent family of $n$ -tensor fields on ${\mathcal{M}}^{r}({\Omega})$ , where the sum is taken over all partitions ${\bf P}=\{P_{1},\ldots,P_{l}\}\in{\bf Part}(n)$ with $|P_{i}|\leq 1/r$ for all $i$ , and where $a_{\bf P}:(0,\infty)\to{\mathbb{R}}$ are continuous functions. Furthermore,

[TABLE]

is a congruent family of $n$ -tensor fields on ${\mathcal{P}}^{r}({\Omega})$ , where the sum is taken over all partitions ${\bf P}=\{P_{1},\ldots,P_{l}\}\in{\bf Part}(n)$ with $1<|P_{i}|\leq 1/r$ for all $i$ , and where the $c_{\bf P}\in{\mathbb{R}}$ are constants.

In the light of Proposition 2.2, it is reasonable to use the following terminology.

Definition 3.5.

The congruent families of $n$ -tensors on ${\mathcal{M}}^{r}({\Omega})$ and ${\mathcal{P}}^{r}({\Omega})$ given in (3.7) and (3.8), respectively, are called the families which are algebraically generated by the canonical tensors.

4. Congruent families on finite sample spaces

In this section, we wish to apply our discussion of the previous sections to the case where the sample space ${\Omega}$ is assumed to be a finite set, in which case it is denoted by $I$ rather than ${\Omega}$ .

The simplification of this case is due to the fact that in this case the spaces ${\mathcal{S}}^{r}(I)$ are finite dimensional. Indeed, we have

[TABLE]

where $\delta_{i}$ denotes the Dirac measure supported at $i\in I$ . The norm on ${\mathcal{S}}(I)$ is then

[TABLE]

The space ${\mathcal{S}}^{r}(I)$ is then given as

[TABLE]

The sets ${\mathcal{M}}_{+}(I)$ and ${\mathcal{P}}_{+}(I)\subset{\mathcal{S}}(I)$ are manifolds of dimension $|I|$ and $|I|-1$ , respectively. Indeed, ${\mathcal{M}}_{+}(I)\subset{\mathcal{S}}(I)$ is an open subset, whereas ${\mathcal{P}}_{+}(I)$ is an open subset of the affine hyperplane ${\mathcal{S}}_{1}(I)$ , cf. (2.1). In particular, we have

[TABLE]

The norm on ${\mathcal{S}}^{r}(I)$ is given as

[TABLE]

and the product $\cdot$ and the exponentiating map $\pi^{k}:{\mathcal{S}}^{r}(I)\to{\mathcal{S}}^{kr}(I)$ from above are given as

[TABLE]

Evidently, $\pi^{k}$ maps ${\mathcal{M}}_{+}^{r}(I)$ and ${\mathcal{P}}_{+}^{r}(I)$ to ${\mathcal{M}}_{+}^{kr}(I)$ and ${\mathcal{P}}_{+}^{kr}(I)$ , respectively, and the restriction of $\pi^{k}$ to these sets is differentiable even if $k<1$ .

A Markov kernel between the finite sets $I=\{1,\ldots,m\}$ and $I^{\prime}=\{1,\ldots,n\}$ is determined by the $(n\times m)$ -matrix $(K^{i}_{i^{\prime}})_{i,i^{\prime}}$ by

[TABLE]

where $K_{i^{\prime}}^{i}\geq 0$ and $\sum_{i^{\prime}}K_{i^{\prime}}^{i}=1$ for all $i\in I$ . Therefore, by linearity,

[TABLE]

In particular, $K_{\ast}({\mathcal{P}}_{+}(I))\subset K_{\ast}({\mathcal{P}}_{+}(I^{\prime}))$ and $K_{\ast}({\mathcal{M}}_{+}(I))\subset K_{\ast}({\mathcal{M}}_{+}(I^{\prime}))$ .

If $\kappa:I^{\prime}\to I$ is a statistic between finite sets (cf. Example 2.2) and if we denote the induced partition by $A_{i}:=\kappa^{-1}(i)\subset I^{\prime}$ , then a Markov kernel $K:I\to{\mathcal{P}}(I^{\prime})$ given by the matrix $(K^{i}_{i^{\prime}})_{i,i^{\prime}}$ as above is $\kappa$ -congruent if and only if

[TABLE]
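The congruency condition is easy to test for a concrete matrix. The sketch below assumes that the condition amounts to $K^{i}_{i^{\prime}}=0$ whenever $i^{\prime}\notin A_{i}$ , so that $K^{i}$ is supported on the fibre $\kappa^{-1}(i)$; it then checks the resulting identity $\kappa_{\ast}K_{\ast}\mu=\mu$ . The function names and the sample kernels are hypothetical.

```python
import numpy as np


def is_markov(K):
    """Nonnegative entries and every column (indexed by i in I) sums to 1."""
    return bool(np.all(K >= 0) and np.allclose(K.sum(axis=0), 1.0))


def is_congruent(K, kappa):
    """K has shape (|I'|, |I|); kappa[ip] is the image in I of ip in I'.

    Assumed condition: K[ip, i] = 0 unless kappa[ip] == i.
    """
    n_prime, m = K.shape
    return is_markov(K) and all(
        K[ip, i] == 0
        for ip in range(n_prime)
        for i in range(m)
        if kappa[ip] != i
    )


def push_forward_statistic(kappa, mu_prime, m):
    """kappa_* mu': add up the mass of each fibre A_i = kappa^{-1}(i)."""
    mu = np.zeros(m)
    for ip, i in enumerate(kappa):
        mu[i] += mu_prime[ip]
    return mu


kappa = [0, 1, 1]  # statistic I' = {0, 1, 2} -> I = {0, 1}
K_congruent = np.array([[1.0, 0.0], [0.0, 0.25], [0.0, 0.75]])
K_other = np.array([[0.5, 0.0], [0.5, 0.25], [0.0, 0.75]])
mu = np.array([2.0, 3.0])

print(is_congruent(K_congruent, kappa), is_congruent(K_other, kappa))
print(push_forward_statistic(kappa, K_congruent @ mu, 2))
```
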

Since $(\delta_{i}^{r})_{i\in I}$ is a basis of ${\mathcal{S}}^{r}(I)$ , we can describe any $n$ -tensor $\Psi^{n}$ on ${\mathcal{S}}^{r}(I)$ by defining for all multiindices $\vec{i}:=(i_{1},\ldots,i_{n})\in I^{n}$ the component functions

[TABLE]

which are real valued functions depending continuously on $\mu_{r}\in{\mathcal{S}}^{r}(I)$ . Thus, by (2.20), the component functions of the canonical tensor $\tau^{n}_{I;r}$ from (4.4) are given as

[TABLE]

Remark 4.1.

Observe that $\theta^{\vec{i}}_{I;r}$ is continuous on ${\mathcal{M}}_{+}^{r}(I)$ and hence $\tau^{n}_{I;r}=(\pi^{1/nr})^{\ast}L^{n}_{I}$ is well-defined on ${\mathcal{M}}^{r}_{+}(I)$ even if $r>1/n$ , as on this set $m_{i}>0$ . This reflects the fact that the restriction $\pi^{1/nr}:{\mathcal{M}}^{r}_{+}(I)\to{\mathcal{S}}^{1/n}(I)$ is differentiable for any $r>0$ by (4.3).

In particular, for $r=1$ , when restricting to ${\mathcal{M}}_{+}(I)$ or ${\mathcal{P}}_{+}(I)$ , the canonical tensor fields

[TABLE]

yield a congruent family of $n$ -tensors on ${\mathcal{M}}_{+}(I)$ and ${\mathcal{P}}_{+}(I)$ , respectively, as is verified as in the proof of Proposition 3.2. Therefore, the families of $n$ -tensor fields

[TABLE]

on ${\mathcal{M}}_{+}(I)$ and

[TABLE]

on ${\mathcal{P}}_{+}(I)$ are congruent, where in contrast to (3.7) and (3.8) we need not restrict the sum to partitions with $|P_{i}|\leq 1/r$ for all $i$ . In analogy to Definition 3.5 we call these the families of congruent tensors algebraically generated by the canonical $n$ -tensors $\{\tau_{I}^{n}\}$ .
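The congruency of the canonical tensors for $r=1$ can be checked numerically on small examples. The sketch below assumes the explicit form $(\tau^{n}_{I})_{\mu}(v_{1},\ldots,v_{n})=\sum_{i}v_{1,i}\cdots v_{n,i}/m_{i}^{\,n-1}$ for $\mu=\sum_{i}m_{i}\delta_{i}\in{\mathcal{M}}_{+}(I)$ , and uses that for $r=1$ the map $K_{1}=K_{\ast}$ is linear, so its derivative is $K_{\ast}$ itself. All names and the two sample kernels are illustrative.

```python
import numpy as np


def canonical_tensor(mu, vectors):
    """sum_i prod_k v_k[i] / mu[i]**(n - 1) for strictly positive mu."""
    n = len(vectors)
    return float(np.sum(np.prod(np.array(vectors), axis=0) / mu ** (n - 1)))


def pulled_back_tensor(K, mu, vectors):
    """Pull-back for r = 1: evaluate the tensor at K_* mu on the vectors K_* v_k."""
    return canonical_tensor(K @ mu, [K @ v for v in vectors])


# Congruent w.r.t. the statistic 0 -> 0, 1 -> 1, 2 -> 1 from I' = {0, 1, 2} to I = {0, 1}.
K_congruent = np.array([[1.0, 0.0], [0.0, 0.25], [0.0, 0.75]])
# A Markov kernel that is not congruent w.r.t. that statistic.
K_other = np.array([[0.5, 0.0], [0.5, 0.25], [0.0, 0.75]])

mu = np.array([2.0, 3.0])
v = np.array([1.0, -1.0])
w = np.array([0.5, 2.0])

for vectors in ([v, w], [v, w, v]):
    print(np.isclose(pulled_back_tensor(K_congruent, mu, vectors),
                     canonical_tensor(mu, vectors)))

# For the second kernel the value of the 2-tensor changes.
print(pulled_back_tensor(K_other, mu, [v, v]), canonical_tensor(mu, [v, v]))
```
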

The main result of this section (Theorem 4.1) will be that (4.6) and (4.7) are the only families of congruent $n$ -tensor fields which are defined on ${\mathcal{M}}_{+}(I)$ and ${\mathcal{P}}_{+}(I)$ , respectively, for all *finite* sets $I$ . In order to do this, we first deal with congruent families on ${\mathcal{M}}_{+}(I)$ only.

A multiindex $\vec{i}=(i_{1},\ldots,i_{n})\in I^{n}$ induces a partition ${\bf P}(\vec{i})$ of the set $\{1,\ldots,n\}$ into the equivalence classes of the relation $k\sim l\Leftrightarrow i_{k}=i_{l}$ . For instance, for $n=6$ and pairwise distinct elements $i,j,k\in I$ , the partition induced by $\vec{i}:=(j,i,i,k,j,i)$ is

[TABLE]
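The induced partition is straightforward to compute. A minimal sketch, with the hypothetical helper `induced_partition` and strings standing in for the pairwise distinct elements $i,j,k$ :

```python
def induced_partition(multiindex):
    """Blocks of 1-based positions on which the multiindex takes the same value."""
    blocks = {}
    for position, value in enumerate(multiindex, start=1):
        blocks.setdefault(value, set()).add(position)
    return {frozenset(block) for block in blocks.values()}


partition = induced_partition(("j", "i", "i", "k", "j", "i"))
print(sorted(sorted(block) for block in partition))
# [[1, 5], [2, 3, 6], [4]]
```
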

Since the canonical $n$ -tensors $\tau^{n}_{I}$ are symmetric by definition, it follows that for any partition ${\bf P}\in{\bf Part}(n)$ we have by (3.6)

[TABLE]

Lemma 4.1.

In (4.6) and (4.7) above, $a_{\bf P}:(0,\infty)\to{\mathbb{R}}$ and $c_{\bf P}$ are uniquely determined.

Proof.

To show the first statement, let us assume that there are functions $a_{\bf P}:(0,\infty)\to{\mathbb{R}}$ such that

[TABLE]

for all finite sets $I$ and $\mu\in{\mathcal{M}}_{+}(I)$ , but there is a partition ${\bf P}_{0}$ with $a_{{\bf P}_{0}}\not\equiv 0$ . In fact, we pick ${\bf P}_{0}$ to be minimal with this property, and choose a multiindex $\vec{i}\in I^{n}$ with ${\bf P}(\vec{i})={\bf P}_{0}$ . Then

[TABLE]

where the last equation follows since $a_{\bf P}\equiv 0$ for ${\bf P}<{\bf P}_{0}$ by the minimality assumption on ${\bf P}_{0}$ .

But $(\tau^{{\bf P}_{0}}_{I})_{\mu}(\delta_{\vec{i}})\neq 0$ again by (4.8), since ${\bf P}(\vec{i})={\bf P}_{0}$ , so that $a_{{\bf P}_{0}}(\|\mu\|)=0$ for all $\mu$ , contradicting $a_{{\bf P}_{0}}\not\equiv 0$ .

Thus, (4.9) occurs only if $a_{\bf P}\equiv 0$ for all ${\bf P}$ , showing the uniqueness of the functions $a_{\bf P}$ in (4.6).

The uniqueness of the constants $c_{\bf P}$ in (4.7) follows similarly, but we have to account for the fact that $\delta_{i}\notin{\mathcal{S}}_{0}(I)=T_{\mu}{\mathcal{P}}_{+}(I)$ . In order to get around this, let $I$ be a finite set and $J:=\{0,1,2\}\times I$ . For $i\in I$ , we define

[TABLE]

and for a multiindex $\vec{i}=(i_{1},\ldots,i_{n})\in I^{n}$ we let

[TABLE]

Multiplying this term out, we see that $(\tau^{\bf P}_{J})_{\mu}(V^{\vec{i}})$ is a linear combination of terms of the form $(\tau^{\bf P}_{J})_{\mu}(\delta_{(a_{1},i_{1})},\ldots,\delta_{(a_{n},i_{n})})$ , where $a_{i}\in\{0,1,2\}$ . Thus, from (4.8) we conclude that

[TABLE]

Moreover, if ${\bf P}(\vec{i})=\{P_{1},\ldots,P_{r}\}$ with $|P_{i}|=k_{i}$ , and $\mu_{0}:=1/|J|\sum\delta_{(a,i)}\in{\mathcal{P}}_{+}(J)$ , then

[TABLE]

Thus, by (2.15) we have

[TABLE]

In particular, since $2^{k_{i}}+2(-1)^{k_{i}}>0$ for all $k_{i}\geq 2$ we conclude that

[TABLE]

as long as ${\bf P}(\vec{i})$ does not contain a singleton set.
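The role of the factor $2^{k_{i}}+2(-1)^{k_{i}}$ can be illustrated numerically. The sketch below rests on an assumption: it takes $V^{i}:=2\delta_{(0,i)}-\delta_{(1,i)}-\delta_{(2,i)}$ , a choice that has total mass zero and reproduces this factor, but which should be compared with the definition of $V^{i}$ given above. It also assumes the explicit form $\sum_{j}\prod_{k}v_{k,j}/\mu_{j}^{\,k-1}$ of the canonical tensor on a finite set.

```python
import numpy as np


def tangent_vector(i, size_I):
    """Assumed V^i = 2*delta_(0,i) - delta_(1,i) - delta_(2,i) on J = {0,1,2} x I."""
    V = np.zeros((3, size_I))
    V[0, i], V[1, i], V[2, i] = 2.0, -1.0, -1.0
    return V.ravel()


def canonical_tensor_uniform(vectors):
    """tau^k at the uniform probability measure mu_0 on J."""
    k = len(vectors)
    size_J = vectors[0].size
    mu0 = np.full(size_J, 1.0 / size_J)
    return float(np.sum(np.prod(np.array(vectors), axis=0) / mu0 ** (k - 1)))


size_I = 2
V = tangent_vector(0, size_I)
size_J = V.size

print(V.sum())  # total mass of V^i
for k in range(1, 6):
    value = canonical_tensor_uniform([V] * k)
    expected = size_J ** (k - 1) * (2 ** k + 2 * (-1) ** k)
    print(k, value, np.isclose(value, expected))
```

Under this assumption the value vanishes for $k=1$ and is positive for every $k\geq 2$ , which matches the exclusion of singleton sets in the statement above.
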

With this, we can now proceed as in the previous case: assume that

[TABLE]

for constants $c_{\bf P}$ which do not all vanish, and we let ${\bf P}_{0}$ be minimal with $c_{{\bf P}_{0}}\neq 0$ . Let $\vec{i}=(i_{1},\ldots,i_{n})\in I^{n}$ be a multiindex with ${\bf P}(\vec{i})={\bf P}_{0}$ , and let $J:=\{0,1,2\}\times I$ be as above. Then

[TABLE]

where the last equality follows by the assumption that ${\bf P}_{0}$ is minimal. But $(\tau^{{\bf P}_{0}}_{J})_{\mu}(V^{\vec{i}})\neq 0$ by (4.11), whence $c_{{\bf P}_{0}}=0$ , contradicting the choice of ${\bf P}_{0}$ .

This shows that (4.12) can happen only if all $c_{\bf P}=0$ , and this completes the proof. ∎

The main result of this section is the following.

Theorem 4.1.

(Classification of congruent families of $n$ -tensors)

The class of congruent families of $n$ -tensors on ${\mathcal{M}}_{+}(I)$ and ${\mathcal{P}}_{+}(I)$ , respectively, for finite sets $I$ is the class algebraically generated by the canonical $n$ -tensors $\{\tau_{I}^{n}\}$ . That is, these families are the ones given in (4.6) and (4.7), respectively.

The rest of this section will be devoted to its proof which is split up into several lemmas.

Lemma 4.2.

Let $\tau^{\bf P}_{I}$ be the canonical $n$ -tensor from Definition 3.6, and define the center

[TABLE]

Then for any $\lambda>0$ we have

[TABLE]

Proof.

For $\mu=\lambda c_{I}$ , $\lambda>0$ , the components $\mu_{i}$ of $\mu$ all equal $\mu_{i}=\lambda/|I|$ , whence in this case we have for all multiindices $\vec{i}$ with ${\bf P}\leq{\bf P}(\vec{i})$

[TABLE]

showing (4.14). If ${\bf P}\not\leq{\bf P}(\vec{i})$ , the claim follows from (4.8). ∎

Now let us suppose that $\{\tilde{\Theta}^{n}_{I}\>:\;I\;\mbox{finite}\}$ is a congruent family of $n$ -tensors on ${\mathcal{M}}_{+}(I)$ , and define $\theta^{\vec{i}}_{I,\mu}$ as in (4.4) and $c_{I}\in{\mathcal{P}}_{+}(I)$ as in (4.13).

Lemma 4.3.

Let $\{\tilde{\Theta}^{n}_{I}\>:\;I\;\mbox{finite}\}$ and $\theta^{\vec{i}}_{I,\mu}$ be as before, and let $\lambda>0$ . If $\vec{i},\vec{j}\in I^{n}$ are multiindices with ${\bf P}(\vec{i})={\bf P}(\vec{j})$ , then

[TABLE]

Proof.

If ${\bf P}(\vec{i})={\bf P}(\vec{j})$ , then there is a permutation $\sigma:I\to I$ such that $\sigma(i_{k})=j_{k}$ for $k=1,\ldots,n$ . We define the congruent Markov kernel $K:I\to{\mathcal{P}}(I)$ by $K^{i}:=\delta_{\sigma(i)}$ . Then evidently, $K_{*}c_{I}=c_{I}$ , and Definition 3.4 implies

[TABLE]

which shows the claim. ∎

By virtue of this lemma, we may define

[TABLE]

Lemma 4.4.

Let $\{\tilde{\Theta}^{n}_{I}\>:\;I\;\mbox{finite}\}$ and $\theta^{\bf P}_{I,\lambda c_{I}}$ be as before, and suppose that ${\bf P}_{0}\in\mbox{\bf Part}(n)$ is a partition such that

[TABLE]

Then there is a continuous function $f_{{\bf P}_{0}}:(0,\infty)\to{\mathbb{R}}$ such that

[TABLE]

Proof.

Let $I,J$ be finite sets, and let $I^{\prime}:=I\times J$ . We define the Markov kernel

[TABLE]

which is congruent w.r.t. the canonical projection $\kappa:I^{\prime}\to I$ . Then $K_{*}c_{I}=c_{I^{\prime}}$ is easily verified. Moreover, if $\vec{i}=(i_{1},\ldots,i_{n})\in I^{n}$ is a multiindex with ${\bf P}(\vec{i})={\bf P}_{0}$ , then

[TABLE]

Observe that ${\bf P}((i_{1},j_{1}),\ldots,(i_{n},j_{n}))\leq{\bf P}(\vec{i})={\bf P}_{0}$ . If ${\bf P}((i_{1},j_{1}),\ldots,(i_{n},j_{n}))<{\bf P}_{0}$ , then $\theta^{{\bf P}((i_{1},j_{1}),\ldots,(i_{n},j_{n}))}_{I^{\prime},\lambda c_{I^{\prime}}}=0$ by (4.15).

Moreover, there are $|J|^{|{\bf P}_{0}|}$ multiindices $(j_{1},\ldots,j_{n})\in J^{n}$ for which ${\bf P}((i_{1},j_{1}),\ldots,(i_{n},j_{n}))={\bf P}_{0}$ , and since for all of these $\theta^{{\bf P}((i_{1},j_{1}),\ldots,(i_{n},j_{n}))}_{I^{\prime},\lambda c_{I^{\prime}}}=\theta^{{\bf P}_{0}}_{I^{\prime},\lambda c_{I^{\prime}}}$ , we obtain

[TABLE]

and since $|I^{\prime}|=|I|\;|J|$ , it follows that

[TABLE]

Interchanging the roles of $I$ and $J$ in the previous arguments, we also get

[TABLE]

whence $f_{{\bf P}_{0}}(\lambda):=\frac{1}{|I|^{n-|{\bf P}_{0}|}}\theta^{{\bf P}_{0}}_{I,\lambda c_{I}}$ is indeed independent of the choice of the finite set $I$ . ∎
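The counting step in this proof can be confirmed by brute force: the partition induced by the pairs $(i_{k},j_{k})$ equals ${\bf P}(\vec{i})$ exactly when $(j_{1},\ldots,j_{n})$ is constant on each block of ${\bf P}(\vec{i})$ , which leaves $|J|^{|{\bf P}_{0}|}$ choices. A small sketch with hypothetical helper names:

```python
from itertools import product


def induced_partition(multiindex):
    """Blocks of 1-based positions on which the multiindex takes the same value."""
    blocks = {}
    for position, value in enumerate(multiindex, start=1):
        blocks.setdefault(value, set()).add(position)
    return {frozenset(block) for block in blocks.values()}


def count_preserving(multiindex, J):
    """Count (j_1, ..., j_n) in J^n whose pairs (i_k, j_k) induce the same partition."""
    target = induced_partition(multiindex)
    return sum(
        1
        for js in product(J, repeat=len(multiindex))
        if induced_partition(tuple(zip(multiindex, js))) == target
    )


multiindex = ("a", "b", "b", "a")  # induces a partition with 2 blocks
J = range(3)
count = count_preserving(multiindex, J)
print(count, len(J) ** len(induced_partition(multiindex)))
```
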

Lemma 4.5.

Let $\{\tilde{\Theta}^{n}_{I}\>:\;I\;\mbox{finite}\}$ and $\lambda>0$ be as before. Then there is a congruent family $\{\tilde{\Psi}^{n}_{I}\>:\;I\;\mbox{finite}\}$ of the form (4.6) such that

[TABLE]

Proof.

For a congruent family of $n$ -tensors $\{\tilde{\Theta}^{n}_{I}\>:\;I\;\mbox{finite}\}$ , we define

[TABLE]

If $N(\{\tilde{\Theta}^{n}_{I}\})\subsetneq\mbox{\bf Part}(n)$ , then let

[TABLE]

be a minimal element, i.e., such that ${\bf P}\in N(\{\tilde{\Theta}^{n}_{I}\})$ for all ${\bf P}<{\bf P}_{0}$ . In particular, for this partition (4.15) and hence (4.16) holds. Let

[TABLE]

with the function $f_{{\bf P}_{0}}$ from (4.16). Then $\{{\tilde{\Theta^{\prime}}}^{n}_{I}\;:\;I\;\mbox{finite}\}$ is again a family of $n$ -tensors.

Let ${\bf P}\in N(\{\tilde{\Theta}^{n}_{I}\})$ and $\vec{i}$ be a multiindex with ${\bf P}(\vec{i})\leq{\bf P}$ . If $(\tau^{{\bf P}_{0}}_{I})_{\lambda c_{I}}(\delta_{\vec{i}})\neq 0$ , then by Lemma 4.2 we would have ${\bf P}_{0}\leq{\bf P}(\vec{i})\leq{\bf P}\in N(\{\tilde{\Theta}^{n}_{I}\})$ which would imply that ${\bf P}_{0}\in N(\{\tilde{\Theta}^{n}_{I}\})$ , contradicting the choice of ${\bf P}_{0}$ .

Thus, $(\tau^{{\bf P}_{0}}_{I})_{\lambda c_{I}}(\delta_{\vec{i}})=0$ and hence $({\tilde{\Theta^{\prime}}}^{n}_{I})_{\lambda c_{I}}(\delta_{\vec{i}})=0$ whenever ${\bf P}(\vec{i})\leq{\bf P}$ , showing that ${\bf P}\in N(\{{\tilde{\Theta^{\prime}}}^{n}_{I}\})$ .

Thus, what we have shown is that $N(\{\tilde{\Theta}^{n}_{I}\})\subset N(\{{\tilde{\Theta^{\prime}}}^{n}_{I}\})$ . On the other hand, if ${\bf P}(\vec{i})={\bf P}_{0}$ , then again by Lemma 4.2

[TABLE]

and since $\|\lambda c_{I}\|=\lambda$ , it follows that

[TABLE]

That is, $({\tilde{\Theta^{\prime}}}^{n}_{I})_{\lambda c_{I}}(\delta_{\vec{i}})=0$ whenever ${\bf P}(\vec{i})={\bf P}_{0}$ . If $\vec{i}$ is a multiindex with ${\bf P}(\vec{i})<{\bf P}_{0}$ , then ${\bf P}(\vec{i})\in N(\{{\tilde{\Theta^{\prime}}}^{n}_{I}\})$ by the minimality of ${\bf P}_{0}$ , so that $\tilde{\Theta}^{n}_{I}(\delta_{\vec{i}})=0$ . Moreover, $(\tau^{{\bf P}_{0}}_{I})_{\lambda c_{I}}(\delta_{\vec{i}})=0$ by Lemma 4.2, whence

[TABLE]

showing that ${\bf P}_{0}\in N(\{{\tilde{\Theta^{\prime}}}^{n}_{I}\})$ . Therefore,

[TABLE]

What we have shown is that given a congruent family of $n$ -tensors $\{\tilde{\Theta}^{n}_{I}\}$ with $N(\{\tilde{\Theta}^{n}_{I}\})\subsetneq\mbox{\bf Part}(n)$ , we can enlarge $N(\{\tilde{\Theta}^{n}_{I}\})$ by subtracting a multiple of the canonical tensor of some partition. Repeating this finitely many times, we conclude that for some congruent family $\{\tilde{\Psi}^{n}_{I}\}$ of the form (4.6)

[TABLE]

and this implies by definition that $(\tilde{\Theta}^{n}_{I}-\tilde{\Psi}^{n}_{I})_{\lambda c_{I}}=0$ for all $I$ and all $\lambda>0$ . ∎

Lemma 4.6.

Let $\{\tilde{\Theta}^{n}_{I}\;:\;I\;\mbox{finite}\}$ be a congruent family of $n$ -tensors such that $(\tilde{\Theta}^{n}_{I})_{\lambda c_{I}}=0$ for all $I$ and $\lambda>0$ . Then $\tilde{\Theta}^{n}_{I}=0$ for all $I$ .

Proof.

Consider $\mu\in{\mathcal{M}}_{+}(I)$ such that $\pi_{I}(\mu)=\mu/\|\mu\|\in{\mathcal{P}}_{+}(I)$ has rational coefficients, i.e.

[TABLE]

for some $k_{i},n\in{\mathbb{N}}$ and $\sum_{i\in I}k_{i}=n$ . Let

[TABLE]

so that $|I^{\prime}|=n$ , and consider the congruent Markov kernel

[TABLE]

Then

[TABLE]

Thus, Definition 3.4 implies

[TABLE]

so that $(\tilde{\Theta}^{n}_{I})_{\mu}=0$ whenever $\pi_{I}(\mu)$ has rational coefficients. But these $\mu$ form a dense subset of ${\mathcal{M}}_{+}(I)$ , whence $(\tilde{\Theta}^{n}_{I})_{\mu}=0$ for all $\mu\in{\mathcal{M}}_{+}(I)$ , which completes the proof. ∎
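The key step of this proof, mapping a measure with rational weights to a multiple of the uniform distribution, can be checked with exact arithmetic. The sketch assumes that $I^{\prime}$ consists of the pairs $(i,j)$ with $1\leq j\leq k_{i}$ and that the kernel spreads the mass of $i$ uniformly over the $k_{i}$ points $(i,j)$ ; these explicit forms and the function names are assumptions made for illustration.

```python
from fractions import Fraction


def splitting_kernel(ks):
    """Rows are the pairs (i, j) with 1 <= j <= ks[i]; columns are indexed by i."""
    rows = [(i, j) for i, k in enumerate(ks) for j in range(1, k + 1)]
    K = [
        [Fraction(1, ks[i]) if row_i == i else Fraction(0) for i in range(len(ks))]
        for (row_i, _) in rows
    ]
    return rows, K


def push_forward(K, mu):
    """K_* mu as a list of exact fractions."""
    return [sum(K_row[i] * mu[i] for i in range(len(mu))) for K_row in K]


ks = [1, 2, 3]      # the integers k_i
n = sum(ks)         # n = sum of the k_i
norm = Fraction(5)  # total mass of mu
mu = [norm * Fraction(k, n) for k in ks]

rows, K = splitting_kernel(ks)
image = push_forward(K, mu)
print(len(rows) == n)
print(all(weight == norm / n for weight in image))
```

Every point of $I^{\prime}$ receives the same mass $\|\mu\|/n$ , so the image is $\|\mu\|\,c_{I^{\prime}}$ , where the vanishing of the tensor is already known by hypothesis.
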

We are now ready to prove the main result in this section.

Proof of Theorem 4.1.

Let $\{\tilde{\Theta}^{n}_{I}:I\mbox{ finite}\}$ be a congruent family of $n$ -tensors. By Lemma 4.5 there is a congruent family $\{\tilde{\Psi}^{n}_{I}:I\mbox{ finite}\}$ of the form (4.6) such that $(\tilde{\Theta}^{n}_{I}-\tilde{\Psi}^{n}_{I})_{\lambda c_{I}}=0$ for all finite $I$ and all $\lambda>0$ .

Since $\{\tilde{\Theta}^{n}_{I}-\tilde{\Psi}^{n}_{I}:I\mbox{ finite}\}$ is again a congruent family, Lemma 4.6 implies that $\tilde{\Theta}^{n}_{I}-\tilde{\Psi}^{n}_{I}=0$ and hence $\tilde{\Theta}^{n}_{I}=\tilde{\Psi}^{n}_{I}$ is of the form (4.6), showing the statement of Theorem 4.1 for $n$ -tensors on ${\mathcal{M}}_{+}(I)$ .

To show the second part, let us consider for a finite set $I$ the inclusion and projection

[TABLE]

Evidently, $\pi_{I}$ is a left inverse of $\imath_{I}$ , i.e., $\pi_{I}\imath_{I}=Id_{{\mathcal{P}}_{+}(I)}$ , and by (2.9) it follows that $K_{\ast}$ commutes both with $\pi_{I}$ and $\imath_{I}$ .

Thus, if $\{\Theta^{n}_{I}:I\mbox{ finite}\}$ is a congruent family of $n$ -tensors on ${\mathcal{P}}_{+}(I)$ , then

[TABLE]

yields a congruent family of $n$ -tensors on ${\mathcal{M}}_{+}(I)$ and by the first part of the theorem must be of the form (4.6). But then,

[TABLE]

where $c_{\bf P}=a_{\bf P}(1)$ . Since $(\tau^{\bf P}_{I})|_{{\mathcal{P}}_{+}(I)}=0$ if ${\bf P}$ contains a singleton set, it follows that $\Theta^{n}_{I}$ is of the form (4.7). ∎

5. Congruent families on arbitrary sample spaces

In this section, we wish to generalize the classification result for congruent families on finite sample spaces (Theorem 4.1) to the case of arbitrary sample spaces. As it turns out, even in this case, congruent families of tensor fields are algebraically generated by the canonical tensor fields. More precisely, we have the following result.

Theorem 5.1 (Classification of congruent families).

For $0<r\leq 1$ , let $(\Theta^{n}_{{\Omega};r})$ be a family of covariant $n$ -tensors on ${\mathcal{M}}^{r}({\Omega})$ (on ${\mathcal{P}}^{r}({\Omega})$ , respectively) for each measurable space ${\Omega}$ . Then the following are equivalent:

(1)

$(\Theta^{n}_{{\Omega};r})$ *is a congruent family of covariant $n$ -tensors of regularity $r$ .*

(2)

For each congruent Markov morphism $K:I\to{\mathcal{P}}({\Omega})$ for a finite set $I$ , we have $K_{r}^{*}\Theta^{n}_{{\Omega};r}=\Theta^{n}_{I;r}$ .

(3)

$\Theta^{n}_{{\Omega};r}$ *is of the form (3.7) (of the form (3.8), respectively) for uniquely determined continuous functions $a_{\bf P}$ (constants $c_{\bf P}$ , respectively).*

In the light of Definition 3.5, we may reformulate the equivalence of the first and the third statement as follows:

Corollary 5.1.

The space of congruent families of covariant $n$ -tensors on ${\mathcal{M}}^{r}({\Omega})$ and ${\mathcal{P}}^{r}({\Omega})$ , respectively, is algebraically generated by the canonical $n$ -tensors $\tau^{n}_{{\Omega};r}$ for $n\leq 1/r$ .

Proof of Theorem 5.1.

We already showed in Proposition 3.3 that the tensors (3.7) and (3.8), respectively, are congruent families, hence the third statement implies the first. The first immediately implies the second by the definition of the congruency of tensors. Thus, it remains to show that the second statement implies the third.

We shall give the proof only for the families $(\Theta_{{\Omega};r}^{n})$ of covariant $n$ -tensors on ${\mathcal{M}}^{r}({\Omega})$ , as the proof for families on ${\mathcal{P}}^{r}({\Omega})$ is analogous.

Observe that for finite sets $I$ , the space ${\mathcal{M}}^{r}_{+}(I)\subset{\mathcal{S}}^{r}(I)$ is an open subset and hence a manifold, and the restrictions $\pi^{\alpha}:{\mathcal{M}}^{r}_{+}(I)\to{\mathcal{M}}_{+}^{r\alpha}(I)$ are diffeomorphisms not only for $\alpha\geq 1$ but for all $\alpha>0$ . Thus, given the congruent family $(\Theta^{n}_{{\Omega};r})$ , we define for each finite set $I$ the tensor

[TABLE]

Then for each congruent Markov kernel $K:I\to{\mathcal{P}}(J)$ with $I$ , $J$ finite we have

[TABLE]

Thus, the family $(\Theta^{n}_{I})$ on ${\mathcal{M}}_{+}(I)$ is a congruent family of covariant $n$ -tensors on finite sets, whence by Theorem 4.1

[TABLE]

for uniquely determined functions $a_{\bf P}$ , whence on ${\mathcal{M}}_{+}^{r}(I)$ ,

[TABLE]

By our assumption, $\Theta^{n}_{I;r}$ must be a covariant $n$ -tensor on ${\mathcal{M}}(I)$ , whence it must extend continuously to the boundary of ${\mathcal{M}}_{+}(I)$ .

But by (4.5) it follows that $\tau^{n_{i}}_{I;r}$ has a singularity at the boundary of ${\mathcal{M}}(I)$ , unless $n_{i}\leq 1/r$ . From this it follows that $\Theta^{n}_{I;r}$ extends to all of ${\mathcal{M}}(I)$ if and only if $a_{\bf P}\equiv 0$ for all partitions ${\bf P}=\{P_{1},\ldots,P_{l}\}$ where $|P_{i}|>1/r$ for some $i$ .
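The boundary behaviour can be made explicit in coordinates. The block below is a sketch under an assumption: it uses the coordinate expression obtained by pulling back the canonical tensor along $\pi^{1/(nr)}$, with the normalization $1/r^{n}$ taken as an assumption rather than quoted from (4.5). The coefficients $v_{j;i}$ are notation introduced only for this illustration.

```latex
% Assumed setup on a finite set I:
%   \mu = \sum_{i \in I} m_i \delta_i  with m_i > 0,
%   \mu^r = \sum_{i \in I} m_i^r \delta_i^r,
%   V_j = \sum_{i \in I} v_{j;i} \delta_i^r   (v_{j;i} is illustrative notation).
\tau^{n}_{I;r}(V_1,\ldots,V_n)_{\mu^r}
  \;=\; \frac{1}{r^{n}} \sum_{i\in I} m_i^{\,1-nr}\; v_{1;i}\cdots v_{n;i}.

% Worked case r = 1/2:
%   n = 2:  1 - nr = 0,     so the coefficient m_i^0 = 1 stays bounded as m_i \to 0;
%   n = 3:  1 - nr = -1/2,  so the coefficient m_i^{-1/2} diverges as m_i \to 0.
```

In this form the exponent $1-nr$ is non-negative exactly when $n\leq 1/r$, which matches the condition stated above.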

Thus, $\Theta^{n}_{I;r}$ must be of the form (3.7) for all finite sets $I$ . Let

[TABLE]

for the previously determined functions $a_{\bf P}$ , so that $(\Psi^{n}_{{\Omega};r})$ is a congruent family of covariant $n$ -tensors, and $\Psi^{n}_{I;r}=0$ for every finite $I$ .

We assert that this implies that $\Psi^{n}_{{\Omega};r}=0$ for all ${\Omega}$ , which shows that $\Theta^{n}_{{\Omega};r}$ is of the form (3.7) for all ${\Omega}$ , which will complete the proof.

To see this, let $\mu_{r}\in{\mathcal{M}}^{r}({\Omega})$ and $\mu:=\mu_{r}^{1/r}\in{\mathcal{M}}({\Omega})$ . Moreover, let $V_{j}=\phi_{j}\mu_{r}\in{\mathcal{S}}^{r}({\Omega},\mu_{r})$ , $j=1,\ldots,n$ , such that the $\phi_{j}$ are step functions. That is, there is a finite partition ${\Omega}=\dot{\bigcup}_{i\in I}{\Omega}_{i}$ such that

[TABLE]

for $\phi_{j}^{i}\in{\mathbb{R}}$ and $m_{i}:=\mu({\Omega}_{i})>0$ .

Let $\kappa:{\Omega}\to I$ be the statistic $\kappa({\Omega}_{i})=\{i\}$ , and $K:I\to{\mathcal{P}}({\Omega})$ , $K(i):=\frac{1}{m_{i}}\chi_{{\Omega}_{i}}\mu$ . Then clearly, $K$ is $\kappa$ -congruent, and $\mu=K_{*}\mu^{\prime}$ with $\mu^{\prime}:=\sum_{i\in I}m_{i}\delta_{i}\in{\mathcal{M}}_{+}(I)$ . Thus, by (3.3)

[TABLE]

whence if we let $V_{j}^{\prime}:=\sum_{i\in I}\phi_{j}^{i}m_{i}^{r}\delta_{i}^{r}\in{\mathcal{S}}^{r}(I)$ , then

[TABLE]

since by the congruence of the family $(\Psi_{{\Omega};r}^{n})$ we must have $K_{r}^{*}\Psi_{{\Omega};r}^{n}=\Psi_{I;r}^{n}$ , and $\Psi_{I;r}^{n}=0$ by assumption as $I$ is finite.

That is, $\Psi_{{\Omega};r}^{n}(V_{1},\ldots,V_{n})=0$ whenever $V_{j}=\phi_{j}\mu_{r}\in{\mathcal{S}}^{r}({\Omega},\mu_{r})$ with step functions $\phi_{j}$ . But the elements $V_{j}$ of this form are dense in ${\mathcal{S}}^{r}({\Omega},\mu_{r})$ , hence the continuity of $\Psi_{{\Omega};r}^{n}$ implies that $\Psi_{{\Omega};r}^{n}=0$ for all ${\Omega}$ as claimed. ∎
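The kernel $K$ and the statistic $\kappa$ used in this proof can be checked on a toy example. The following is a minimal sketch, assuming a finite stand-in for ${\Omega}$ with six points; the measure `mu` and the partition encoded by `kappa` are hypothetical. It verifies that $K_{*}\mu^{\prime}=\mu$ and that $\kappa_{*}K(i)=\delta_{i}$, which is the $\kappa$ -congruence of $K$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical finite stand-in for Omega: 6 points carrying a positive measure mu.
mu = rng.uniform(0.5, 2.0, size=6)

# kappa: Omega -> I = {0, 1, 2}; the blocks Omega_i are the fibres kappa^{-1}(i).
kappa = np.array([0, 0, 1, 1, 1, 2])
n_blocks = 3

# m_i = mu(Omega_i)
m = np.array([mu[kappa == i].sum() for i in range(n_blocks)])

# K(i) = (1 / m_i) * chi_{Omega_i} * mu, stored as column i of a 6 x 3 matrix.
K = np.zeros((mu.size, n_blocks))
for i in range(n_blocks):
    mask = kappa == i
    K[mask, i] = mu[mask] / m[i]

# mu' = sum_i m_i delta_i, and K_* mu' = sum_i m_i K(i).
mu_prime = m
pushed = K @ mu_prime

# Entry (j, i) is the mass that K(i) assigns to the block Omega_j, i.e. (kappa_* K(i))(j).
kappa_push = np.array(
    [[K[kappa == j, i].sum() for i in range(n_blocks)] for j in range(n_blocks)]
)

print(np.allclose(pushed, mu), np.allclose(kappa_push, np.eye(n_blocks)))
```

A finite ${\Omega}$ is of course already covered by Theorem 4.1; the sketch only illustrates the bookkeeping that reduces step functions on a general ${\Omega}$ to the finite set $I$.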

As two special cases of this result, we obtain the following.

Corollary 5.2 (Generalization of Chentsov’s theorem).

(1) Let $(\Theta_{\Omega}^{2})$ be a congruent family of $2$ -tensors on ${\mathcal{P}}^{1/2}({\Omega})$ . Then up to a constant, this family is the Fisher metric. That is, there is a constant $c\in{\mathbb{R}}$ such that for all ${\Omega}$ ,

[TABLE]

In particular, if $(M,{\Omega},{\mathbf{p}})$ is a $2$ -integrable statistical model, then

[TABLE]

is – up to a constant – the Fisher metric of the model.

(2) Let $(\Theta_{\Omega}^{3})$ be a congruent family of $3$ -tensors on ${\mathcal{P}}^{1/3}({\Omega})$ . Then up to a constant, this family is the Amari–Chentsov tensor. That is, there is a constant $c\in{\mathbb{R}}$ such that for all ${\Omega}$ ,

[TABLE]

In particular, if $(M,{\Omega},{\mathbf{p}})$ is a $3$ -integrable statistical model, then

[TABLE]

is – up to a constant – the Amari–Chentsov tensor of the model.
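The invariance behind this corollary can be observed directly on finite sets. The following is a minimal sketch, assuming the standard coordinate formulas $\sum_{i}V_{i}W_{i}/\mu_{i}$ for the Fisher metric and $\sum_{i}V_{i}W_{i}U_{i}/\mu_{i}^{2}$ for the Amari–Chentsov tensor on ${\mathcal{P}}_{+}(I)$ ; the kernel `K`, the point `mu` and the tangent vectors are hypothetical values. For contrast, it also evaluates the Euclidean pairing, which is not preserved.

```python
import numpy as np

def fisher(mu, V, W):
    """Fisher metric on P_+(I) in coordinates: sum_i V_i W_i / mu_i."""
    return float(np.sum(V * W / mu))

def amari_chentsov(mu, V, W, U):
    """Amari-Chentsov tensor on P_+(I): sum_i V_i W_i U_i / mu_i^2."""
    return float(np.sum(V * W * U / mu**2))

# Hypothetical congruent Markov kernel K: I -> P(J), |I| = 3, |J| = 6.
# Columns are probability vectors with pairwise disjoint supports covering J.
K = np.array([
    [0.2, 0.0, 0.0],
    [0.8, 0.0, 0.0],
    [0.0, 0.5, 0.0],
    [0.0, 0.3, 0.0],
    [0.0, 0.2, 0.0],
    [0.0, 0.0, 1.0],
])

mu = np.array([0.2, 0.5, 0.3])       # a point of P_+(I)
V = np.array([0.1, -0.3, 0.2])       # tangent vectors: entries sum to zero
W = np.array([-0.2, 0.1, 0.1])
U = np.array([0.3, 0.0, -0.3])

g_before = fisher(mu, V, W)
g_after = fisher(K @ mu, K @ V, K @ W)
t_before = amari_chentsov(mu, V, W, U)
t_after = amari_chentsov(K @ mu, K @ V, K @ W, K @ U)
e_before = float(V @ W)
e_after = float((K @ V) @ (K @ W))

print(np.isclose(g_before, g_after), np.isclose(t_before, t_after))
print(np.isclose(e_before, e_after))
```

The check exercises only one kernel and one point, so it illustrates the invariance rather than proving it; the uniqueness statement of the corollary is not something a numerical test can confirm.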

Corollary 5.3 (Generalization of Campbell’s theorem).

Let $(\Theta_{\Omega}^{2})$ be a congruent family of $2$ -tensors on ${\mathcal{M}}^{1/2}({\Omega})$ . Then there are continuous functions $a,b:(0,\infty)\to{\mathbb{R}}$ such that

[TABLE]

In particular, if $(M,{\Omega},{\mathbf{p}})$ is a $2$ -integrable parametrized measure model, then

[TABLE]

While the above results show that for small $n$ the congruent families of $n$ -tensors on ${\mathcal{P}}^{r}({\Omega})$ are unique up to a constant multiple, this is no longer true for larger $n$ . For instance, for $n=4$ , Theorem 5.1 implies that any restricted congruent family of invariant $4$ -tensors on ${\mathcal{P}}^{r}({\Omega})$ , $0<r\leq 1/4$ , is of the form

[TABLE]

so that the space of congruent families on ${\mathcal{P}}^{r}({\Omega})$ is already $4$ -dimensional in this case. Evidently, this dimension rapidly increases with $n$ .
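The dimension count can be reproduced by enumerating partitions. The following is a minimal sketch, assuming (as read off from the proofs above) that the free constants $c_{\bf P}$ on ${\mathcal{P}}^{r}({\Omega})$ are indexed by the partitions of $\{1,\ldots,n\}$ without singleton blocks whose blocks have at most $1/r$ elements. The function names are introduced here for illustration only.

```python
def set_partitions(elements):
    """Yield every partition of the list `elements` as a list of blocks."""
    if not elements:
        yield []
        return
    first, rest = elements[0], elements[1:]
    for partition in set_partitions(rest):
        # Put `first` into each existing block in turn ...
        for k in range(len(partition)):
            yield partition[:k] + [[first] + partition[k]] + partition[k + 1:]
        # ... or open a new block containing only `first`.
        yield [[first]] + partition


def admissible_partitions(n, max_block, allow_singletons=False):
    """Partitions of {1, ..., n} with block sizes <= max_block.

    max_block plays the role of floor(1/r); singleton blocks are excluded
    unless allow_singletons is True.
    """
    result = []
    for partition in set_partitions(list(range(1, n + 1))):
        sizes = [len(block) for block in partition]
        if max(sizes) > max_block:
            continue
        if not allow_singletons and min(sizes) == 1:
            continue
        result.append(partition)
    return result


for n in range(2, 7):
    print(n, len(admissible_partitions(n, max_block=n)))
```

For $n=4$ the admissible partitions are $\{1,2,3,4\}$ and the three ways of splitting into two pairs, which gives the four parameters mentioned above; for $n=2$ and $n=3$ there is a single admissible partition, in line with Corollary 5.2.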

Bibliography

[1] S. Amari, Theory of information spaces. A geometrical foundation of statistics. POST RAAG Report 106 (1980).
[2] S. Amari, Differential geometry of curved exponential families curvature and information loss. The Annals of Statistics, 10, 357-385 (1982).
[3] S. Amari and H. Nagaoka, Methods of information geometry, Translations of mathematical monographs; v. 191, American Mathematical Society, Providence, RI; Oxford University Press, Oxford (2000).
[4] N. Ay, J. Jost, H.V. Lê, L. Schwachhöfer, Information geometry and sufficient statistics, Probability Theory and Related Fields 162, 327–364 (2015).
[5] N. Ay, J. Jost, H.V. Lê, L. Schwachhöfer, Parametrized measure models, Bernoulli (to appear), arXiv:1510.07305, (2015).
[6] N. Ay, J. Jost, H.V. Lê, L. Schwachhöfer, Information geometry, Ergebnisse der Mathematik und ihrer Grenzgebiete, Springer (to appear).
[7] M. Bauer, M. Bruveris, P. Michor, Uniqueness of the Fisher-Rao metric on the space of smooth densities, Bull. Lond. Math. Soc. 48, no. 3, 499–506 (2016).
[8] M. Bauer, M. Bruveris, P. Michor, Presentation at the fourth Conference on Information Geometry and Its Applications (IGAIA IV, 2016), Liblice, Czech Republic.
