This paper investigates the properties of the logarithm map in orthant spaces and uses these insights to analyze Fréchet means, including their characterization and limiting distributions, advancing understanding in stratified metric spaces.
Contribution
It provides a detailed analysis of the logarithm map in orthant spaces, valid at points of strata of any co-dimension, and characterizes Fréchet means, including their asymptotic behavior.
Findings
1. Derived explicit expressions for the logarithm map in orthant spaces.
2. Characterized the Fréchet means of probability measures on orthant spaces.
3. Established the limiting distribution of sample Fréchet means.
The Logarithm Map, its Limits and Fréchet Means in Orthant Spaces
D. Barden
Girton College, University of Cambridge, Cambridge, CB3 0JG, UK.
H. Le
School of Mathematical Sciences, University of Nottingham, Nottingham, NG7 2RD, UK.
Abstract
The first part of the paper studies the expression for, and the properties of, the logarithm map on an orthant space, which is a simple stratified space, with the aim of analysing Fréchet means of probability measures on such a space. In the second part, we use these results to characterise Fréchet means and to derive various of their properties, including the limiting distribution of sample Fréchet means.
Keywords: Fréchet mean; limiting distribution of sample Fréchet means; logarithm map; stratified space.
AMS MSC 2010: 60B05; 60B10.
1 Introduction
Several papers have recently appeared concerning probabilistic and statistical analysis of data on certain stratified spaces (cf. [5], [2], [10], [1] and [11]). One such example is the analysis of phylogenetic trees on the BHV space introduced in [5] (cf. [9], [19], [17], [3], [12], [15] and [18]). The BHV space Tm+2 of metric trees with m+2 leaves is a stratified CAT(0)-space with each stratum being isometric with a positive Euclidean orthant that is at most m-dimensional. It is already clear from these preliminary results that some fundamental statistics exhibit strikingly different features from the corresponding ones on Euclidean spaces or on manifolds and that one faces significant challenges in developing novel tools to analyse them, on account of the non-trivial topological structure of these spaces. It also becomes apparent that, although the topological and geometrical properties of stratified spaces have been extensively studied and are mostly well understood, many of the properties required for probabilistic and statistical analysis of data on these spaces have not been addressed.
This paper concentrates on orthant spaces introduced in [15], a relatively simple type of stratified space but more general than the space $\mathcal{T}_{m+2}$ of phylogenetic trees. The latter has $(2m+1)!!$ $m$-dimensional strata, together with their bounding strata, selected from among the $\binom{M}{m}$ $m$-dimensional positive orthants in $\mathbb{R}^M$, where $M=2^{m+2}-m-4$. In particular, each co-dimension one stratum bounds exactly three top-dimensional strata. Thus not only are the relevant dimensions sparse, but the proportion of the positive orthants occupied by the tree space of each dimension declines at least exponentially. These constraints, such as the restrictions on the dimension and the number of orthants involved in the space, no longer hold in a general orthant space, although we do have to make one restriction to ensure that it is a CAT(0)-space. We shall recall, in the next section, the concept of an orthant space, introducing the subsidiary concepts and definitions we use to describe the structure of such spaces and, in particular, of their tangent cones at the various points.
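As a quick arithmetic check on the tree-space counts just quoted, the Python sketch below (function names are ours, not the paper's) computes, for small m, the number (2m+1)!! of m-dimensional strata of T_{m+2}, the ambient dimension M = 2^{m+2} - m - 4 and the number C(M, m) of m-dimensional positive orthants in R^M, together with their ratio.

```python
from math import comb


def double_factorial(n):
    """Return n!! = n * (n - 2) * (n - 4) * ... for n >= 1."""
    result = 1
    while n > 1:
        result *= n
        n -= 2
    return result


def tree_space_counts(m):
    """For T_{m+2}: (number of m-dimensional strata, ambient dimension M,
    number of m-dimensional positive orthants in R^M)."""
    M = 2 ** (m + 2) - m - 4
    return double_factorial(2 * m + 1), M, comb(M, m)


for m in range(1, 6):
    top, M, total = tree_space_counts(m)
    print(f"m={m}: M={M}, strata={top}, orthants={total}, fraction={top / total:.3g}")
```

For m = 1, 2, 3, 4 the fraction is 1, 1/3, roughly 0.046 and roughly 0.0026, which illustrates how sparsely tree space fills the available orthants.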
A fundamental concept for statistical analysis of non-Euclidean data is that of the Fréchet mean, which generalises the concept of the mean of Euclidean data. A point x0 in a metric space M is a Fréchet mean of a probability measure μ on M if, at x0, the Fréchet function of μ defined by
$$\frac{1}{2}\int_{M} d(x,x')^{2}\,\mathrm{d}\mu(x')$$
attains its global minimum. In order to characterise and locate Fréchet means, we need to take directional derivatives of the Fréchet function and hence, implicitly, of the distance function. The latter involves the logarithm map $\log_{x^*}(x)$ which, analogous to the inverse of the exponential map on manifolds, is the initial tangent vector to the geodesic from $x^*$ to $x$. This logarithm map is globally well-defined on CAT(0)-spaces and has been studied, for example, in [14] and [16]. However, these results do not cover all the properties required for our analysis, although naturally we do rely on some of them. On the other hand, an algorithm for finding the geodesic between any two given trees in the tree space $\mathcal{T}_{m+2}$ was given in [19] and, using the analysis behind that algorithm, the expression for the logarithm map $\log_{x^*}$ was obtained in [3] when $x^*$ lies in a top-dimensional stratum. Although this expression for $\log_{x^*}$ could be extended to more general orthant spaces, it is noted in [3] that these results are not adequate to provide a tool for analysing Fréchet means when they lie in a stratum of co-dimension at least two. The latter requires a better understanding of the behaviour of the logarithm map as the end points of the geodesics move within and between strata. To this end, we first re-examine geodesics directly from first principles in Section 3, in particular avoiding the implicit assumption that $x^*$ lies in a top-dimensional stratum. This leads, in Section 4, to an explicit expression, given in Theorem 1, for a version of the logarithm map that we shall use, valid for any point in an orthant space. Since the form of this expression is determined by the carrier of the geodesic, we analyse possible changes in that carrier, focussing on the set, specified in Definition 11, of points $x$ at which significant changes occur.
This allows us, in Section 5, to derive the directional limits of the logarithm map as the reference point x′ approaches x∗ from a co-bounding stratum. We also study the projections of these limits, and the limits of the projections, onto the various strata related to the stratum in which x∗ lies. This enables us to prove the existence of, and to identify, certain of their derivatives and directional derivatives.
With this understanding of the logarithm map, the second part of the paper turns its attention to the analysis of Fréchet means. In Section 6 we obtain, in Theorem 3, necessary and sufficient conditions for a point x∗ to be the Fréchet mean of a probability measure on the orthant space Xm. Two special sets arise in this analysis. Firstly, one of the criteria in Theorem 3 involves an inequality, and the set, specified in Definition 12, of vectors in the tangent cone to Xm at the Fréchet mean for which this inequality becomes an equality plays a significant role. Secondly, there is the set given by Definition 13. This is related to a limit of the logarithm map and, in a certain sense, encapsulates the ‘departure’ of this limit from the analogous behaviour of the logarithm map on a Euclidean space. Both of these sets are related to the limiting distribution of sample Fréchet means, which we establish in the final Section 7. There, in particular, we relate the limiting distribution to Euclidean Gaussian random variables. The covariance matrices of these random variables are related to the derivative of the projection of the logarithm map and to projections of the limits of the logarithm map.
Although we do not make it explicit, in view of our previous results for Tm+2 and the comments in [15], our interest in this paper is primarily in the case that x∗ lies in a stratum of local co-dimension at least two. The results, when restricted to a locally top-dimensional or co-dimension one stratum, do generalise those for tree spaces in [3] although the approach here is necessarily more complex in order to encompass all cases.
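To make the definition of a Fréchet mean concrete, the sketch below works in what is perhaps the simplest orthant space with a non-manifold point: three rays glued at the cone point o (the tree space T_{m+2} with m = 1). Representing points as (leg, r) pairs, and all function names, are illustrative assumptions. Restricted to one leg, the empirical Fréchet function is a convex quadratic in the distance t from o, which gives the closed-form minimiser used here; a grid search can be used to cross-check it.

```python
import numpy as np

# A point of the three-ray space is a pair (leg, r): leg in {0, 1, 2} and r >= 0 its
# distance from the cone point o (all points with r = 0 are the same point o).


def dist(p, q):
    """Intrinsic distance: along a common leg, otherwise through the cone point o."""
    (leg_p, r_p), (leg_q, r_q) = p, q
    return abs(r_p - r_q) if leg_p == leg_q else r_p + r_q


def frechet_function(x, sample):
    """Empirical Fréchet function: half the average of d(x, x_j)^2."""
    return 0.5 * np.mean([dist(x, q) ** 2 for q in sample])


def spider_mean(sample, n_legs=3):
    """Closed-form minimiser of frechet_function.

    On leg k the function is a convex quadratic in t >= 0, minimised at max(0, t_k),
    where t_k averages the radii counted positively on leg k and negatively elsewhere.
    Since t_j + t_k <= 0 for j != k, at most one leg has t_k > 0.
    """
    for k in range(n_legs):
        t_k = np.mean([r if leg == k else -r for leg, r in sample])
        if t_k > 0:
            return (k, float(t_k))
    return (0, 0.0)  # the cone point o; its leg label is irrelevant


print(spider_mean([(0, 3.0), (0, 3.0), (1, 1.0), (2, 1.0)]))  # (0, 1.0)
print(spider_mean([(0, 1.0), (1, 1.0), (2, 1.0)]))            # (0, 0.0), i.e. o
```

In the second sample every t_k is negative, so the mean stays at o under all sufficiently small perturbations of the data, an instance of the non-Euclidean behaviour of basic statistics on stratified spaces noted above.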
2 Orthant spaces
Throughout this paper, we shall use the term ‘positive’ to mean strictly positive. By an open positive orthant in the Euclidean space RM we shall mean, for some subset $E=(u_{l_1},\cdots,u_{l_m})$ of the standard ordered orthonormal basis $U=(u_1,\cdots,u_M)$ of RM, the relatively open set
$$\mathcal{O}(E)=\Big\{\sum_{i=1}^{m}\lambda_i u_{l_i}\;\Big|\;\lambda_i>0\Big\}.$$
We denote by R(E) the subspace spanned by E, and we shall refer to the $u_{l_i}\in E$ as the axes of R(E) or of O(E). Then, an orthant space is a union of open positive orthants in a common Euclidean space with certain natural constraints, as specified in the following definition, that ensure, for example, that such spaces are also CAT(0). Orthant spaces were first introduced in [15] as a generalisation of the tree spaces of [5].
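A minimal sketch of this notation, assuming points of R^M are stored as NumPy arrays and the basis vectors u_1, ..., u_M are referred to by 0-based indices (an implementation choice, not the paper's convention): axes_of returns the set of axes on which a point is positive (the set denoted E(x) in Section 3), and in_orthant tests membership of the open orthant O(E).

```python
import numpy as np


def axes_of(x):
    """E(x): the 0-based indices of basis vectors along which x has positive coordinate."""
    return frozenset(int(i) for i in np.flatnonzero(np.asarray(x, dtype=float) > 0))


def in_orthant(x, E):
    """True iff x lies in the open positive orthant O(E): x is non-negative,
    positive on every axis in E and zero on all other axes."""
    x = np.asarray(x, dtype=float)
    return bool(np.all(x >= 0)) and axes_of(x) == frozenset(E)


print(in_orthant([1.0, 0.0, 2.5], {0, 2}))  # True
print(in_orthant([1.0, 0.0, 0.0], {0, 2}))  # False: the point lies on a bounding ray
```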
Definition 1.
For two given integers M⩾m, an orthant space Xm of dimension m is a subspace of the Euclidean space RM that is a union of open positive orthants, whose maximum dimension is m, and has the intrinsic metric induced from the Euclidean metric on RM. It satisfies the following conditions:
(i) for every orthant σ in Xm, the orthants in the closure $\bar\sigma$ of σ are also included in Xm;
(ii) if, for any positive orthant σ in RM, all the 2-dimensional orthants in its closure are in Xm, then σ itself is in Xm.
The intrinsic metric on Xm is the length metric as defined in [6]. It is the metric d for which, for any two points x1 and x2 in Xm, the distance d(x1,x2) is the infimum of the lengths of piecewise linear paths in Xm joining x1 to x2. In particular, a geodesic will also be piecewise linear and linear within each stratum.
Note that there is no loss of generality in restricting Xm to contain only positive orthants: given two orthants that differ only in having positive or negative coordinates with respect to one particular axis, the intrinsic metric would be unchanged if we replaced, say, the negative axis by a new axis orthogonal to RM. Thus, rather than considering Xm to be a union of arbitrary orthants in RM, we could consider it to be a union of positive orthants in $\mathbb{R}^{2M}$. Henceforth, we shall assume all our orthants to be open and positive, mentioning their closure explicitly where that is relevant.
The first condition in the above definition correlates with the constraints used in the definition for orthant space in [15] and the second one restricts attention to the ‘non-positively curved’ orthant spaces in [15] (Proposition 6.10). These two conditions were first used by the authors of [5] to ensure the CAT(0)-property for tree spaces.
Throughout the rest of the paper, Xm will denote an orthant space of fixed dimension m viewed as comprising strata that are orthants of a fixed Euclidean space RM, where M is not necessarily $2^{m+2}-m-4$ as it would be for tree space. Also, whenever we specify an orthant by a union of subsets of the standard orthonormal basis U of RM, that will always be intended as a union of mutually disjoint subsets.
The orthant space Xm so defined is a Whitney stratified set in the sense of Thom, [21], the strata being the various orthants that comprise Xm. Note that, since Xm is a union of orthants in a fixed Euclidean space RM, the number of strata in Xm is always finite. Xm has the structure of a cone with vertex, or ‘cone point’, the origin o in RM, since each orthant is such a cone without its vertex, but that vertex, the origin, is necessarily included in Xm. In particular, {o} is the unique zero-dimensional stratum in Xm. Note however that our relatively open strata differ from those in [6].
The CAT(0)-property of the orthant space Xm results as follows, where all the references are to [6]. The intersection L of Xm with the unit sphere in RM is a simplicial complex on account of condition (i) and, since the axes in RM are orthogonal, it is an ‘all-right spherical complex’ (Section 7A.10) which, on account of condition (ii), is a ‘flag complex’. Then, by a theorem of Gromov (Theorem 5.18), L is a CAT(1)-space. The metric on Xm implied by describing it as the 0-cone (the Euclidean cone) over L (Definition 5.6) is the intrinsic metric so that, by the theorem of Berestowski (Theorem 3.14), Xm is CAT(0).
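Conditions (i) and (ii) of Definition 1 can be checked mechanically for a finite collection of orthants, each encoded (our choice) as a frozenset of axis indices, with the cone point left implicit. The sketch below reads condition (ii) as applying to orthants of dimension at least three, where it is not vacuous, and only considers axes that occur in X; in the language of the paragraph above, (ii) is the flag condition on the link L. The standard non-example is the three coordinate quadrants of R^3 without the octant they bound, whose link is a circle of length 3π/2, shorter than 2π.

```python
from itertools import combinations


def faces(F):
    """All non-empty subsets of the axis set F, i.e. the orthants in the closure of O(F)."""
    F = sorted(F)
    return {frozenset(c) for r in range(1, len(F) + 1) for c in combinations(F, r)}


def is_orthant_space(X):
    """Check conditions (i) and (ii) of Definition 1 for a finite set X of orthants."""
    X = {frozenset(F) for F in X}
    # (i) closure condition: every face of every orthant in X is also in X
    if any(not faces(F) <= X for F in X):
        return False
    # (ii) flag condition: an axis set all of whose 2-dimensional faces lie in X lies in X
    axes = sorted(set().union(*X))
    for r in range(3, len(axes) + 1):
        for F in map(frozenset, combinations(axes, r)):
            if all(frozenset(p) in X for p in combinations(sorted(F), 2)) and F not in X:
                return False
    return True


cycle = faces({0, 1}) | faces({1, 2}) | faces({0, 2})
print(is_orthant_space(cycle))                     # False
print(is_orthant_space(cycle | faces({0, 1, 2})))  # True
```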
In particular, by the Cartan-Hadamard theorem (cf. [6], p.193), there is a unique geodesic between any two points of the orthant space Xm. It follows that each stratum is totally geodesic in the strong sense that, if a geodesic contains two points of a stratum, it must include the entire linear segment in that stratum determined by those two points. On the other hand, although the distance metric for the CAT(0)-structure is induced from the Euclidean metric, the angles along and between curves may differ for the two contexts. For example, a geodesic, defined as a shortest path between its endpoints in either context, will be a piecewise linear curve in RM, linear in each stratum, with angle π/2 in the Euclidean subspace metric where it changes stratum. However, for the CAT(0)-structure, that angle is defined to be π.
The properties of an orthant space are largely determined by the incidence relations between its various strata. The following definitions capture two such relationships that will be used frequently in the paper.
Definition 2.
For subsets E and F of the standard orthonormal basis $U=(u_1,\cdots,u_M)$ of RM, if E⊆F, then the orthant O(E) is said to bound O(F) and O(F) to co-bound O(E).
Note that, unlike the case for tree spaces, strata of lower dimension than m need not bound any higher dimensional strata, in particular they need not bound m-dimensional strata.
Definition 3.
An orthant σ of dimension k in Xm is said to have co-dimension m−k and, if m′(⩽m) is the maximum dimension of orthants that σ co-bounds, then σ is said to have local co-dimension m′−k.
The tangent cone
It is natural for our purposes to follow [6] and to define the tangent cone to Xm at a point x to consist of all initial tangent vectors to smooth curves starting from x, the smoothness possibly only being one-sided at x. Note, however, that this is not the same as the generalised tangent space of [8]. To describe the tangent cone in more detail we work in RM. Then, when x lies in a top-dimensional, or locally top-dimensional, stratum σ of dimension m′(⩽m), the orthant space Xm is locally an m′-dimensional manifold so that a smooth curve can be extended on both sides of x. Thus, the tangent cone will be the usual tangent space, a subspace of RM isometric with Rm′ and tangent to σ. However, if x lies in a stratum of locally positive co-dimension, then the orthant space Xm is no longer locally a manifold. Consequently, the tangent cone at x is no longer a Euclidean space. For example, if the stratum σ has co-dimension one and bounds top-dimensional strata, the tangent cone to Xm at x is an open book: it has a closed half space Hm for each top-dimensional stratum τ co-bounding σ, with all the boundary (m−1)-dimensional faces identified with each other and with the tangent space to σ at x.
More generally, the tangent cone at a point x, in a stratum σ=O(E) of co-dimension l(⩾1), has a topology and stratification imitating that of Xm itself in the neighbourhood of x: for each stratum τ=O(E∪F) of co-dimension l′<l that co-bounds σ, so that F comprises the basis vectors that have positive coordinates in τ but zero coordinates in σ, there is the closed stratum $\mathbb{R}(E)\times\overline{\mathcal{O}(F)}$ in the tangent cone. Then, the tangent cone at x has its stratification determined by identifying the various R(E)×{0} with each other as well as identifying any tangent axes shared by pairs of strata that co-bound σ. In particular, when no strata co-bound σ, the tangent cone is simply the Euclidean space R(E).
Definition 4.
Let σ=O(E) and τ=O(E∪F) be two strata in Xm with co-dimensions l and l′<l, respectively. The component R(E) common to all the strata in the tangent cone to Xm at x∈σ is referred to as the tangent space to σ at x. Vectors in the (open) stratum R(E)×O(F) of the tangent cone at x∈σ with non-zero second component are referred to as vectors tangent to τ at x.
The set of unit vectors in R(E)×O(F) is denoted by $S^{m-l'}_{\tau,\sigma}$ and the subset of those in {0}×O(F) by $S^{l-l'}_{\tau\setminus\sigma}$.
The sets $S^{m-l'}_{\tau,\sigma}$ and $S^{l-l'}_{\tau\setminus\sigma}$ are open spherical segments of dimensions m−l′−1 and l−l′−1 respectively, the latter lying in the space R(F) orthogonal to R(E).
Note that the basis vectors in E do not generally precede those of F in the standard ordered basis U, and so writing the stratum as R(E)×O(F) implies an appropriate permutation of the coordinates.
Definition 5.
For any subset E of the standard ordered orthonormal basis U of RM, where E does not necessarily inherit its order from U, we denote by :R(E)→RM the linear transformation permuting coordinates and positioning them appropriately as coordinates, with respect to U, of a vector in RM.
We are mainly interested in the restriction of to subspaces of R(E). For example, if E=(u1,u4) and F=(u2,u6), then a point (x,y) in R(E)×O(F) with coordinates ((x1,x2),(y1,y2)) would have (x,y)=(x1,y1,0,x2,0,y2,0,⋯,0) in RM.
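Since the symbol for the map in Definition 5 has not survived in this copy, the sketch below calls it embed, a hypothetical name. It places coordinates listed with respect to an arbitrary ordering of some basis vectors into their positions in R^M, assuming 1-based basis indices to match u_1, ..., u_M, and reproduces the example just given.

```python
import numpy as np


def embed(coords, axes, M):
    """Hypothetical name for the map of Definition 5: coords[j] is the coordinate
    along u_{axes[j]} (1-based basis indices, in any order); returns a vector in R^M."""
    v = np.zeros(M)
    v[np.asarray(axes) - 1] = coords
    return v


# The example in the text: E = (u1, u4), F = (u2, u6), point ((x1, x2), (y1, y2)).
x1, x2, y1, y2 = 1.0, 2.0, 3.0, 4.0
print(embed([x1, x2, y1, y2], [1, 4, 2, 6], M=8))  # coordinates (x1, y1, 0, x2, 0, y2, 0, 0)
```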
Inherited from the CAT(0)-structure of Xm, the tangent cone to Xm at x, since it is metrically complete, also has a CAT(0)-structure (cf. [6], Theorem 3.19). While the CAT(0)-metric on Xm is, by definition, the intrinsic metric, the CAT(0)-metric on the tangent cone to Xm at x is defined in terms of the Alexandrov angle. Recall that, for any three points x,x1,x2 in Xm, the comparison triangle of the geodesic triangle Δ(x,x1,x2) in Xm formed by x,x1,x2 is the triangle $\bar\Delta(x,x_1,x_2)$ in the Euclidean plane with vertices $\bar x$, $\bar x_1$, $\bar x_2$ such that the Euclidean distances $d(\bar x,\bar x_1)$ etc. match the intrinsic distances d(x,x1) etc. in Xm. Then, the Alexandrov angle ∠x(γ1,γ2) between the geodesics γ1 and γ2 starting from x is defined to be
$$\angle_x(\gamma_1,\gamma_2)=\lim_{t\to0}\bar\angle_x(\gamma_1(t),\gamma_2(t)),$$
where $\bar\angle_x(\gamma_1(t),\gamma_2(t))$ is the Euclidean angle at $\bar x$ of the comparison Euclidean triangle $\bar\Delta(x,\gamma_1(t),\gamma_2(t))$ (cf. [6], Section 1.12). Note that, since geodesics in Xm are piecewise linear, the above limit is well-defined. Then, the inner product on the tangent cone of Xm at x is defined by
$$\langle\!\langle w_1,w_2\rangle\!\rangle=\|w_1\|\,\|w_2\|\cos\angle_x(\gamma_1,\gamma_2),$$
where w1 and w2 are the initial tangent vectors of γ1 and γ2. By analogy with vectors in the tangent space to a manifold, the distance ρx(w1,w2) between vectors w1 and w2 in the tangent cone at x is defined to be
$$\rho_x(w_1,w_2)=\left\{\|w_1\|^2+\|w_2\|^2-2\langle\!\langle w_1,w_2\rangle\!\rangle\right\}^{1/2}$$
(cf. [16], p. 144). Note that, although in general ≪,≫ differs from the usual Euclidean inner product ⟨,⟩, a geodesic triangle contained in the closure of a stratum of Xm is in fact a Euclidean geodesic triangle and its angles are the Euclidean ones. In particular, ≪w1,w2≫=⟨w1,w2⟩ for any w1,w2 in the closure of R(E)×O(F) and then ρx(w1,w2)=∥w1−w2∥.
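The following sketch evaluates ρ_x from the two norms and the Alexandrov angle (the function name rho is ours). For the illustration we assume the orthant space consisting of the two rays O({u1}) and O({u2}) only: the geodesic from a·u1 to b·u2 then passes through o, so the Alexandrov angle at o between the two directions is π, whereas adding the quadrant O({u1,u2}) would make it the Euclidean angle π/2, matching the remark on angles earlier in this section. Since this space is a cone with vertex o, ρ_o between a·u1 and b·u2 should agree with the intrinsic distance a + b.

```python
import math


def rho(norm1, norm2, angle):
    """rho_x(w1, w2) from ||w1||, ||w2|| and the Alexandrov angle between w1 and w2."""
    inner = norm1 * norm2 * math.cos(angle)  # <<w1, w2>>
    # clamp tiny negative values caused by rounding before taking the square root
    return math.sqrt(max(0.0, norm1 ** 2 + norm2 ** 2 - 2 * inner))


print(rho(3.0, 4.0, math.pi))      # 7.0 = 3 + 4: two rays without the quadrant
print(rho(3.0, 4.0, math.pi / 2))  # approximately 5.0: quadrant O({u1, u2}) present
```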
3 The carriers and supports of geodesics
In order to analyse the logarithm map, we first need to understand the geodesics. The intersection of a geodesic with a stratum, a Euclidean orthant, will be either a single point or a complete intersection of a Euclidean line with that orthant.
Definition 6.
The carrier of a geodesic is the sequence of strata each of whose intersection with the geodesic is a Euclidean line of positive length.
This is essentially the terminology that was introduced in [22] in the context of tree spaces. The case of a single point intersection arises between successive strata of the carrier: between the (open) linear segment in one stratum and that in the next, there will be one point in the common bounding stratum of those two strata. This intermediate stratum is not listed in the carrier; it is in fact specified by the adjacent strata as the stratum of highest dimension in the intersection of their closures. Similarly, when a geodesic starts, or ends, in a stratum of positive co-dimension and does not remain in that stratum, but passes immediately to a co-bounding stratum, then the latter will be the first, or last, stratum in the carrier. In such a situation, we shall regard the point in the bounding stratum as having the same set of axes as the co-bounding stratum, albeit with the relevant coordinates zero. That is, we regard it as a point of the closure of the co-bounding stratum.
To describe the carrier of the geodesic from x1 to x2 in more detail, as well as for later analysis, we require the following terminology.
Definition 7.
(i) The subsets E and F of U are said to be compatible in the orthant space Xm if the orthant O(E∪F) is contained in Xm.
(ii) For a subset E of the standard orthonormal basis U of RM, we denote the number of vectors in E by ∣E∣.
We first identify the set of axes common to the points along a geodesic where, for any x∈Xm, E(x) denotes the set of axes in U with respect to which x has positive coordinates.
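Proposition 1.
For any x1,x2∈Xm, the set E(x1,x2), defined by
$$\begin{array}{rcl}E(x_1,x_2)&=&\{E(x_1)\cap E(x_2)\}\\&&\cup\,\{e\in E(x_1)\mid e\text{ is compatible with }E(x_2)\}\\&&\cup\,\{e\in E(x_2)\mid e\text{ is compatible with }E(x_1)\},\end{array}$$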
forms the set of axes common to all strata along the geodesic between x1 and x2.
Proof.
Observe that, for a geodesic, each coordinate function must be linearly interpolated between any two values that are non-zero. It follows that once a particular coordinate, having been positive along the geodesic, becomes zero it must remain so or, having started at zero, once it becomes positive, it must continue monotonically to its final value. In particular, the only basis vectors that can occur with positive coordinate at any point along the geodesic from x1 to x2 are those that belong to x1 or x2 or to both. Moreover, when the geodesic from x1 passes immediately to a co-bounding stratum, all the new axes in that stratum must have coordinate zero at x1 increasing linearly along the geodesic to its value at x2. Any such additional axis e of the co-bounding stratum is in E(x2) and is compatible with all the axes in E(x1); and any such e must occur in this way. Thus, the set of axes common to all strata along the geodesic from x1 to x2 is precisely the given set E(x1,x2).
∎
Note that, at one extreme, if x1 and x2 both lie in the closure of an orthant O(E) and not both in the same boundary component, then E(x1,x2)=E. At the other extreme, if O(E(x1))∩O(E(x2))=∅, then E(x1,x2)=∅. In general, E(x1,x2) depends only on the orthants in which x1 and x2 lie, and is independent of their positions in those orthants.
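The set E(x1,x2) of Proposition 1 depends only on E(x1), E(x2) and on which orthants belong to Xm, so it can be computed combinatorially. In this sketch, orthants are encoded as frozensets of 0-based axis indices, and closure and common_axes are illustrative names. In the example, X is the closure of two quadrants O({u1,u2}) and O({u2,u3}) sharing the axis u2 (index 1): for x1 on the u1-axis and x2 in O({u2,u3}) the result is {u2}, reflecting that the geodesic leaves x1 immediately into O({u1,u2}). Points on the u1- and u2-axes give {u1,u2}, the first extreme case above, and two rays with no common quadrant give the empty set.

```python
from itertools import combinations


def closure(*maximal):
    """All orthants (non-empty axis sets) in the closures of the given orthants."""
    return {frozenset(c) for F in maximal for r in range(1, len(F) + 1)
            for c in combinations(sorted(F), r)}


def common_axes(E1, E2, X):
    """E(x1, x2) of Proposition 1, given E1 = E(x1), E2 = E(x2) and the set X of orthants."""
    E1, E2 = frozenset(E1), frozenset(E2)

    def compatible(e, F):
        return (F | {e}) in X  # the orthant O(F ∪ {e}) is contained in X

    return (E1 & E2) | {e for e in E1 if compatible(e, E2)} | {e for e in E2 if compatible(e, E1)}


X = closure({0, 1}, {1, 2})         # two quadrants sharing the axis with index 1
print(common_axes({0}, {1, 2}, X))  # frozenset({1})
```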
The number k+1 of orthants in the carrier C=(O0,O1,⋯,Ok) of the geodesic from x1 to x2 will, naturally, depend on both x1 and x2. If x1 lies in a top-dimensional stratum, it will have m strictly positive coordinates, all of which, assuming that none are also positive in x2, must become zero somewhere along the geodesic and at least one must become zero on each change of stratum as they cannot vanish within a stratum of the carrier. Thus, there will be m+1 strata in the carrier, that is k=m, if and only if they vanish one at a time. So, k<m if and only if somewhere along the geodesic at least two coordinates become zero on passing from Oi to Oi+1. When $|E(x_1,x_2)|=k_0$, the maximum value of k would now be $k'=m-k_0$. Similarly, if x1 were in a stratum of dimension $m_0$, this maximum would be $m_0-k_0$.
From now on, for given x1 and x2, we shall denote the set E(x1,x2) by both A0 and B0 to accord with the following notation. It follows from Proposition 1 that each member of the sequence of strata C=(O0,O1,⋯,Ok) that comprise the carrier of the geodesic γ from x1 to x2 has O(A0)=O(B0) as a factor. The carrier of γ determines further subsets of axes forming two sequences (A1,⋯,Ak) and (B1,⋯,Bk), where Ai is the set of all the axes whose coordinates become zero and Bi the set of all those whose coordinates become positive as the geodesic passes from Oi−1 to Oi. Thus, the stratum Oi−1 is O(B0∪B1∪⋯∪Bi−1∪Ai∪⋯∪Ak) and Oi=O(B0∪B1∪⋯∪Bi∪Ai+1∪⋯∪Ak), with O0 determined by A0∪A1∪⋯∪Ak. Clearly, the intermediate stratum between Oi−1 and Oi, their common boundary component, is
$$\mathcal{O}(B_0\cup B_1\cup\cdots\cup B_{i-1}\cup A_{i+1}\cup\cdots\cup A_k).$$
Thus, in particular,
(a) the sets Bi and Aj of axes are non-empty for all positive i and j, and compatible in Xm for 0⩽i<j;
(b) γ passes successively with positive length through the orthants Oi except that it may meet at most one of O0 and Ok in a single point;
(c) Ai∩Aj=∅ and Bi∩Bj=∅ for all i≠j.
The property (c) follows from the facts that A1∪⋯∪Ak is disjoint from B1∪⋯∪Bk and that an axis once removed cannot be removed again, or once introduced cannot be introduced again.
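Definition 8.
For any two points x1 and x2 in Xm, the support of the geodesic γ from x1 to x2 is defined to be the pair (A,B) of sequences of sets of axes,
$$A=(A_0,A_1,\cdots,A_k),\qquad B=(B_0,B_1,\cdots,B_k),$$
where γ passes successively through the orthants
$$\mathcal{O}_i=\mathcal{O}(B_0\cup B_1\cup\cdots\cup B_i\cup A_{i+1}\cup\cdots\cup A_k),\qquad 0\leqslant i\leqslant k,\qquad(7)$$
that form the carrier of γ.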
In the context of tree spaces, the definition of the support of a geodesic given here is equivalent to that of the minimal support given in [15].
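Example 1.
For a geodesic passing successively through the orthants
$$\mathcal{O}(e_0,e_1,e_2,e_3,e_4,e_5,e_6),\quad\mathcal{O}(e_0,e_1,f_2,e_3,e_4,e_5,e_6),\quad\mathcal{O}(e_0,e_1,f_2,f_3,e_5,e_6),$$
$$\mathcal{O}(e_0,e_1,f_2,f_3,f_4,e_6),\quad\mathcal{O}(e_0,e_1,f_2,f_3,f_4,f_5,f_6),$$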
the relevant sequences A=(A0,A1,⋯,A4) and B=(B0,B1,⋯,B4) forming the support would have members A0=B0={e0,e1}, the basis vectors common to all five orthants; A1={e2}, B1={f2}; A2={e3,e4}, B2={f3}; A3={e5}, B3={f4}; A4={e6} and B4={f5,f6}.
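Given a support (A,B), the carrier orthants in (7) and the intermediate strata between consecutive ones are plain set unions. The sketch below (function names ours) represents each A_i and B_i as a Python set and reproduces Example 1. The intermediate stratum between O_{i-1} and O_i should also equal the intersection of their axis sets, since A_i and B_i are disjoint from all the other sets in the support; the test checks this.

```python
def carrier(A, B):
    """Axis sets of O_i = O(B_0 ∪ ... ∪ B_i ∪ A_{i+1} ∪ ... ∪ A_k), i = 0, ..., k,
    for a support given as lists of sets with A[0] == B[0]."""
    k = len(A) - 1
    return [frozenset().union(*B[: i + 1], *A[i + 1:]) for i in range(k + 1)]


def intermediate(A, B):
    """Axis sets of the common boundary stratum of O_{i-1} and O_i, i = 1, ..., k."""
    k = len(A) - 1
    return [frozenset().union(*B[:i], *A[i + 1:]) for i in range(1, k + 1)]


# Example 1
A = [{"e0", "e1"}, {"e2"}, {"e3", "e4"}, {"e5"}, {"e6"}]
B = [{"e0", "e1"}, {"f2"}, {"f3"}, {"f4"}, {"f5", "f6"}]
for O in carrier(A, B):
    print(sorted(O))
```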
If both x1 and x2 lie in the closure of the same orthant, then the geodesic between them is clearly the Euclidean line segment. To understand geodesics in general and, later, to describe and analyse various properties of the logarithm map, we require the orthogonal projections onto the various strata of Xm, where the orthogonality is with respect to the Euclidean inner product on RM.
Definition 9.
For x∈Xm and E⊂U such that the orthant σ=O(E) is contained in Xm, PE(x) denotes the orthogonal projection of x onto O(E), that is the vector, or when relevant its coordinate vector, formed by the components of x in the directions of the unit vectors in E.
In terms of projections, we have the following characterisation of the supports of geodesics when x1 and x2 do not lie in the closure of the same orthant.
Proposition 2.
Let x1 and x2 be two given points in Xm. Suppose that A=(A0,A1,⋯,Ak) and B=(B0,B1,⋯,Bk) are two sequences of sets of axes such that the Oi defined by (7) are all contained in Xm, where k>0 and where all subsets Ai and Bj are mutually disjoint and non-empty, except for A0=B0=E(x1,x2) which may be empty. Then, (A,B) is the support of the geodesic from x1 to x2 if and only if
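(i) for k>1 and for all 0<i<k,
$$\frac{\|P_{A_i}(x_1)\|}{\|P_{B_i}(x_2)\|}<\frac{\|P_{A_{i+1}}(x_1)\|}{\|P_{B_{i+1}}(x_2)\|};\qquad(8)$$
(ii) for all 0<i⩽k and all non-trivial partitions $C_{i1}\cup C_{i2}$ for $A_i$ and $D_{i1}\cup D_{i2}$ for $B_i$, if the orthant
$$\mathcal{O}'=\mathcal{O}(B_0\cup\cdots\cup B_{i-1}\cup D_{i1}\cup C_{i2}\cup A_{i+1}\cup\cdots\cup A_k)\qquad(9)$$
is contained in Xm, then
$$\frac{\|P_{C_{i1}}(x_1)\|}{\|P_{D_{i1}}(x_2)\|}\geqslant\frac{\|P_{C_{i2}}(x_1)\|}{\|P_{D_{i2}}(x_2)\|}.\qquad(10)$$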
Compared with the result in [20] (Theorem 2.5) in the case of tree spaces, this result confirms the claim in Section 6 of [15] that the results on tree spaces also hold for orthant spaces. However, condition (ii) above is necessarily stronger than the corresponding condition there. This is because, in general orthant spaces, the compatibility of Ci2 with Di1 does not guarantee that the orthant O′ given by (9) is contained in Xm.
Proof.
Assuming that (A,B) is the support of the geodesic from x1 to x2, we focus on three consecutive strata of the carrier, Oi−1,Oi,Oi+1, where Oj are as defined in (7), projecting the geodesic onto the subspace R(Ai∪Ai+1∪Bi∪Bi+1). As the geodesic passes from Oi−1 to Oi, the coordinates along the axes in Ai become zero and those in Bi start to grow. Then, on passing from Oi to Oi+1, the coordinates of axes in Ai+1 become zero and those in Bi+1 grow. Consider the projection of the geodesic onto the three planar quadrants Πi−1 determined by the vectors PAi(x1) and PAi+1(x1), Πi determined by PAi+1(x1) and PBi(x2) and Πi+1 determined by PBi(x2) and PBi+1(x2) as in Figure 1.
This is an isometric representation of the relevant quadrants except that, in RM, all four vectors are mutually orthogonal. Then, Oi is in the carrier if and only if the projection of the geodesic passes through the interior of Πi, that is, if and only if the angle θ that the vector $p(x_1)=P_{A_i\cup A_{i+1}}(x_1)$ makes with $P_{A_i}(x_1)$ in $\Pi_{i-1}$ is greater than the angle ϕ that $p(x_2)=P_{B_i\cup B_{i+1}}(x_2)$ makes with $P_{B_i}(x_2)$ in $\Pi_{i+1}$. Since $\tan\theta=\|P_{A_{i+1}}(x_1)\|/\|P_{A_i}(x_1)\|$ and $\tan\phi=\|P_{B_{i+1}}(x_2)\|/\|P_{B_i}(x_2)\|$, this is the condition expressed by (8).
Similarly, if O′ is contained in Xm, the failure of (10) would ensure that the geodesic passed through O′, with positive length, between Oi−1 and Oi.
To show that conditions (i) and (ii) determine the support of the geodesic from x1 to x2, we first note that, as seen above, (i) ensures that the geodesic must pass through the orthant Oi between Oi−1 and Oi+1. Since Xm is a cone, it is simply connected and any piecewise linear path from x1 to x2 can be transformed by homotopy to a geodesic by a sequence of ‘simple moves’ whereby, for each move, two consecutive linear segments of the path are replaced by a single linear segment. Since the geodesic is linear within orthants, that can only occur between consecutive orthants, and condition (ii) guarantees that there is no extra orthant in the carrier between Oi−1 and Oi.
∎
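Condition (i) of Proposition 2 compares, as in the proof, the angles θ and ϕ, that is, ratios of norms of projections. The sketch below assumes the ratio form ‖P_{A_i}(x1)‖/‖P_{B_i}(x2)‖ < ‖P_{A_{i+1}}(x1)‖/‖P_{B_{i+1}}(x2)‖, which is what θ > ϕ amounts to since tan θ = ‖P_{A_{i+1}}(x1)‖/‖P_{A_i}(x1)‖ and tan ϕ = ‖P_{B_{i+1}}(x2)‖/‖P_{B_i}(x2)‖. It checks only this condition, not condition (ii), and the small four-axis space in the example is hypothetical.

```python
import numpy as np


def norm_on(x, S):
    """||P_S(x)||: the Euclidean norm of the components of x along the axes in S."""
    return float(np.linalg.norm(np.asarray(x, dtype=float)[sorted(S)]))


def satisfies_condition_8(x1, x2, A, B):
    """Condition (i) of Proposition 2: the ratios ||P_{A_i}(x1)|| / ||P_{B_i}(x2)||,
    i = 1, ..., k, are strictly increasing (A[0] = B[0] are the common axes)."""
    ratios = [norm_on(x1, A[i]) / norm_on(x2, B[i]) for i in range(1, len(A))]
    return all(r < s for r, s in zip(ratios, ratios[1:]))


# Axes 0..3, with O({0,1}), O({1,2}), O({2,3}) and their faces assumed to lie in X;
# candidate support A = (∅, {0}, {1}), B = (∅, {2}, {3}).
A, B = [set(), {0}, {1}], [set(), {2}, {3}]
print(satisfies_condition_8([1.0, 3.0, 0, 0], [0, 0, 2.0, 1.0], A, B))  # True: 1/2 < 3/1
print(satisfies_condition_8([3.0, 1.0, 0, 0], [0, 0, 2.0, 1.0], A, B))  # False: 3/2 > 1/1
```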
As noted previously, if x1 and x2 both lie in the closure of an orthant, then k=0 and the geodesic between x1 and x2 is always a Euclidean segment. Then, when x2 varies within the orthant in which it lies, the support of the geodesic from x1 to x2 remains the same. However, in general, the support may change. The above characterisation of the support of a geodesic implies the following sufficient condition for the support to remain locally constant.
Corollary 1.
Suppose that the hypotheses of Proposition 2 hold. If, for all 0<i⩽k and for all relevant partitions of Ai and Bi as in (ii) of that proposition, the inequality (10) is strict then, for all x in a sufficiently small neighbourhood of x2 in its stratum, (A,B) remains the support for the geodesic from x1 to x.
Proof.
Since x varies within the stratum in which x2 lies, the set A0=B0 remains unchanged. For the other sets in the support, by continuity, the strict inequalities (8) and, we are assuming, (10) continue to hold for x in a sufficiently small neighbourhood of x2 within its stratum. Hence, the required result follows from Proposition 2.
∎
4 The logarithm map
Analogous to an inverse of the exponential map on a Riemannian manifold, the logarithm map on Xm is defined as follows.
Definition 10.
The logarithm map at x∗∈Xm is the map logx∗(x) from Xm to the tangent cone to Xm at x∗, the image of x being the initial tangent vector, with norm d(x∗,x), to the geodesic from x∗ to x.
The logarithm map is globally well-defined since, as already mentioned, the Cartan-Hadamard theorem implies that there is a unique geodesic between any two points x∗ and x of Xm. If that geodesic has an initial segment in a stratum containing x∗, it will certainly have an initial tangent vector. If it has only x∗ in the initial stratum, it must then have an open segment γ((0,ϵ)), with γ(0)=x∗, in a co-bounding stratum. It will then still have a one-sided derivative at x∗, which suffices to define the logarithm map.
With the description of the carrier, as well as the results on the support, of a geodesic in the previous section, we are now in a position to derive and analyse its initial tangent vector, or equivalently logx∗(x). As in [2] and [3] for the space of trees, our analysis will mainly involve a modified version of the logarithm map. For this, since the tangent cones at various points in σ are all parallel, we may parallel translate them to the cone point o, the origin in RM, to produce a common isometric copy Cσ. Then, since the coordinate vector of the point x∗, which we also denote by x∗, lies in the common factor R(E) of all the strata of Cσ, it makes sense to add it to logx∗(x) and the result
$$\Phi(\boldsymbol{x};\boldsymbol{x}^*)=\log_{\boldsymbol{x}^*}(\boldsymbol{x})+\boldsymbol{x}^*$$
will also lie in Cσ. We shall refer to Φ as the translated logarithm map to distinguish it from the logarithm map itself. Since all the vectors Φ(x;x∗) lie in the same space, the translated logarithm maps are directly comparable as x∗ varies within an orthant, and such comparability will be needed later. Moreover, the two maps differ only by the translation by x∗, so that all our analysis of Φ can easily be translated to that of the logarithm map itself.
Note that, although the origin corresponds to the cone point o of the orthant space Xm, Cσ is not the tangent cone to Xm at o, neither being contained in the other, unless σ={o}. Note also that, when Xm is a tree space and x∗ lies in a top-dimensional stratum, Φ(x;x∗) was called the modified logarithm map and was denoted by Φx∗(x) in [3], and the permutation map π there corresponds to the linear transformation ȷ given by Definition 5.
The next theorem gives the expression for the translated logarithm map Φ(⋅;x∗) in terms of the projections, specified in Definition 9, onto various sets of axes appearing in the support of the geodesic from x∗ to x.
Theorem 1.
For any two points x∗ and x in Xm, let the sequences A=(A0,⋯,Ak) and B=(B0,⋯,Bk) of sets of axes form the support of the geodesic from x∗ to x. Then, the translated logarithm map Φ(⋅;x∗) at x∗ is given by
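$$\Phi(\boldsymbol{x};\boldsymbol{x}^*)=\jmath\left(P_{B_0}(\boldsymbol{x}),\,-\frac{\|P_{B_1}(\boldsymbol{x})\|}{\|P_{A_1}(\boldsymbol{x}^*)\|}P_{A_1}(\boldsymbol{x}^*),\,\cdots,\,-\frac{\|P_{B_k}(\boldsymbol{x})\|}{\|P_{A_k}(\boldsymbol{x}^*)\|}P_{A_k}(\boldsymbol{x}^*)\right),\tag{11}$$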
where ȷ is the linear transformation given by Definition 5.
In particular, Φ(⋅;λx∗)=Φ(⋅;x∗) for any constant λ>0.
Recall that, if k=0, then x and x∗ lie in the closure of an orthant and the geodesic from x∗ to x is a line segment in RM. In this case, Φ(x;x∗)=x. If k>0 and if ∣Ai∣=∣Bi∣=1 for 1⩽i⩽k, the form of the expression for Φ(x;x∗) is also similar to that of the corresponding translated logarithm map in a Euclidean space, after changing the axes Bi to −Ai.
Proof.
The orthogonal projection of γ onto O(A0) determines the component of the initial tangent vector to γ that is tangent to O(A0), namely
$$\boldsymbol{v}_0=P_{B_0}(\boldsymbol{x})-P_{A_0}(\boldsymbol{x}^*).\tag{12}$$
For the remaining coordinates, since the sets Ai and Bj above are all mutually disjoint, it follows that, for each i, the subspace R(Ai∪Bi) is orthogonal to all R(Aj) and R(Bj) for j≠i, so that the coordinates of the geodesic γ that are positive with respect to the axes in R(Ai∪Bi) are just those of the projection γi of γ onto that subspace. If si is the parameter such that γ(si)∈Oi−1∩Oi, then, for 0⩽s⩽si, PAi(γ(s))∈O(Ai) declines linearly from PAi(γ(0))=PAi(x∗) to PAi(γ(si))=0. Then, for si⩽s⩽1, the coordinates PBi(γ(s))∈O(Bi) increase linearly from zero at γ(si) to PBi(γ(1))=PBi(x). Thus, the projected geodesic γi lies in the union of the orthogonal orthants O(Ai) and O(Bi) and hence has length ∥PAi(x∗)∥+∥PBi(x)∥. The initial tangent vector to γi is parallel to −PAi(x∗) and so is
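$$\boldsymbol{v}_i=-\frac{\|P_{A_i}(\boldsymbol{x}^*)\|+\|P_{B_i}(\boldsymbol{x})\|}{\|P_{A_i}(\boldsymbol{x}^*)\|}\,P_{A_i}(\boldsymbol{x}^*),\qquad 1\leqslant i\leqslant k.$$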
Hence, the initial tangent vector to γ with norm d(x∗,x) is represented by (v0,v1,⋯,vk). However, this ordering of the coordinates, with those in R(Ai) preceding those of R(Ai+1) for each i, requires the linear transformation ȷ to obtain its representation with respect to the standard basis in RM. Then, the logarithm map at x∗ will be
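$$\log_{\boldsymbol{x}^*}(\boldsymbol{x})=\jmath(\boldsymbol{v}_0,\boldsymbol{v}_1,\cdots,\boldsymbol{v}_k),$$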
so that equation (11) follows on adding x∗ to the coordinates vi, since the coordinates of x∗ are ȷ(PA0(x∗),PA1(x∗),⋯,PAk(x∗)).
∎
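For readers who want to experiment numerically, the following minimal Python sketch evaluates the translated logarithm map once the support (A,B) is known. It reads (11) as Φ(x;x∗)=ȷ(PB0(x), −(∥PB1(x)∥/∥PA1(x∗)∥)PA1(x∗), ⋯, −(∥PBk(x)∥/∥PAk(x∗)∥)PAk(x∗)), which is what the coordinates v0,⋯,vk of the proof give after adding x∗, and it computes d(x∗,x) as the norm of (v0,⋯,vk). Axes are indexed 0,⋯,M−1, the support is supplied by the caller as lists of index lists (the code does not check the conditions of Proposition 2), ȷ is realised simply by writing each block into its own axis positions, and the function names are hypothetical.

```python
import math

def norm(v):
    """Euclidean norm of a sequence of floats."""
    return math.sqrt(sum(c * c for c in v))

def translated_log(x, x_star, A, B):
    """Phi(x; x_star) in the form (11), for a support (A, B) supplied by the caller.

    x, x_star : coordinates in R^M (lists of non-negative floats).
    A, B      : lists of lists of axis indices, with A[0] == B[0] (possibly empty)
                and A[i], B[i] non-empty for i >= 1.
    Writing each block straight into its own axis positions plays the role of
    the linear transformation jmath of Definition 5.
    """
    phi = [0.0] * len(x)
    for e in B[0]:                              # P_{B0}(x)
        phi[e] = x[e]
    for Ai, Bi in zip(A[1:], B[1:]):
        a = norm([x_star[e] for e in Ai])       # ||P_{Ai}(x*)||, positive
        b = norm([x[e] for e in Bi])            # ||P_{Bi}(x)||
        for e in Ai:                            # -(b / a) * P_{Ai}(x*)
            phi[e] = -b / a * x_star[e]
    return phi

def geodesic_length(x, x_star, A, B):
    """d(x*, x), the norm of (v0, v1, ..., vk) from the proof of Theorem 1."""
    total = sum((x[e] - x_star[e]) ** 2 for e in B[0])   # ||v0||^2
    for Ai, Bi in zip(A[1:], B[1:]):
        total += (norm([x_star[e] for e in Ai]) + norm([x[e] for e in Bi])) ** 2
    return math.sqrt(total)

# Example 2 with axes u1, ..., u5 indexed 0, ..., 4 and x* = (3, 4, 0, 0, 0).
x_star = [3.0, 4.0, 0.0, 0.0, 0.0]
x = [0.0, 1.0, 2.0, 0.0, 0.0]                  # a point of O(u2, u3)
A, B = [[1], [0]], [[1], [2]]                  # A0 = B0 = {u2}, A1 = {u1}, B1 = {u3}
print(translated_log(x, x_star, A, B))         # (-x3, x2, 0, 0, 0) up to rounding
print(geodesic_length(x, x_star, A, B))        # sqrt(34)
```

Replacing x_star by a positive multiple leaves translated_log unchanged for the same support, which is the property Φ(⋅;λx∗)=Φ(⋅;x∗) stated after Theorem 1.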
In the following, when we say that the expression (11) for Φ(x2;x∗) takes the same form as the corresponding expression for Φ(x1;x∗), we mean that the expression for Φ(x2;x∗) can be obtained by replacing x1 by x2 in the expression (11) for Φ(x1;x∗). Clearly, the form of the expression for Φ(x;x∗) will depend on the support (A,B) of the geodesic from x∗ to x, noting that the roles that A and B play are not symmetric. The following example illustrates this feature where, although x lies in the same orthant in the second and third cases, the forms for Φ(x;x∗), as a function of x, differ in the two cases. However, along the boundary between the light and dark grey regions, the two forms give the same result.
Example 2.
Consider X2 in R5, which was called Q5 in [2], consisting of five orthants as shown in Figure 2, where all five axes are mutually orthogonal.
The tangent cone to X2 at x∗=(x1∗,x2∗,0,0,0) indicated in Figure 2 is the (u1,u2)-plane and that at the cone point o is X2 itself. While Φ(x;o)=x for all x, the expression for Φ(x;x∗) takes different forms depending on the position of x. For example, for any x=(0,x2,x3,0,0) in the orthant O(u2,u3),
$$\Phi(\boldsymbol{x};\boldsymbol{x}^*)=(-x_3,\,x_2,\,0,\,0,\,0);$$
for x=(0,0,x3,x4,0) in the dark grey region of O(u3,u4), i.e. if the coordinates of x satisfy x4/x3<tan(α)=x2∗/x1∗, then
$$\Phi(\boldsymbol{x};\boldsymbol{x}^*)=(-x_3,\,-x_4,\,0,\,0,\,0).$$
However, if x=(0,0,x3,x4,0) lies in the light grey region of O(u3,u4), i.e. if the coordinates of x satisfy x4/x3>tan(α)=x2∗/x1∗, then
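$$\Phi(\boldsymbol{x};\boldsymbol{x}^*)=-\frac{\sqrt{x_3^2+x_4^2}}{\sqrt{(x_1^*)^2+(x_2^*)^2}}\,(x_1^*,\,x_2^*,\,0,\,0,\,0).$$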
In particular, for all x in the light grey region of X2, the vectors Φ(x;x∗) have the same direction −(1/∥x∗∥)(x1∗,x2∗,0,0,0) and differ only in their length.
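The case distinction in Example 2 can be checked numerically. The sketch below assumes, as we read Figure 2, that the five orthants of X2 form a cycle O(u1,u2), O(u2,u3), O(u3,u4), O(u4,u5), O(u5,u1), so that in the dark grey region the geodesic unfolds into a straight segment through O(u2,u3), while in the light grey region it passes through o. The two forms of Φ(x;x∗) are taken from Example 2; the function names are hypothetical.

```python
import math

# Axes u1, ..., u5 are indexed 0, ..., 4; x* = (x1s, x2s, 0, 0, 0) lies in O(u1,u2)
# and x = (0, 0, x3, x4, 0) lies in O(u3,u4).

def in_dark_region(x_star, x3, x4):
    """Dark grey region of Example 2: x4/x3 < tan(alpha) = x2*/x1*."""
    x1s, x2s = x_star
    return x4 * x1s < x2s * x3

def dark_form(x_star, x3, x4):
    """Form of Phi(x; x*) when the geodesic crosses O(u2,u3), with support
    A = (empty, {u1}, {u2}), B = (empty, {u3}, {u4}); x_star is unused here
    but kept for a uniform signature."""
    return [-x3, -x4, 0.0, 0.0, 0.0]

def light_form(x_star, x3, x4):
    """Form of Phi(x; x*) when the geodesic passes through the cone point o,
    with support A = (empty, {u1, u2}), B = (empty, {u3, u4})."""
    x1s, x2s = x_star
    scale = math.hypot(x3, x4) / math.hypot(x1s, x2s)
    return [-scale * x1s, -scale * x2s, 0.0, 0.0, 0.0]

def phi_q5(x_star, x3, x4):
    """Pick the form according to the region containing x."""
    form = dark_form if in_dark_region(x_star, x3, x4) else light_form
    return form(x_star, x3, x4)

def geodesic_length_q5(x_star, x3, x4):
    """Unfold O(u1,u2), O(u2,u3), O(u3,u4) into the plane in the dark case;
    otherwise go through o."""
    x1s, x2s = x_star
    if in_dark_region(x_star, x3, x4):
        return math.hypot(x1s + x3, x2s + x4)
    return math.hypot(x1s, x2s) + math.hypot(x3, x4)

xs = (3.0, 4.0)
print(phi_q5(xs, 2.0, 1.0))     # dark grey: (-x3, -x4, 0, 0, 0)
print(phi_q5(xs, 1.0, 2.0))     # light grey: a negative multiple of x*
```

Evaluating dark_form and light_form at points with x4/x3=x2∗/x1∗ gives identical vectors, matching the remark before Example 2 that the two forms agree along the boundary between the light and dark grey regions.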
The potential variation of the form of the expression (11) for the translated logarithm map, arising from the changes in the supports of the geodesics, is one of the main obstructions to generalising the theory for manifolds to orthant spaces, or more general stratified spaces. To study this variation, we first note the following result, which is a direct consequence of Corollary 1.
Corollary 2.
If the support of the geodesic from x∗ to x satisfies the conditions of Corollary 1, then there is a neighbourhood N of x within its stratum such that, for any x′∈N, the expression (11) for Φ(x′;x∗) takes the same form as that for Φ(x;x∗).
We now characterise, in terms of the two conditions on the support of a geodesic given in Proposition 2, changes in the form of the expression (11) for Φ(x;x∗) when x varies locally. Although these two conditions play different roles in determining the support of a geodesic, they play, to some extent, a similar role in changing the form of that expression. Replacing the inequality (8) or (10) by an equality determines a quadratic hyper-surface of co-dimension one. When two or more such hyper-surfaces meet, their normals are linearly independent, so they intersect in surfaces of co-dimension at least two. Thus, it suffices to consider a point lying in a single such hyper-surface. Points on either side of the hyper-surface will then have different supports for their geodesics from x∗, but this will not always result in a change in the form of the expression for Φ(x;x∗).
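As a worked illustration of why these hyper-surfaces are quadratic, suppose, as the angle description in the proof of Proposition 2 indicates, that (8) compares tan θ=∥PAi+1(x∗)∥/∥PAi(x∗)∥ with tan ϕ=∥PBi+1(x)∥/∥PBi(x)∥. This reading of (8) is our assumption, since the inequality itself is not reproduced here. Then

$$\tan\theta>\tan\phi\iff\frac{\|P_{A_i}(\boldsymbol{x}^*)\|}{\|P_{B_i}(\boldsymbol{x})\|}<\frac{\|P_{A_{i+1}}(\boldsymbol{x}^*)\|}{\|P_{B_{i+1}}(\boldsymbol{x})\|},$$

and, for fixed x∗, the corresponding equality can be written as

$$\|P_{A_{i+1}}(\boldsymbol{x}^*)\|^2\sum_{e\in B_i}x_e^2-\|P_{A_i}(\boldsymbol{x}^*)\|^2\sum_{e\in B_{i+1}}x_e^2=0,$$

a homogeneous quadratic equation in the coordinates xe of x. In Example 2, with A1={u1}, A2={u2}, B1={u3} and B2={u4}, it reduces to (x2∗)²x3²=(x1∗)²x4², that is, x4/x3=x2∗/x1∗ on O(u3,u4), which is the boundary between the dark and light grey regions.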
Proposition 3.
Let x∗ and x0 be two given points in Xm, and let (A,B) be the support of the geodesic from x∗ to x0, where A=(A0,A1,⋯,Ak) and B=(B0,B1,⋯,Bk), and where k>1. Assume that x moves from x0, within its stratum, to a first point x1 such that, for i=i0>0, the inequality (8), with x1,x2 replaced by x∗,x1 respectively, becomes an equality while all the other inequalities (8) and (10) remain strict. Then, the support (A′,B′) of the geodesic from x∗ to x1 has
[TABLE]
and similarly for B′.
If the orthant
[TABLE]
is contained in Xm, then there is a neighbourhood N of x1 within its stratum such that, for all x∈N, the form of the expression (11) for Φ(x;x∗) is identical with that for Φ(x0;x∗).
If O′′ is not an orthant of Xm then, in any neighbourhood N of x1 within its stratum, there are x′ and x′′ such that the form for Φ(x′;x∗) is the same as that for Φ(x0;x∗) and that for Φ(x′′;x∗) is determined by the support (A′,B′). When N is sufficiently small, there are no other possibilities.
Proof.
By Corollary 2, the form of the expression (11) remains constant as long as the inequalities (8) and (10) remain strict. However, on account of the equality (8) for i=i0 at x=x1, the angles θ and ϕ in the projected diagram of Figure 3 will be equal, where the projections are as specified in the proof of Proposition 2.
Consequently at x1, Oi0 will drop out of the carrier, where Oi is defined by (7), and, by the continuity of geodesics, the support of the geodesic from x∗ to x1 will be (A′,B′).
Now, let x continue to move past x1 to x2, remaining sufficiently close to x1 and having projection p(x2)=PBi0∪Bi0+1(x2) in Figure 3 lying on the opposite side to p(x0)=PBi0∪Bi0+1(x0) of the ray from the origin to p(x1)=PBi0∪Bi0+1(x1). If O′′ is contained in Xm, the projection of the geodesic from x∗ to x2 would be, as in Figure 3(a), the ‘straight’ line from p(x∗)=PAi0∪Ai0+1(x∗) to p(x2) passing through the planar quadrant Π0 determined by PAi0(x∗) and PBi0(x0). This would imply replacing Oi0 in the carrier by O′′ with the resulting support for the geodesic from x∗ to x2 being (A′′,B′′), where
[TABLE]
and similarly for B′′. In this case, the application of the linear transformation ȷ in the expression for Φ(⋅;x∗) implies that, for such x2, the form of the expression (11) for Φ(x2;x∗) is identical with that for Φ(x0;x∗).
Assume now that O′′ is not an orthant of Xm. There might still be an intermediate orthant between Oi0−1 and Oi0+1 arrived at by non-trivial partitions Ai0=C1∪C2, Ai0+1=D1∪D2, Bi0=E1∪E2 and Bi0+1=F1∪F2 such that the orthant
[TABLE]
is contained in Xm and provides a shorter path between Oi0−1 and Oi0+1. In that case, by Proposition 2(i), we must have
[TABLE]
This would result in
[TABLE]
and, taking the limit as x2→x1,
[TABLE]
On the other hand, the closures of the orthants Oi0−1 and O being in Xm ensure that all 2-dimensional orthants in the closure of
[TABLE]
are in Xm and hence, by Definition 1, so too is O∗ itself. Then, by the assumption in the proposition that the equality at x1 is unique, we must have, by Proposition 2(ii), that
[TABLE]
Similarly, by considering the orthant O(B0∪⋯∪Bi0∪F1∪D2∪Ai0+2∪⋯∪Ak), we get
[TABLE]
Since, by assumption, Oi0 drops out of the carrier at x1, we also have
Thus, if O′′ is not contained in Xm, the projection of the geodesic from x∗ to x2 continues to pass through the origin, as shown in Figure 3(b), and the carrier remains as it was for x1, where the support is (A′,B′) given above. In this case, the form of the expression (11) for Φ(x2;x∗) clearly differs from that for Φ(x0;x∗).
∎
Note that the equality (8) for i=i0 at x1 and the mutual orthogonality of all the axes together imply that
[TABLE]
This confirms that the form of the expression for Φ(x0;x∗) is still valid for Φ(x1;x∗), as expected by the continuity of geodesics. Similarly, the form of the expression for Φ(x2;x∗) is still valid for Φ(x1;x∗) whether or not the orthant Oi0 has been replaced by O′′.
A similar argument to that for the proof of Proposition 3 gives the following complementary result.
Proposition 4.
Let x∗ and x1 be two given points in Xm, and let (A,B) be the support of the geodesic from x∗ to x1, where A=(A0,A1,⋯,Ak) and B=(B0,B1,⋯,Bk), and where k>0. Assume that all inequalities (8) and (10), with x1,x2 replaced by x∗,x1 respectively, are strict except that, for i=i0>0 and unique non-trivial partitions Ci01∪Ci02 for Ai0 and Di01∪Di02 for Bi0, (10) is an equality and that the corresponding orthant O′ given by (9) with i=i0 is contained in Xm.
If the orthant
[TABLE]
is contained in Xm, there is a neighbourhood N of x1 within its stratum such that the form of the expression for Φ(x;x∗) is the same for all x∈N. Then, the common form of the expression for Φ(x;x∗) is determined by (A′,B′), where
[TABLE]
and similarly for B′.
If O′′′ is not an orthant of Xm then, in any neighbourhood N of x1 within its stratum, there are x′ and x′′ such that the form for Φ(x′;x∗) is the same as that for Φ(x1;x∗) and that for Φ(x′′;x∗) is determined by (A′,B′). When N is sufficiently small, there are no other possibilities.
The carrier of the geodesic from x∗ to x will also change when x moves from one stratum to another, which necessarily involves, as initial, final or intermediate stratum, a stratum of locally positive co-dimension. The set of all such strata, together with the quadratic hyper-surfaces determined by equalities in each of the relevant equations (8), forms the defining boundaries for the (pre)-vistal polyhedral subdivision, with respect to x∗, in [15]. The points in any component of the complement of these surfaces all have the same carrier. However, for our analysis, we shall only be concerned with changes in the forms of the expressions taken by logx∗(x), or equivalently by Φ(x;x∗), when x or x∗ vary within their strata, rather than with changes in the underlying carrier. For this, we note that the results in Propositions 3 and 4 where the changed support must be used to obtain the correct expression for Φ(x;x∗) are reflections of each other, in which an orthant is removed from, or introduced into, the carrier, respectively. Thus, we may encapsulate as follows the hyper-surfaces across which, though not at which, it is necessary to take account of the change of support to obtain the correct value of the logarithm map.
Definition 11.
Given a point x∗∈Xm, Dx∗ denotes the set that consists of all points x∈Xm for which the support (A,B), where A=(A0,⋯,Ak) and B=(B0,⋯,Bk), of the geodesic from x∗ to x has the property that, for one or more i=i0>0, there are non-trivial partitions Ai0=Ci01∪Ci02 and Bi0=Di01∪Di02 with
[TABLE]
where the corresponding orthant O′ of (9) is contained in Xm, but O′′′ of (16) is not.
In view of the symmetry that reverses the geodesics while reversing the order of the strata and interchanging the roles of the sequences A and B of edge sets in the support, the definition is symmetric: x∈Dx∗ if and only if x∗∈Dx. Since each stratum is a Euclidean orthant, it is preserved under multiplication by λ>0 in RM, which also multiplies the length of each curve by λ. Then, since the geodesic γ joining x∗ to x is the shortest curve through the strata of Xm from x∗ to x, it follows that γ is mapped onto the geodesic from λx∗ to λx. In particular, these two geodesics have the same carrier. Thus, Dλx∗=λDx∗ and, since the equations (17) are homogeneous, Dx∗=λDx∗, so that Dλx∗=Dx∗.
The pseudo-partition of Xm with respect to x∗ determined by Dx∗ gives rise to a polyhedral subdivision of each stratum by restriction. It is coarser than the (pre)-vistal subdivision of [15] and, if Xm is a tree space and if x∗ lies in a top-dimensional stratum, it is equivalent to the polyhedral subdivision defined in [3].
5 Limits, projections and derivatives
We now turn to certain limits and projections of the translated logarithm map that, in particular, will enable us to calculate the directional derivatives we require.
Firstly, we obtain an expression for the limit of the translated logarithm map as the reference point x∗ moves along a geodesic. For a vector w in the tangent cone to Xm at x∗, write x∗(λ,w) for the point distant λ∥w∥ along the geodesic γ starting at x∗ with initial tangent vector w. Then, we have the following result.
Theorem 2.
Let σ=O(E) be a stratum of Xm, x∗∈σ and x be a fixed choice of point anywhere in Xm.
(i)
If w∈R(E) is tangent to σ at x∗, then
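$$\lim_{\lambda\to0^+}\Phi(\boldsymbol{x};\boldsymbol{x}^*(\lambda,\boldsymbol{w}))=\Phi(\boldsymbol{x};\boldsymbol{x}^*).$$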
(ii)
If σ bounds τ=O(E∪F) in Xm and wτ∈R(E)×O(F) is tangent to τ at x∗, then the limit
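$$\Psi(\boldsymbol{x},\boldsymbol{w}_\tau;\boldsymbol{x}^*)=\lim_{\lambda\to0^+}\Phi(\boldsymbol{x};\boldsymbol{x}^*(\lambda,\boldsymbol{w}_\tau))\tag{18}$$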
exists. Moreover, there exist ϵ>0 and sequences A=(A0,A1,⋯,Ak) and B=(B0,B1,⋯,Bk) of sets of axes such that, for each λ∈(0,ϵ), (A,B) forms the support of the geodesic from x∗(λ,wτ) to x. In terms of these A and B,
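$$\Psi(\boldsymbol{x},\boldsymbol{w}_\tau;\boldsymbol{x}^*)=\jmath\left(P_{B_0}(\boldsymbol{x}),\,-\frac{\|P_{B_1}(\boldsymbol{x})\|}{\|W_1\|}W_1,\,\cdots,\,-\frac{\|P_{B_k}(\boldsymbol{x})\|}{\|W_k\|}W_k\right),\tag{19}$$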
where Wi=PAi∩E(x∗) unless PAi∩E(x∗)=0, in which case Wi=PAi∩F(wτ), the projection of wτ on R(Ai), and where ȷ is the linear transformation given by Definition 5.
For x∗∈σ⊆Xm, Ψ(x,w;x∗) defined by (18) is the limit of the translated logarithm map Φ(x;x′) as x′→x∗ from the direction w. When the direction w is clear from the context, we shall in the following simply call Ψ(x,w;x∗) the directional limit of Φ(x;x′).
Proof.
(i) This follows from the uniform continuity of geodesics with respect to their end points (cf. [6], pp. 195-196) and also from a minor modification of the proof of (ii) below.
(ii) Note that, since wτ∈R(E)×O(F), x∗ and x∗(λ,wτ) lie in different strata. Writing γλ for the geodesic from x∗(λ,wτ) to x, as x∗(λ,wτ) moves along γ the support of γλ can only change when γ meets transversally one or more of the hyper-surfaces where the carrier of the geodesic to x changes. This can only happen at discrete points along γ so, for some ϵ>0 and 0<λ⩽ϵ, the carriers of the geodesics γλ will be independent of λ. Let (A,B) be the support of γϵ from x∗(ϵ,wτ) to x, where A=(A0,A1,⋯,Ak) and B=(B0,B1,⋯,Bk). Then, A0∪A1∪⋯∪Ak=E∪F and, for 0<λ⩽ϵ, the integer k and the support (A,B) will remain constant for the expression
[TABLE]
replacing x∗ in (11) by x∗(λ,wτ). Then, since the x∗(λ,wτ) lie in τ for all sufficiently small positive λ, the vectors Φ(x;x∗(λ,wτ)) all lie in Cτ so that it makes sense to take the limit as λ→0+, where Cτ is the common translated cone of the tangent cone at x∗(λ,wτ) as introduced in Section 2.
To evaluate it, we take the limit in the above expression for Φ(x;x∗(λ,wτ)). Since x∗∈O(E), x∗(λ,wτ)=x∗+λwτ for sufficiently small λ>0 and it follows that PAi(x∗(λ,wτ))=PAi∩E(x∗)+λPAi(wτ). So the limit as λ→0+ of this term is PAi∩E(x∗) if that is non-zero. If it is zero, then Ai∩E=∅ since ∥P{e}(x∗)∥>0 for all e∈E. Then PAi(wτ), the projection of wτ on R(Ai) is, in fact, PAi∩F(wτ).
∎
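The limit in Theorem 2(ii) can be illustrated numerically. The sketch below assumes that (19) has the same form as (11) with PAi(x∗) replaced by Wi, as the proof indicates, and it fixes by hand a hypothetical support (A,B), axis sets E and F and points x∗, wτ and x in R5; it does not check that this support is realised by a geodesic in an actual orthant space. It compares Φ(x;x∗+λwτ), evaluated with the fixed support, with the directional limit; the function names are hypothetical.

```python
import math

def norm(v):
    return math.sqrt(sum(c * c for c in v))

def phi_from_support(x, y, A, B):
    """Form (11) of Phi(x; y) for a given support (A, B), with y in the role of x*.
    Blocks are written into their own axis positions, playing the role of jmath."""
    out = [0.0] * len(x)
    for e in B[0]:
        out[e] = x[e]                              # P_{B0}(x)
    for Ai, Bi in zip(A[1:], B[1:]):
        a = norm([y[e] for e in Ai])               # ||P_{Ai}(y)||
        b = norm([x[e] for e in Bi])               # ||P_{Bi}(x)||
        for e in Ai:
            out[e] = -b / a * y[e]
    return out

def psi_from_support(x, x_star, w, E, F, A, B):
    """Directional limit with W_i chosen as in Theorem 2(ii), assuming that (19)
    is (11) with P_{Ai}(x*) replaced by W_i."""
    out = [0.0] * len(x)
    for e in B[0]:
        out[e] = x[e]
    for Ai, Bi in zip(A[1:], B[1:]):
        W = {e: x_star[e] for e in Ai if e in E}   # P_{Ai cap E}(x*)
        if norm(list(W.values())) == 0.0:
            W = {e: w[e] for e in Ai if e in F}    # P_{Ai cap F}(w_tau)
        a = norm(list(W.values()))
        b = norm([x[e] for e in Bi])
        for e, c in W.items():
            out[e] = -b / a * c
    return out

# Hypothetical data: E = {0}, F = {1, 2}, and a support assumed to stay fixed
# for small lambda, as Theorem 2(ii) guarantees for a genuine geodesic.
E, F = {0}, {1, 2}
x_star = [3.0, 0.0, 0.0, 0.0, 0.0]
w = [0.0, 1.0, 2.0, 0.0, 0.0]
x = [0.0, 0.0, 0.0, 2.0, 1.0]
A = [[], [0, 1], [2]]
B = [[], [3], [4]]

psi = psi_from_support(x, x_star, w, E, F, A, B)
for lam in (1e-2, 1e-4, 1e-6):
    y = [p + lam * q for p, q in zip(x_star, w)]   # x*(lambda, w) = x* + lambda w
    gap = max(abs(p - q) for p, q in zip(phi_from_support(x, y, A, B), psi))
    print(lam, gap)                                 # gap shrinks as lambda -> 0
```

Here A1={0,1} mixes an axis of E with one of F, so the normalised block PA1(x∗+λwτ)/∥PA1(x∗+λwτ)∥ genuinely changes with λ, whereas A2={2} lies in F and is handled by the second branch in the definition of Wi. Rescaling wτ, or adding to it a component tangent to σ, leaves psi_from_support unchanged, in line with Corollary 3.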
If σ has co-dimension l and τ co-dimension l′ then, when l−l′=1 and so ∣F∣=1, there is no i>0 such that ∣Ai∣>1 and PAi∩E(x∗)=0 as all the axes involved in the carrier that are not in E∪F are in A0=B0. If further l=1 and l′=0, that is, σ is a stratum of local co-dimension one and τ co-bounding σ is a locally top-dimensional stratum, then Ψ(⋅,wτ;x∗) obtained here is identical with the map resulting from the ‘folding map’ composed with Φ(⋅;x∗) used in [3] when Xm is a tree space, noting that wτ in this case is unique up to a positive scalar multiple.
Example 3.
Consider the orthant space X2 in Example 2. Take σ={o} and τ=O(u1,u2). Recall from Example 2 that the tangent cone to X2 at o is X2 itself. Take wτ=x∗, where x∗∈τ is indicated in Figure 2. Then, Ψ(x,wτ;o)=Φ(x;x∗) for any x∈X2. Since the light grey region in Figure 2 may change if x∗ changes, Φ(⋅;x∗) may change as a map when x∗ changes. Hence, the directional limit Ψ(⋅,wτ;o) of Φ(⋅;λwτ) from the direction wτ as λ→0, as a map, also depends on wτ.
For wτ as given in Theorem 2(ii), write wτ⊥ for the component of wτ orthogonal to σ, that is, the component in {0}×O(F)⊂R(E)×O(F). Then, the following consequences of Theorem 2 imply that, although the directional limit Ψ(x,wτ;x∗) generally depends on wτ, for given x and x∗, as noted in Example 3 above, it remains constant in some circumstances. In particular, to consider the changes of Ψ(x,wτ;x∗) as x varies, it suffices to restrict attention to wτ∈Sτ∖σl−l′, recalling that Sτ∖σl−l′ is the open unit spherical segment of {0}×O(F) given by Definition 4.
Corollary 3.
With the notation and hypotheses of Theorem 2(ii),
(i)
Ψ(⋅,λwτ;x∗)=Ψ(⋅,wτ;x∗) for all λ>0;
(ii)
Ψ(x,wτ;x∗)=Ψ(x,wτ⊥;x∗).
Proof.
(i) is obvious from the expression (19) and (ii) is immediate since σ=O(E), τ=O(E∪F) and only the F-coordinates of wτ are potentially involved in (19).
∎
When x∗ lies in a stratum σ of positive co-dimension that is not locally top-dimensional, the vector logx∗(x), and so Φ(x;x∗), will usually have non-zero components both tangent to σ and orthogonal to it. In order to discuss the projections, onto these components, of the translated logarithm map and of its directional limits, as well as to discuss their derivatives, we extend the notation P for projection maps on Xm given by Definition 9 to include projection maps on tangent cones, or their translated cones. However, since we are more interested in the orthant itself than in the axes determining it, we shall use Pσ instead of PE, where σ=O(E). In particular, for any stratum τ=O(E∪F) co-bounding σ in Xm, Pσ and Pτ∖σ respectively are the projections onto the two factors of the corresponding stratum R(E)×O(F) in the common translated cone Cσ, or equivalently in the tangent cone at a point of σ, depending on the context.
For x∗ in σ=O(E) or in τ=O(E∪F) co-bounding σ, we shall denote Pσ(logx∗(x)) by logx∗σ(x) and Pσ(Φ(x;x∗)) by Φσ(x;x∗). Note that, on Cσ, Pσ so defined is the tangential projection onto σ and Pτ∖σ is one of several possible normal projections. In particular, for wτ∈R(E)×O(F), wτ⊥=Pτ∖σ(wτ). We shall further extend the notation Pσ to include top-dimensional, or locally top-dimensional, strata by taking it to be the identity in that case, so that in particular Φσ(x;x∗)=Φ(x;x∗) if σ is a top-dimensional, or locally top-dimensional, stratum.
For x∗ in σ of locally positive co-dimension, the non-zero components of logx∗(x) orthogonal to σ correspond to axes with respect to which x∗ has zero coefficient and x has non-zero coefficient. Hence, these axes are in A0=B0=E(x∗,x), the set of axes common to all strata in the carrier of the geodesic between these two points, so that they correspond to components of v0 in (12). This implies, in particular, that Φσ(x;x∗) is given by (11) with PB0(x) there replaced by PB0∩E(x). Then, since the restriction to each stratum of the set Dx given by Definition 11 is relatively closed, the form of the expression for Φ(x;x′) will remain constant for x′ varying in a neighbourhood of x∗ in σ when x∗ is restricted to avoid Dx. Hence, the proof of Lemma 4 in [3] of the differentiability of Φ(x;x∗) with respect to x∗ for the case that Xm is a tree space and x∗ lies in a top-dimensional stratum will give the following generalisation of that result to the derivative of Φσ(x;x∗) with respect to x∗. Since the proof is similar to that for Lemma 4 in [3], we omit it here.
Proposition 5.
Let x and x∗ be fixed points in Xm with x∗ in the stratum σ=O(E) and x∉Dx∗, where the set Dx∗ is given by Definition 11. Then, the map
[TABLE]
is differentiable with respect to x′ at x∗ with derivative given by
[TABLE]
where the sequences A=(A0,⋯,Ak) and B=(B0,⋯,Bk) form the support of the geodesic from x∗ to x, J is the matrix representation of the linear transformation ȷ given by Definition 5 and, for y=(y1,⋯,yl)≠0,
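$$M_{\boldsymbol{y}}^{\dagger}=\frac{1}{\|\boldsymbol{y}\|}\left(I_l-\frac{\boldsymbol{y}^{\top}\boldsymbol{y}}{\|\boldsymbol{y}\|^{2}}\right)\tag{23}$$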
is the derivative of the map y↦y/∥y∥.
Note that, if l>1, ∥y∥My† is the projection onto the hyper-plane in Rl orthogonal to y and, when l=1, My1†=0. Hence, if k=0 or if k>0 and ∣Ai∣=1 for all 1⩽i⩽k, then the derivative of Φσ(x;x′), with respect to x′, at x′=x∗ is zero. Recall that the corresponding translated logarithm map in the Euclidean space is the identity map, independent of x′, and so its derivative with respect to x′ is identically zero. Hence, in a broad sense, Proposition 5 captures where and how the derivative of Φσ(x;x′) differs from that of the corresponding translated Euclidean logarithm map.
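The two facts about My† used here, namely that ∥y∥My† is the orthogonal projection onto the hyper-plane orthogonal to y and that My† vanishes when l=1, can be checked numerically. The sketch below assumes that (23) is the Jacobian (1/∥y∥)(I−yᵀy/∥y∥²) of y↦y/∥y∥, with y treated as a row vector as in the expression vMx∗,x(wτ) used in the proof of Proposition 6, and compares it with a central finite difference; the helper names are hypothetical.

```python
import math

def m_dagger(y):
    """Assumed form of (23): (I - y^T y / ||y||^2) / ||y|| for a row vector y != 0."""
    n2 = sum(c * c for c in y)
    n = math.sqrt(n2)
    size = len(y)
    return [[((1.0 if i == j else 0.0) - y[i] * y[j] / n2) / n for j in range(size)]
            for i in range(size)]

def row_times(v, m):
    """Row vector v multiplied on the right by the matrix m (a list of rows)."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]

def normalise(y):
    n = math.sqrt(sum(c * c for c in y))
    return [c / n for c in y]

def finite_difference(y, v, h=1e-6):
    """Central difference of y -> y / ||y|| in the direction v."""
    plus = normalise([a + h * b for a, b in zip(y, v)])
    minus = normalise([a - h * b for a, b in zip(y, v)])
    return [(p - q) / (2 * h) for p, q in zip(plus, minus)]

y = [3.0, 4.0, 12.0]
v = [1.0, -2.0, 0.5]
print(row_times(v, m_dagger(y)))   # derivative predicted by the formula
print(finite_difference(y, v))     # numerical derivative, agreeing closely
print(m_dagger([2.0]))             # l = 1: [[0.0]]
```

Multiplying m_dagger(y) by ∥y∥ gives a matrix that annihilates y and is idempotent, which is the projection property behind the remark that the derivative of Φσ(x;x′) vanishes when every ∣Ai∣=1.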
Returning to the directional limit Ψ(x,wτ;x∗) of Φ(x;x′) with x∗∈σ=O(E), where τ=O(E∪F) co-bounds σ and wτ is in R(E)×O(F), since Ψ(x,wτ;x∗) is in Cτ, both projections Ψτ(x,wτ;x∗)=Pτ(Ψ(x,wτ;x∗)) and Pσ(Ψ(x,wτ;x∗)) are well defined. In particular, Ψτ(⋅,wτ;x∗) is a map from Xm onto R(E∪F). Then, we also have the following consequences of Theorem 2, giving the relationships between the projections of the directional limit of the translated logarithm map and the directional limit of the projections of the translated logarithm map.
Corollary 4.
With the notation and hypotheses of Theorem 2(ii),
(i)
λ→0+limΦτ(x;x∗(λ,wτ))=λ→0+limΦτ(x;x∗(λ,wτ⊥))=Ψτ(x,wτ;x∗);
(ii)
Pσ(Ψτ(x,wτ;x∗))=Φσ(x;x∗).
Proof.
The equality of the extreme terms in (i) follows since the Wi in (19) are determined by the axes in E∪F, so that it does not matter whether we project on O(E∪F) before or after taking the limit, and the remaining term PB0∩(E∪F)(x) remains constant throughout the limiting process. The equality with the central term in (i) follows from Corollary 3(ii): Ψτ(x,wτ;x∗)=Ψτ(x,wτ⊥;x∗), which is λ→0+limΦτ(x;x∗(λ,wτ⊥)) by the case already established.
Note that, since projection onto R(E)⊂R(E∪F) is unaffected by first projecting onto R(E∪F), (ii) is equivalent to
[TABLE]
To establish (24), we need to allow for the fact that the geodesics γλ from x∗(λ,wτ) to x and the geodesic γ0 from x∗ to x may have different carriers. We assume that λ is restricted to the range 0<λ<ϵ such that the initial segments of γλ all lie in ζ=O(E∪F∪G), where possibly G=∅ and so ζ=τ, and let K be the set of axes with respect to which the initial segment of γ0 has positive coordinates. Then, K⊇E∪G. Now, e∈E∪F∪G if, and only if, for each λ and some maximal δ(λ)>0, ∥P{e}(γλ(s))∥>0 for s∈(0,δ(λ)). From the uniform continuity of geodesics with respect to their endpoints, it is clear that we must have δ(λ)→δ0⩾0 as λ→0. If δ0>0, then ∥P{e}(γ0(s))∥>0 for s∈(0,δ0) and so e∈K. Conversely, e∈K∩(E∪F∪G) implies that ∥P{e}(γ0(s))∥>0 for s∈(0,δ(0)) and we must have δ0=δ(0).
Thus, for any axis e in K∩(E∪F∪G), the projections P{e}(γλ(s)) and P{e}(γ0(s)) of the initial segments of these geodesics all lie in the closure of the stratum O(E∪F∪G). The uniform continuity of these geodesics, and so of their projections, with respect to their endpoints, together with their linearity within that closed stratum, implies that the components P{e}(γ˙λ(0)) converge to P{e}(γ˙0(0)) as λ→0. In particular, since E⊆K, this is valid for any axis e in E, which establishes (24).
∎
The comments made prior to Proposition 5 regarding the form of the expression for Φσ(x;x∗) can be generalised to apply to Ψτ(x,wτ;x∗): using the notation in Theorem 2(ii) for Ψ(x,wτ;x∗) we have that
[TABLE]
Recall that Sτ∖σl−l′ denotes the set of unit vectors in {0}×O(F)⊂R(E)×O(F) that comprises all unit vectors that are tangent to τ and orthogonal to σ. If l−l′=1, Sτ∖σl−l′ comprises a single point. When l−l′>1, for any fixed x∈Xm, the pseudo-partition of Xm determined by Dx induces a polyhedral subdivision of Sτ∖σl−l′ where, in each cell of the induced polyhedral subdivision, the form of the expression (19) for Ψ(x,⋅;x∗), and so the form of the expression for Ψτ(x,⋅;x∗), remains the same. In particular, this implies that, for fixed x, Ψτ(x,wτ;x∗) is a continuous function of wτ∈Sτ∖σl−l′. In fact, the directional derivatives of Ψτ(x,wτ;x∗) with respect to wτ also exist in directions v in the tangent space to Sτ∖σl−l′ at wτ that we denote by Twτ(Sτ∖σl−l′). These derivatives have the property given in the following proposition, where we note that R(E)×O(F)⊂R(E)×R(F) so that, for fixed x and x∗, wτ and Ψτ(x,wτ;x∗) lie in the same Euclidean space.
Proposition 6.
Let the stratum σ=O(E) of co-dimension l(⩾2) bound, in Xm, the stratum τ=O(E∪F) of co-dimension l′(<l−1). Fix x,x∗∈Xm with x∗∈σ. Then, as a function of wτ∈Sτ∖σl−l′, the directional derivative D of Ψτ(x,wτ;x∗) at wτ in the direction v∈Twτ(Sτ∖σl−l′) exists and satisfies
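$$\langle\boldsymbol{w}_\tau,D\Psi^{\tau}(\boldsymbol{x},\boldsymbol{w}_\tau;\boldsymbol{x}^*)(\boldsymbol{v})\rangle=0.$$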
Proof.
Without loss of generality, we may assume that ∥v∥=1. Consider the geodesic on Sτ∖σl−l′ given by α(s)=wτ cos s+v sin s. Write w1 for a vector whose coordinates comprise a subset of those of wτ, and v1, α1 for the corresponding components of v and α respectively. Then, the initial tangent vector of the function f(s)=α1(s)/∥α1(s)∥ is f˙(0)=v1Mw1†, where My† is given by (23). Clearly, ⟨w1,f˙(0)⟩=0, since the image of Mw1† is orthogonal to w1.
On the other hand, it follows from the argument in the proof of Theorem 2 that, for all sufficiently small s, the expressions for Ψτ(x,α(s);x∗) all have the same form, provided that, when wτ lies on the boundary of a cell of the induced polyhedral subdivision on Sτ∖σl−l′, we use for wτ the expression valid for s>0. Thus, we may use the expression for Ψτ(x,wτ;x∗) given by (25) to express DΨτ(x,wτ;x∗)(v) in the form vMx∗,x(wτ), where
[TABLE]
and where, using the notation of Theorem 2, Wli=PAli∩F(wτ) are just those components in the expression for Ψτ(x,wτ;x∗) for which PAli∩E(x∗)=0 and ∣Ali∩F∣>1. Since ∥y∥My† is the projection onto the hyperplane orthogonal to y in the Euclidean space where y lies as noted after the statement of Proposition 5, the result follows.
∎
The proof of Proposition 6 also shows that, if wτ lies in the interior of a single cell of the induced polyhedral subdivision of Sτ∖σl−l′, then Ψτ(x,wτ′;x∗) is differentiable with respect to wτ′ at wτ. However, if wτ lies in the boundary of a cell of the induced polyhedral subdivision, this no longer holds, although directional derivatives still exist.
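Proposition 6 and Corollary 5 can be illustrated on a toy map with the structure that (25) is assumed to have: a part that does not depend on wτ, plus blocks of the form −biWi/∥Wi∥ in which Wi is a sub-vector of wτ. In the sketch below the block layout, the constants bi and the vectors are all hypothetical; it differentiates along the great circle α(s)=wτ cos s+v sin s by central finite differences and compares the results with the two identities.

```python
import math

def normalise(y):
    n = math.sqrt(sum(c * c for c in y))
    return [c / n for c in y]

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

# Hypothetical layout in R^4: coordinate 0 is an E-coordinate whose value does
# not depend on w; coordinates 1-3 are F-coordinates, grouped into blocks.
CONST = [-0.9, 0.0, 0.0, 0.0]
BLOCKS = [([1, 2], 1.5), ([3], 0.7)]   # (axis indices, b_i = ||P_{B_i}(x)||)

def toy_psi(w):
    """Toy map: constant part plus -b_i * W_i / ||W_i||, where W_i is the
    restriction of w to block i."""
    out = list(CONST)
    for idx, b in BLOCKS:
        unit = normalise([w[e] for e in idx])
        for e, u in zip(idx, unit):
            out[e] = -b * u
    return out

def alpha(w, v, s):
    """Great circle through w with initial unit tangent v (v orthogonal to w)."""
    return [a * math.cos(s) + b * math.sin(s) for a, b in zip(w, v)]

def directional_derivative(f, w, v, h=1e-6):
    """Central difference of s -> f(alpha(w, v, s)) at s = 0."""
    plus, minus = f(alpha(w, v, h)), f(alpha(w, v, -h))
    if isinstance(plus, list):
        return [(p - m) / (2 * h) for p, m in zip(plus, minus)]
    return (plus - minus) / (2 * h)

w = normalise([0.0, 1.0, 2.0, 2.0])      # unit vector in {0} x O(F)
v = normalise([0.0, 0.0, 1.0, -1.0])     # unit, orthogonal to w

d_psi = directional_derivative(toy_psi, w, v)
d_pairing = directional_derivative(lambda u: dot(u, toy_psi(u)), w, v)
print(dot(w, d_psi))                      # close to 0 (Proposition 6)
print(d_pairing, dot(v, toy_psi(w)))      # close to each other (Corollary 5)
```

The singleton block [3] contributes a constant, as when ∣Ai∣=1, so only the two-dimensional block produces a non-zero derivative, and that derivative is orthogonal to the corresponding part of wτ.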
The directional derivative of ⟨wτ,Ψτ(x,wτ;x∗)⟩, as a function of wτ, now follows from Proposition 6.
Corollary 5.
Assume that all assumptions in Proposition 6 hold. Then, for any v∈Twτ(Sτ∖σl−l′), the derivative D in the direction v of ⟨wτ,Ψτ(x,wτ;x∗)⟩ at wτ is given by
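$$D\langle\boldsymbol{w}_\tau,\Psi^{\tau}(\boldsymbol{x},\boldsymbol{w}_\tau;\boldsymbol{x}^*)\rangle(\boldsymbol{v})=\langle\boldsymbol{v},\Psi^{\tau}(\boldsymbol{x},\boldsymbol{w}_\tau;\boldsymbol{x}^*)\rangle.$$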
Proof.
The second term in the expansion
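$$D\langle\boldsymbol{w}_\tau,\Psi^{\tau}(\boldsymbol{x},\boldsymbol{w}_\tau;\boldsymbol{x}^*)\rangle(\boldsymbol{v})=\langle D\boldsymbol{w}_\tau(\boldsymbol{v}),\Psi^{\tau}(\boldsymbol{x},\boldsymbol{w}_\tau;\boldsymbol{x}^*)\rangle+\langle\boldsymbol{w}_\tau,D\Psi^{\tau}(\boldsymbol{x},\boldsymbol{w}_\tau;\boldsymbol{x}^*)(\boldsymbol{v})\rangle$$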
vanishes by Proposition 6. The result then follows since the directional derivative Dwτ(v) is given by the derivative at s=0 of the geodesic α(s)=wτ cos s+v sin s, that is, by v.
∎
6 Characterisation of Fréchet means
In the remainder of this paper, we use the knowledge obtained so far on the translated logarithm map to investigate Fréchet means of probability measures on Xm. So, from now on we assume that μ is a probability measure on Xm and that its Fréchet function defined by (1), where M=Xm, is finite at one point. The latter ensures that the Fréchet function of μ is finite everywhere.
Since the squared distance on a CAT(0)-space is a strictly convex function with respect to each of its variables, it follows that the Fréchet mean of μ is unique and that the condition for x∗ to be the Fréchet mean of μ, that is, the condition for x∗ to satisfy
[TABLE]
is equivalent to this inequality holding for all x in some neighbourhood of x∗. Then, since the Fréchet function of μ is differentiable at x∗ if x∗ lies in a top-dimensional, or locally top-dimensional, stratum, the above condition for such x∗ to be the Fréchet mean of μ is equivalent to the condition that
[TABLE]
similar to the condition for Fréchet means in Riemannian manifolds of non-positive curvature.
When x∗ lies in a stratum σ of locally positive co-dimension, the squared distance d(x∗,x)2 is no longer differentiable at x∗ for any fixed x. Nevertheless, it has directional derivatives along all possible directions, and the above condition then becomes the requirement that, at x∗∈σ, the Fréchet function of μ has non-negative directional derivatives along all possible directions. The fact that Xm is a CAT(0)-space also implies that the derivative at x∗ in the direction w of the distance function dx=d(⋅,x) can be expressed as
[TABLE]
where ≪,≫ is defined by (2) (cf. [14], (2.5), p417). Thus, the criterion for a point x∗ lying in a stratum σ of locally positive co-dimension to be the Fréchet mean of μ is equivalent to the condition that
[TABLE]
for all tangent vectors w at x∗.
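In the flat case X=Rⁿ, where logx∗(x)=x−x∗ and ≪,≫ reduces to the Euclidean inner product, the derivative of dx at x∗ in the direction w should be −⟨w,x−x∗⟩/d(x∗,x) (the same form appears in the proof of Lemma 1 below). The sketch below, Euclidean only and with illustrative helper names, compares that expression with a central difference; it checks the formula in the simplest CAT(0)-space and is not a computation on Xm.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def derivative_of_distance_fd(x_star, x, w, h=1e-6):
    """Central difference of t -> d(x_star + t*w, x) at t = 0."""
    plus = [p + h * q for p, q in zip(x_star, w)]
    minus = [p - h * q for p, q in zip(x_star, w)]
    return (dist(plus, x) - dist(minus, x)) / (2 * h)


def derivative_of_distance_formula(x_star, x, w):
    """-<w, log_{x*}(x)> / d(x*, x), with log_{x*}(x) = x - x* in Euclidean space."""
    log = [b - a for a, b in zip(x_star, x)]
    return -dot(w, log) / dist(x_star, x)


x_star, x, w = [3.0, 4.0], [0.0, 0.0], [1.0, 2.0]
print(derivative_of_distance_formula(x_star, x, w))
print(derivative_of_distance_fd(x_star, x, w))
```

Multiplying by d(x∗,x) gives the derivative −⟨w,x−x∗⟩ of ½d(⋅,x)²; summing this over the points of a discrete μ and asking for non-negativity in every direction w forces Σ(x−x∗)=0, so in this flat case the criterion just says that x∗ is the Euclidean mean.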
For any vector w at x∗ which is tangent to σ, the fact that −w is also tangent to σ at x∗ implies that the inequality (28) must be an equality for all such w. From this it follows that
[TABLE]
analogous to the condition (27). On the other hand, for any given stratum τ co-bounding σ and any vector w at x∗ tangent to τ, it is possible to link the derivative, at x∗, of the Fréchet function in the direction w with Ψτ(⋅,w;x∗), the projection of the directional limit of Φ(⋅;x∗). To show this, we need the following limiting property of the directional derivatives on general CAT(0)-spaces.
Lemma 1.
Let X be a CAT(0)-space, and let x0 and x be two distinct fixed points in X. For some ϵ>0, assume that γ:[0,ϵ)→X is a geodesic with γ(0)=x and γ˙(0)=vx. Then, if {xi:i⩾1} is a sequence of points along γ convergent to x, the derivative D at x in the direction vx of the distance function dx0=d(x0,⋅) has the property that
[TABLE]
where vxi denotes the tangent vector at xi of the geodesic γ.
Proof.
For x,y,z∈X, denote by ∠x(y,z) the Alexandrov angle at x between the geodesics from x to y and to z. Since Ddx0(vx)=−≪vx,logx(x0)≫/dx0(x)=−∥vx∥cos∠x(x′,x0), where x′ is a point on the geodesic γ, it suffices to show that, for a fixed point x′ chosen on γ, ∠x(x′,x0)=limi→∞∠xi(x′,x0).
For this, we write γa,b for the (unique) geodesic segment joining a and b, for any two distinct points a and b in X. Then, given sequences of points ai→a, bi→b and ci→c in X, it follows from the Cartan-Hadamard theorem that the geodesic segments γai,bi and γai,ci converge uniformly, as maps, to γa,b and γa,c respectively. From this it follows that ∠a(b,c)⩾limsupi→∞∠ai(bi,ci) (cf. [7], Theorem 4.3.11, p.119). Applying this to the sequence of geodesic triangles Δ(x′xix0), we obtain
[TABLE]
On the other hand, using (4.3) p.124 of [7], we have
[TABLE]
where, as in Section 2, ∠̄ denotes the corresponding comparison angle in R2. Then, since ∠̄xi(x,x0)⩾∠xi(x,x0), the above implies that
[TABLE]
However, since X has non-positive curvature, if xi lies between x and x′ on the geodesic segment γx,x′, then ∠xi(x′,x0)+∠xi(x0,x)⩾π (cf. [7], p117, line 5). Hence,
Recalling that Φ(x;x∗)=logx∗(x)+x∗, the criteria (28) and (29) for a point x∗ to be the Fréchet mean of μ may now be recast, the former in terms of the standard Euclidean inner product and Ψτ(x,w;x∗), the projection of the directional limit of Φ(x;x∗), when x∗ lies in a stratum σ of positive co-dimension and w is tangent to a co-bounding stratum τ.
Theorem 3.
Let σ be a stratum in Xm of co-dimension l(⩾0). The necessary and sufficient conditions for a given point x∗∈σ to be the Fréchet mean of μ are
(i)
for any stratum τ in Xm of co-dimension l′, 0⩽l′<l, co-bounding σ and any wτ∈Sτ∖σl−l′,
[TABLE]
where Sτ∖σl−l′ is given by Definition 4;
(ii)
for all l⩾0,
[TABLE]
Note that case (i) may only occur if l>0, but need not occur then. Note also that, if Xm is a tree space, the special case l=0 of this result is the same as that of Lemma 3 of [3]; and the special case l=1, so that l′=0, is equivalent to that given by Lemma 5 of [3]: on the one hand, Sτ∖σl−l′ contains a single unit vector and, on the other hand, as we noted earlier, Ψτ(⋅,wτ;x∗)=Ψ(⋅,wτ;x∗) is identical with the composition of the ‘folding map’ with Φ(⋅;x∗) in [3].
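For intuition only, consider a hypothetical "spider" made of k half-lines (legs) glued at a cone point o; this is not the general orthant space treated here, just the simplest stratified example. Taking σ={o} and each leg as a co-bounding stratum τ with its single unit direction wτ, we assume for this sketch that ⟨wτ,Ψτ(x,wτ;o)⟩ acts as a folding map in the spirit of [3]: a point at distance r on leg τ goes to r, a point at distance r on any other leg goes to −r. Condition (i) of Theorem 3 at o then says that every folded mean is non-positive, while (ii) is trivially satisfied because σ is a single point. The sketch computes the Fréchet mean on that basis; the function names are illustrative.

```python
def folded_means(points, k):
    """For each leg j, the mean of the folded coordinate: +r on leg j, -r elsewhere.

    `points` is a list of (leg index, distance from the cone point o).
    """
    n = len(points)
    return [sum(r if leg == j else -r for leg, r in points) / n for j in range(k)]


def spider_frechet_mean(points, k):
    """Frechet mean on a k-legged spider, as (leg, distance from o); (None, 0.0) is o.

    On leg j the Frechet function is t -> mean of (t - f_j)^2 / 2 for t >= 0,
    where f_j is the folded coordinate, so its minimiser there is max(mean f_j, 0).
    """
    for j, m in enumerate(folded_means(points, k)):
        if m > 0:  # at most one leg can have a positive folded mean
            return j, m
    return None, 0.0  # every folded mean <= 0: the folding criterion holds at o


def frechet_function(points, leg, t):
    """Half the mean squared distance from the point at distance t >= 0 on `leg`."""
    total = 0.0
    for other_leg, r in points:
        d = abs(t - r) if other_leg == leg else t + r
        total += 0.5 * d * d
    return total / len(points)


print(spider_frechet_mean([(0, 3.0), (1, 1.0)], 2))            # (0, 1.0)
print(spider_frechet_mean([(0, 1.0), (1, 1.0), (2, 1.0)], 3))  # (None, 0.0)
```

The brute-force `frechet_function` is there so that the folding shortcut can be compared against a direct grid search over all legs.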
Proof.
Noting that (ii) is precisely (29), it is sufficient to show that (i) is equivalent to (28) for any tangent vector w that is not tangent to σ=O(E). For this, we fix any stratum τ=O(E∪F), of co-dimension l′, co-bounding σ=O(E) and take w=wτ∈R(E)×O(F). Then, it follows from Lemma 1 that (28) is equivalent to
[TABLE]
where x∗(λ,wτ)=x∗+λwτ, as defined prior to Theorem 2. Since wτ is tangent to τ at x∗(λ,wτ) for sufficiently small λ>0 and, for any given x, logx∗(λ,wτ)(x) is tangent either to τ or to one of the strata that co-bound τ, we have
[TABLE]
However,
[TABLE]
Hence, by Corollary 4(i) and then Corollary 3(ii), (33) is equivalent to
[TABLE]
where wτ⊥=Pτ∖σ(wτ). Decomposing wτ as wτ=wσ+wτ⊥, where wσ=Pσ(wτ), leads to
[TABLE]
where the second equality follows from Corollary 4(ii). The required result now follows by noting (ii), noting that ⟨wτ,x∗⟩=⟨wσ,x∗⟩=0 and noting that, by applying the projection Pτ to the result of Corollary 3(i), Ψτ(x,wτ⊥;x∗)=Ψτ(x,wτ⊥/∥wτ⊥∥;x∗).
∎
From now on, we assume that ξ is a random variable defined on a probability space (Ω,F,P) with values in Xm and that μ is the distribution (measure) of ξ, i.e. μ(B)=P(ξ−1(B)) for any Borel set B in Xm. When the stratum containing the Fréchet mean x∗ of the probability measure μ on Xm is of locally positive co-dimension, (31) being an equality has a significant influence on the nature of the distributions of the Euclidean random variables Ψτ(ξ,wτ;x∗), which will be seen in Propositions 7, 8 and 9. We shall also see, in Proposition 10, its link with the long term behaviour of sample Fréchet means.
Definition 12.
For the stratum σ of co-dimension l(⩾1), in which the Fréchet mean x∗ of μ lies, and the stratum τ, of co-dimension l′, co-bounding σ, the subset Θτ,σ(x∗;μ) of Sτ∖σl−l′ is defined as the set of those wτ∈Sτ∖σl−l′ for which the inequality (31) holds with equality.
The convexity of the directional derivative D(dx2)(w) in w (cf. [14], pp416-417) ensures that Θτ,σ(x∗;μ) is a convex subset of Sτ∖σl−l′ and that
[TABLE]
is a convex subset of τ⊃σ⋃Sτ∖σl−l′⊆Cσ. If l−l′=1, Sτ∖σl−l′ consists of a single unit vector so that Θτ,σ(x∗;μ) is either Sτ∖σl−l′ itself or an empty set. In general, if the closure of Θτ,σ(x∗;μ) is contained in Sτ∖σl−l′, the fact that ⟨wτ,Ψτ(x,wτ;x∗)⟩ is continuous in wτ∈Sτ∖σl−l′ implies that Θτ,σ(x∗;μ) itself must be closed.
The following result gives a relationship between the Fréchet mean x∗ of μ and the Euclidean mean of Ψτ(ξ,wτ;x∗). Here, and henceforth, by interior we mean the relative interior, that is, the interior with respect to the subspace topology.
Proposition 7.
Let the stratum σ of co-dimension l(⩾2) bound, in Xm, the stratum τ of co-dimension l′(<l−1). Assume that the Fréchet mean x∗ of μ lies in σ and that int(Θτ,σ(x∗;μ))≠∅. Then, for any wτ∈Θτ,σ(x∗;μ),
[TABLE]
Note that, if l′=l−1, equality (36) holds automatically since its left hand side is a 1-dimensional vector so that the equality follows from the assumption that wτ∈Θτ,σ(x∗;μ).
Proof.
By the continuity of Ψτ(x,wτ;x∗) in wτ, we may assume that wτ∈int(Θτ,σ(x∗;μ)). Then equality holds in (31) in a neighbourhood of wτ, so that
On the other hand, it follows from ∫XmΦσ(x;x∗)dμ(x)=x∗ and ⟨wτ,x∗⟩=0 that
[TABLE]
Hence, taking the directional derivative of the left hand side as a function of wτ∈Sτ∖σl−l′, we have
[TABLE]
for all v∈Twτ(Sτ∖σl−l′). Noting that the left hand side of (36) is a vector lying in the (l−l′)-dimensional Euclidean space containing Sτ∖σl−l′, the fact that wτ∈Θτ,σ(x∗;μ), together with the above, implies that the required result holds for any wτ∈int(Θτ,σ(x∗;μ)).
∎
One immediate consequence of Proposition 7 is the following.
Corollary 6.
Assume that the conditions given in Proposition 7 are satisfied. If σ=O(E) and τ=O(E∪F) then, for all wτ∈Θτ,σ(x∗;μ),
[TABLE]
That is, the point x∗∈σ, as a point in R(E∪F), is the Euclidean mean of each of the Euclidean random variables Ψτ(ξ,wτ;x∗) for such wτ.
If a stratum σ=O(E) of co-dimension l(⩾1) bounds, in Xm, the stratum τ of co-dimension l′(<l) and if x∗∈σ, then it follows from the proof of Proposition 6 that the maps Ψτ(⋅,wτ1;x∗) and Ψτ(⋅,wτ2;x∗) from Xm to R(E∪F) are generally not identical for distinct wτi∈Sτ∖σl−l′, i=1,2. With the insight obtained from that proof, to characterise the places where they differ we introduce the subset Στ,σ(x∗;wτ) of Xm as follows. It will be clear later, in the proof of Proposition 9, that the set of x∈Xm where Ψτ(x,wτ1;x∗)≠Ψτ(x,wτ2;x∗) is contained in the set Στ,σ(x∗;wτ1)∪Στ,σ(x∗;wτ2)∪Dx∗. Thus, in particular, for ξ lying outside of the latter set, the Euclidean random variables Ψτ(ξ,wτ1;x∗) and Ψτ(ξ,wτ2;x∗) are identical. This fact will be used in the derivation of the limiting distribution of sample Fréchet means in the next section.
Definition 13.
Let the stratum σ=O(E) of co-dimension l(⩾1) bound, in Xm, the stratum τ=O(E∪F) of co-dimension l′(<l). For x∗∈σ and wτ∈Sτ∖σl−l′, a point x∈Xm is called singular with respect to (x∗,wτ), if at least one Ai with Ai∩E=∅ has ∣Ai∩F∣>1, where i⩾1 and the sequences A=(A0,A1,⋯,Ak) and B=(B0,B1,⋯,Bk) form the support of the geodesics from x∗+λwτ to x for all sufficiently small λ>0. The set Στ,σ(x∗;wτ) consists of all points x that are singular with respect to (x∗,wτ).
For example, in the orthant space X2 of Example 3, using the notation there, Στ,{o}(o;wτ) is the closure of the light grey region in Figure 2. It follows from comparison of the corresponding expressions (19) and (25) that the singularity of x with respect to (x∗,wτ) has the same effect on Ψτ(x,wτ;x∗) as it does on Ψ(x,wτ;x∗). In particular, in terms of the matrix Mx∗,x(w) given by (26), we can express Στ,σ(x∗;wτ) defined above as
[TABLE]
Note that Στ,σ(x∗;wτ)=∅ if l−l′=1: in that case F consists of a single axis, so that ∣Ai∩F∣>1 is impossible, and Sτ∖σl−l′ contains the single unit vector wτ. In general, if l−l′>1, which implies that l⩾2, Στ,σ(x∗;wτ) can be a substantial set. Nevertheless, we have the following result on the measure of Στ,σ(x∗;wτ).
Proposition 8.
Let the stratum σ of co-dimension l(⩾2) bound, in Xm, the stratum τ of co-dimension l′(<l−1). Assume that the Fréchet mean x∗ of μ lies in σ and that wτ∈int(Θτ,σ(x∗;μ)), where Θτ,σ(x∗;μ) is defined by (34). Then, μ(Στ,σ(x∗;wτ))=0.
Proof.
Let α(s) be a unit-speed geodesic in Sτ∖σl−l′, write v(s)=α˙(s) and define h(s)=⟨v(s),∫XmΨτ(x,α(s);x∗)dμ(x)⟩. Since Sτ∖σl−l′ is an open subset of a Euclidean sphere, we have v˙(s)=α¨(s)=−α(s) and so, by Proposition 6 and its proof,
[TABLE]
where Mx∗,x(w) is given by (26). The expression for Mx∗,x(w) implies that, for w∈Sτ∖σl−l′ and any fixed x∈Στ,σ(x∗;w), ⟨v,vMx∗,x(w)⟩ can be written in the form
[TABLE]
for some 1⩽j⩽k, where Wli and PBli(x) are those required for the expression (26) for Mx∗,x(w) in the proof of Proposition 6. This implies that h˙(0) must be non-positive. Moreover, for any open or closed subset E⊆Στ,σ(x∗;α(0)) such that Ψτ(x,α(0);x∗) has the same expression for all x∈E, there is a vector v(0)∈Tα(0)(Sτ∖σl−l′) such that ⟨v(0),v(0)Mx∗,x(α(0))⟩<0 for all x∈E. Then, if μ(E)>0, the corresponding h satisfies
[TABLE]
Clearly, Στ,σ(x∗;α(0)) can be decomposed as a finite disjoint union of such sets E.
If wτ=α(0)∈int(Θτ,σ(x∗;μ)) then, for any v(0)∈Twτ(Sτ∖σl−l′), the corresponding geodesic α(s) lies in Θτ,σ(x∗;μ) for all sufficiently small s⩾0. Using a similar argument to that for the proof of Proposition 7, the corresponding h(s) must be identically zero for all sufficiently small s⩾0, which implies that h˙(0)=0. Hence, we must have μ(Στ,σ(x∗;wτ))=0.
∎
If a stratum σ bounds τ in Xm, x∗∈σ and wτ1, wτ2 are two different vectors at x∗ tangent to τ then, since the map Ψτ(⋅,wτ1;x∗) generally differs from Ψτ(⋅,wτ2;x∗), the distribution of the Euclidean random variable Ψτ(ξ,wτ1;x∗) generally differs from that of Ψτ(ξ,wτ2;x∗). Nevertheless, under the conditions in Proposition 8, the Ψτ(ξ,wτ;x∗) are in fact a.s. identical for wτ∈int(Θτ,σ(x∗;μ)).
Proposition 9.
Assume that ξ is a random variable on Xm with distribution measure μ having Fréchet mean x∗. Assume further that μ(Dx∗)=0 and that x∗ lies in the stratum σ=O(E) of co-dimension l(⩾2). Let the stratum τ of co-dimension l′(<l−1) co-bound σ in Xm. If int(Θτ,σ(x∗;μ))≠∅, then the distributions of the Euclidean random variables Ψτ(ξ,wτ;x∗) are independent of wτ∈int(Θτ,σ(x∗;μ)), where the set Θτ,σ(x∗;μ) is defined by (34).
Note that the example in the next section makes it clear that the condition wτ∈int(Θτ,σ(x∗;μ)) in the statement of Proposition 9 cannot be relaxed to wτ∈Θτ,σ(x∗;μ).
Proof.
First, we show that, for any given distinct wτj∈Sτ∖σl−l′, j=1,2, and for x∉Στ,σ(x∗;wτ1)⋃Στ,σ(x∗;wτ2)⋃Dx∗, Ψτ(x,wτ1;x∗)=Ψτ(x,wτ2;x∗). Then, it follows from the assumption and Proposition 8 that Ψτ(ξ,wτ1;x∗)=Ψτ(ξ,wτ2;x∗) a.s. Recall from the proof of Theorem 2(ii) that, for fixed x∈Xm, x∗∈σ and wτ∈Sτ∖σl−l′, the supports of the geodesics from x∗(λ,wτ)=x∗+λwτ to x are the same for all sufficiently small λ>0, and that the expression for Ψτ(x,wτ;x∗) is determined by this common support. Thus, Ψτ(x,wτ1;x∗)=Ψτ(x,wτ2;x∗) whenever the geodesics from x∗(λ,wτ1) and from x∗(λ,wτ2) to x have the same support for all sufficiently small λ>0.
Suppose now that the supports (Aj,Bj), j=1,2, of the geodesics from x∗(λ,wτ1) and x∗(λ,wτ2), respectively, to x are different for all sufficiently small λ>0. Then the geodesic γλ between x∗(λ,wτ1) and x∗(λ,wτ2) must meet at least one hyper-surface in Dx. If it meets more than one such hyper-surface (necessarily finitely many), then, by introducing a point on γλ between each pair of consecutive such hyper-surfaces, the change of the supports of the geodesics from points of γλ to x can be considered inductively, which reduces the problem to the case where γλ meets only one such hyper-surface.
Hence, without loss of generality, we assume that γλ only meets Dx at a point xλ on one of the hyper-surfaces in Dx. That is, xλ satisfies (17) for a particular i0 with x∗ being replaced by xλ and all the other relevant inequalities in Proposition 2, with x1 and x2 replaced by xλ and x, are strict. If x∈Dx∗ so that x∗∈Dx, we may assume that the points x∗(λ,wτj) lie on the opposite sides of H for all sufficiently small λ>0. Then, by Proposition 4, as γλ moves through xλ, the supports of the geodesics from γλ to x change, with the relevant subset Ai01=Ci01∪Ci02 of the sequence A1=(A01,⋯,Ak1) in the support (A1,B1) on the one side splitting, say, into two subsets Ci01,Ci02 on the other, and similarly for Bi01 in B1. That is, the support (A2,B2) of the geodesics from x∗(λ,wτ2) to x is related to (A1,B1) by A2=(A01,⋯,Ai0−11,Ci01,Ci02,Ai0+11,⋯,Ak1), and similarly B2 to B1. We show now that neither of these subsets Ci01 and Ci02 can meet E. If only one of these two sets meets E, say Ci01, then since PCi02(x∗(λ,wτ1))→0 as λ→0, it is impossible that there are xλ such that the corresponding equality (17) holds for all sufficiently small λ>0. Similarly, if both of these sets meet E, then the proof of Corollary 4(ii) shows that PCi0s(x∗(λ,wτj))→PCi0s(x∗), as λ→0, for j=1,2. This implies that, for j=1, the corresponding strict inequality (10) holds for x∗ while, for j=2, it is reversed. Hence, that is also impossible.
Thus, in the case when the supports (Ai,Bi) are different, we still have Aj1=Aj2 for all j>0 such that Aj1∩E≠∅.
If, further, x∉Στ,σ(x∗;wτ1)⋃Στ,σ(x∗;wτ2), then the change of support described above cannot happen when both Ci01∩E and Ci02∩E are empty, as then ∣Ai01∩F∣>1, and so we would have x∈Στ,σ(x∗;wτ1). Since A01=A02, the above implies that we must have (Ai,Bi) identical for i=1,2 and so, for such x, Ψτ(x,wτ2;x∗)=Ψτ(x,wτ1;x∗).
Next, assume that the two wτj are chosen to be sufficiently close that, for any given x and all sufficiently small λ>0, the geodesics from x∗(λ,wτj) to x have the same support. Then, if wτ(α), α∈[0,1], is the geodesic between wτ1 and wτ2, an elementary argument on the relevant parameters in the inequalities (8) and (10) that determine the carrier will show that these parameters are monotonic in α along the geodesic. So, the geodesic from x∗(λ,wτ(α)) to x will have the same support as that for the geodesics from x∗(λ,wτj) to x. This implies that Στ,σ(x∗;wτ(α))⊆Στ,σ(x∗;wτ1)⋃Στ,σ(x∗;wτ2), so that Ψτ(ξ,wτ(α);x∗) are a.s. independent of α∈[0,1].
Finally, since Θτ,σ(x∗;μ) is convex, there is a sequence {wτn∣n⩾1}⊂int(Θτ,σ(x∗;μ)) such that
[TABLE]
where Cn is the convex hull in Sτ∖σl−l′ of {wτ1,⋯,wτn}. The above argument implies that, without loss of generality, we may also assume that {wτn∣n⩾1} have the property that, for any wτ∈Cn,
[TABLE]
This shows that
[TABLE]
so that Ψτ(ξ,wτ;x∗) are a.s. independent of wτ∈Cn. Hence, it follows from (38) that
[TABLE]
which gives the required result.
∎
7 The limiting distribution of sample Fréchet means
In this section, we assume that {ξi:i⩾1} is a sequence of i.i.d. random variables defined on a common probability space (Ω,F,P) with values in Xm, that μ is the distribution measure of ξ1, and that ξ̂n is the sample Fréchet mean of ξ1,⋯,ξn. Then ξ̂n converges to the Fréchet mean x∗ of μ almost surely as n tends to infinity (cf. [23]).
7.1 On the support of the limiting distribution
If x∗ lies in a top-dimensional stratum, then, near x∗, Xm is locally an m-dimensional manifold. One would expect the limiting behaviour of the sample Fréchet means ξ̂n to be similar, to some extent, to that of sample Fréchet means in a Riemannian manifold, as obtained in [4] and [13]. In particular, the support of the limiting distribution of √n logx∗(ξ̂n)=√n(ξ̂n−x∗) is the entire tangent space to Xm at x∗, as long as cov(Φ(ξ1;x∗)) has rank m. This fact was proved for open books in [10] and for tree spaces in [2] and [3]. We shall see in the following that the argument used in [3] can be generalised to Xm, so that the corresponding conclusion is also valid for orthant spaces.
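A toy simulation illustrates the top-dimensional case. It uses a hypothetical three-legged spider (half-lines glued at o) rather than a genuine orthant space, and a made-up measure with all mass at distance 2 from o: probability 3/4 on leg 0 and 1/8 on each of legs 1 and 2. The folded variable for leg 0 is then +2 or −2 with mean 1 and variance 3, so x∗ lies in the interior of leg 0 at distance 1, the space is a line near x∗, and √n(ξ̂n−x∗) should be approximately normal with variance about 3, mirroring the Riemannian-type behaviour described above. The function names are illustrative.

```python
import random
import statistics


def folded_mean(points, leg):
    """Mean of +r for points on `leg` and -r for points on the other legs."""
    return sum(r if l == leg else -r for l, r in points) / len(points)


def draw_point(rng):
    """Hypothetical measure on a 3-legged spider: all mass at distance 2 from o,
    with probability 3/4 on leg 0 and 1/8 on each of legs 1 and 2."""
    u = rng.random()
    leg = 0 if u < 0.75 else (1 if u < 0.875 else 2)
    return leg, 2.0


def scaled_errors(n, reps, seed=0):
    """Replicates of sqrt(n) * (signed distance of the sample Frechet mean from x*)."""
    rng = random.Random(seed)
    x_star = 1.0  # folded mean on leg 0: 2*(3/4) - 2*(1/4) = 1 > 0, so x* is on leg 0
    errors, off_leg = [], 0
    for _ in range(reps):
        pts = [draw_point(rng) for _ in range(n)]
        m = folded_mean(pts, 0)
        if m <= 0:  # sample mean not in the interior of leg 0 (very unlikely here)
            off_leg += 1
            continue
        errors.append(n ** 0.5 * (m - x_star))  # sample mean sits on leg 0 at distance m
    return errors, off_leg


errs, off = scaled_errors(n=200, reps=1000, seed=1)
print(off, statistics.mean(errs), statistics.variance(errs))  # variance should be near 3
```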
However, when x∗ lies in a stratum of locally positive co-dimension, the limiting behaviour of sample Fréchet means is generally very different. In the case that Xm is an open book or a tree space and the stratum containing x∗ is of co-dimension one, this phenomenon was observed and studied in [10], [2] and [3]. Similarly, for general orthant spaces, whether or not the inequality (31) is strict affects the limiting behaviour of ξ̂n. In particular, when (31) is strict, there is a constraint on the support of the limiting distribution. To describe this, we recall that, for σ=O(E) of co-dimension l and τ=O(E∪F) of co-dimension l′<l co-bounding σ, we denote the set of unit vectors in R(E)×O(F) by Sτ,σm−l′ and those in {0}×O(F) by Sτ∖σl−l′. Then, for wτ in the latter, denote by Hwτ the intersection of the half hyper-plane R(E)×{cwτ∣c>0} with Sτ,σm−l′, namely
[TABLE]
and let
[TABLE]
Proposition 10**.**
Let the stratum σ=O(E) of co-dimension l(⩾1) bound, in Xm, the stratum τ=O(E∪F) of co-dimension l′(<l). Assume that the Fréchet mean x∗ of μ lies in σ and that wτ∈Sτ∖σl−l′∖Θτ,σ(x∗;μ), where Sτ∖σl−l′ and Θτ,σ(x∗;μ) are given by Definitions 4 and 12 respectively. Then,
[TABLE]
Proof.
For wτ as given in the proposition, let
[TABLE]
Then, the set Ωwτ consists of points with the property that, for arbitrary ϵ>0, there exist arbitrarily large n such that ξ̂n lies in τ and (ξ̂n−x∗)/∥ξ̂n−x∗∥ is within a distance ϵ of Hwτ. Since Ωnk(wτ)⊇Ωnk+1(wτ), the required result is equivalent to showing that P(Ωwτ)=0.
Without loss of generality, we may assume that, restricted to Ωwτ, ξ̂n lies in τ for all n and wn=(ξ̂n−x∗)/∥ξ̂n−x∗∥→w as n→∞ for some (random) unit vector w∈Hwτ.
Recall that, for given wτ∈Sτ∖σl−l′∖Θτ,σ(x∗;μ), each Ψτ(ξi,wτ;x∗) is a Euclidean random variable on R(E∪F). Then, let
[TABLE]
and write Ω0 for the subset of Ω consisting of points such that ξ̂nwτ converges to
[TABLE]
It follows from the classical Law of Large Numbers that P(Ω0)=1. Hence, restricted to Ωwτ∩Ω0, the assumption on wτ implies that, for some constant c<0, there is an n0 such that, for n>n0, ⟨wτ,ξ̂nwτ⟩<c. However, the assumption that wn→w∈Hwτ implies that ⟨wτ,ξ̂n−x∗⟩>0 for all sufficiently large n. Putting these two conclusions together, we have that, restricted to Ωwτ∩Ω0,
[TABLE]
as ⟨wτ,x∗⟩=0.
On the other hand, restricted to Ωwτ∩Ω0, ξ̂n is in τ by the assumption made earlier. Then, it follows from (32) that
[TABLE]
Thus, we can express the difference ξ̂n−ξ̂nwτ as
[TABLE]
Decompose wn=(wn)σ+(wn)⊥, where (wn)σ=Pσ(wn) and (wn)⊥=Pτ∖σ(wn). Then, by Corollary 3(ii), for each 1⩽i⩽n,
[TABLE]
where (wn)τ=(wn)⊥/∥(wn)⊥∥∈Sτ∖σl−l′. Without loss of generality, we assume that the carriers of the geodesics from ξ̂n to ξi remain constant; the general case follows from an inductive argument similar to that outlined at the beginning of the proof of Proposition 9, together with the fact that ξ̂n converges to x∗ a.s. Let (A,B) be the common support of the geodesics, where A=(A0,⋯,Ak) and B=(B0,⋯,Bk) and, for 0<j⩽k, write Wj for PAj∩E(x∗) if Aj∩E≠∅ and for PAj∩F(wn) otherwise. Theorem 1 tells us that the jth set of components of ȷ−1(Φ(ξi;ξ̂n)) is the vector −(∥PBj(ξi)∥/∥PAj(ξ̂n)∥)PAj(ξ̂n), and Theorem 2(ii) tells us that the corresponding vector for ȷ−1(Ψ(ξi,wn;x∗)) is −(∥PBj(ξi)∥/∥Wj∥)Wj. Hence, the proof of Theorem 2(ii) shows that, when Aj∩E=∅, these two vectors are identical, since PAj(ξ̂n)=PAj∩F(ξ̂n)=PAj∩F(ξ̂n−x∗).
If, on the other hand, Aj∩E≠∅, the difference between these two vectors is of the same order as PAj∩F(ξ̂n)/∥PAj(ξ̂n)∥, whose limit, as n→∞, is zero, since ∥PAj(ξ̂n)∥⩾∥PAj∩E(ξ̂n)∥→∥PAj∩E(x∗)∥>0 but PAj∩F(ξ̂n)→0 a.s. It follows that, as n→∞,
[TABLE]
Moreover, since w⊥=Pτ∖σ(w)≠0, wn→w implies that (wn)τ→w⊥/∥w⊥∥=wτ. Then, it follows from a similar argument to that of the proof of Proposition 6 that, for sufficiently large n,
[TABLE]
where vn is the component of (wn)τ−wτ orthogonal to wτ, so that as n→∞,
[TABLE]
by Proposition 6. Then, (40), (43), (44) and (45) together imply that, restricted to Ωwτ∩Ω0, ⟨wτ,ξ̂n−ξ̂nwτ⟩→0 a.s. as n→∞, contradicting (39). Hence, P(Ωwτ∩Ω0)=0, so that P(Ωwτ)=0 as required.
∎
When l−l′=1, Sτ∖σl−l′ contains a single unit vector, so that we have the following special case. In particular, taking l=1 and so l′=0 recovers the result of Lemma 6 in [3] for the case of co-dimension one when Xm is a tree space.
Corollary 7.
Let the stratum σ of co-dimension l(⩾1) bound, in Xm, the stratum τ of co-dimension l′=l−1. Assume that the Fréchet mean x∗ of μ lies in σ. If the inequality (31) corresponding to the unique wτ∈Sτ∖σl−l′ is strict then, for all sufficiently large n, ξ̂n cannot lie in τ.
Thus, when l−l′=1, the support of the limiting distribution of any appropriately scaled difference ξ̂n−x∗ intersects the stratum R(E)×O(F) in the tangent cone to Xm at x∗ only if the inequality (31) corresponding to the unique wτ∈Sτ∖σl−l′ is an equality.
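The "sticky" behaviour described by Corollary 7 can be watched on a hypothetical spider with three legs glued at o (an illustrative stand-in, not a construction from this paper): σ={o}, l=1, and each leg is a stratum τ with l′=0. Assuming, as before, that ⟨wτ,Ψτ⟩ acts as the folding map, equal mass on the three legs with distances uniform on [0,1] gives every folded mean the value −1/6<0, so the analogue of (31) is strict for every leg. The fraction of replicates in which the sample Fréchet mean is exactly o should then approach 1 as n grows. The model and names are assumptions for this sketch.

```python
import random


def folded_means(points, k):
    """Folded mean for each leg j: +r on leg j, -r on every other leg."""
    n = len(points)
    return [sum(r if leg == j else -r for leg, r in points) / n for j in range(k)]


def fraction_at_cone_point(n, reps, k=3, seed=0):
    """Fraction of replicates whose sample Frechet mean is the cone point o.

    Hypothetical model: each draw picks a leg uniformly at random and a
    distance uniform on [0, 1], so every population folded mean equals
    E[r]/k - (k - 1)E[r]/k = -(k - 2)/(2k) < 0 for k >= 3.
    """
    rng = random.Random(seed)
    stuck = 0
    for _ in range(reps):
        pts = [(rng.randrange(k), rng.random()) for _ in range(n)]
        # The sample mean is o exactly when no folded sample mean is positive.
        if all(m <= 0 for m in folded_means(pts, k)):
            stuck += 1
    return stuck / reps


for n in (5, 50, 500):
    print(n, fraction_at_cone_point(n, reps=200, seed=2))
```

For small n the sample mean still leaves o fairly often; the corollary only constrains the behaviour for all sufficiently large n.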
Similar to the case where l−l′=1, Proposition 10 has the following consequence on the support of the limiting distribution when l−l′>1, where C(Θ) denotes the Euclidean cone on Θ.
Corollary 8.
Let the stratum σ=O(E) of co-dimension l(⩾2) bound, in Xm, the stratum τ=O(E∪F) of co-dimension l′⩽l−2. Assume that x∗∈σ is the Fréchet mean of μ. Then the support of the limiting distribution of an appropriately scaled difference ξ̂n−x∗, if it meets the stratum R(E)×O(F) in the tangent cone to Xm at x∗, must be contained in R(E)×C(Θτ,σ(x∗;μ)), where Θτ,σ(x∗;μ) is defined by (34).
Hence, the support of the limiting distribution of an appropriately scaled difference ξ̂n−x∗ is contained in Kμ where, for the closed sets
[TABLE]
in the tangent cone to Xm at x∗,
[TABLE]
and where we regard σ as co-bounding itself. Nevertheless, the following example shows that
(i)
if it is non-empty, R(E)×C(Θτ,σ(x∗;μ)) is not necessarily an entire stratum R(E)×O(F);
(ii)
even if it is the entire stratum, the support of the limiting distribution of √n(ξ̂n−x∗) does not necessarily intersect that stratum; and
(iii)
it is possible that the support of the limiting distribution, when restricted to the stratum, is only a subset of R(E)×C(Θτ,σ(x∗;μ)).
Example 4.
Consider the orthant space X2 of Example 2. Let μ assign mass 1/2 to each of two points p1 and p2 that are equidistant from the cone point o and lie on a geodesic through o, as illustrated in Figure 4.
Then its Fréchet mean is the cone point, and the sample Fréchet means always lie on this geodesic segment. In particular, this implies that the support of the limiting distribution of √n(ξ̂n−o) is the union of the cone point with two half lines, one in {o}×τ1,5A and the other in {o}×τ2,3A, each extending the relevant geodesic segment, where τi,jA=O(ui,uj).
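The first claim of Example 4 can be mimicked by a one-dimensional computation. Since all the mass lies on one geodesic, we assume for this sketch that the sample Fréchet mean is the ordinary mean of signed positions along it, with p1 at +d and p2 at −d (d is a made-up parameter). The scaled means √n·(signed position) then take both signs, corresponding to the two half lines in {o}×τ1,5A and {o}×τ2,3A, and their variance is d². The function name is illustrative.

```python
import random
import statistics


def scaled_positions(n, reps, d=1.0, seed=0):
    """sqrt(n) times the signed position of the sample Frechet mean along the geodesic.

    Hypothetical parametrisation: p1 sits at signed position +d and p2 at -d,
    each drawn with probability 1/2.  On a geodesic the sample Frechet mean is
    the ordinary mean of these signed positions.
    """
    rng = random.Random(seed)
    out = []
    for _ in range(reps):
        k = sum(1 for _ in range(n) if rng.random() < 0.5)  # number of draws equal to p1
        out.append(n ** 0.5 * d * (2 * k - n) / n)
    return out


pos = scaled_positions(n=200, reps=1000, seed=3)
print(sum(p > 0 for p in pos), sum(p < 0 for p in pos), statistics.variance(pos))
```

Positive values correspond to sample means on the p1 side of o and negative values to the p2 side; the sketch says nothing about parts (a) to (c), which depend on the geometry of X2.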
(a)
For any direction wτ1,2A∈Sτ1,2A∖{o}2, Ψτ1,2A(p,wτ1,2A;o) lies in the plane spanned by the orthant τ1,2A for any p and, identifying u3 and u5 with −u1 and −u2 respectively, Ψτ1,2A(pi,wτ1,2A;o)=pi for i=1,2. Thus, ∫X2Ψτ1,2A(p,wτ1,2A;o)dμ(p)=0 and so Θτ1,2A,{o}(o;μ)=Sτ1,2A∖{o}2. Since, in this case, the support of the limiting distribution does not intersect {0}×τ1,2A, this illustrates (ii) above with σ={o} and τ=τ1,2A.
(b)
For any direction wτ1,5A∈Sτ1,5A∖{o}2 such that the angle between wτ1,5A and the u1-axis is less than or equal to α, a similar argument shows that
[TABLE]
Hence, such wτ1,5A are always contained in Θτ1,5A,{o}(o;μ), i.e.
[TABLE]
where θ∈Sτ1,5A∖{o}2 is measured from the u1-axis.
(c)
However, for any direction wτ1,5A∈Sτ1,5A∖{o}2 such that the angle between wτ1,5A and the u1-axis is greater than α, the vector Ψτ1,5A(p1,wτ1,5A;o)=p1, but the vector Ψτ1,5A(p2,wτ1,5A;o) lies on the line spanned by the unit vector wτ1,5A in the (u1,u5)-plane. Hence, these two vectors do not lie on the same line through the origin in the (u1,u5)-plane. This gives
[TABLE]
Hence, if the angle between wτ1,5A and the u1-axis is greater than α, then wτ1,5A∉Θτ1,5A,{o}(o;μ). Combining this with the conclusion (b) shows that Θτ1,5A,{o}(o;μ)={θ∈Sτ1,5A∖{o}2∣θ⩽α}, illustrating (i) and (iii) above with σ={o} and τ=τ1,5A.
7.2 The limiting distribution
To describe the limiting distribution of √n(ξ̂n−x∗), where the Fréchet mean x∗ of μ lies in a stratum σ=O(E) of local co-dimension l⩾0, we continue to regard σ as co-bounding itself so that, in this case, the set F of additional axes in the ‘co-bounding’ stratum is empty. Moreover, we shall relate the form of the limiting distribution in the set (46) for each τ co-bounding σ to a limiting distribution of the Euclidean means of various Euclidean random variables depending on τ:
(i)
for τ=σ, corresponding to the set R(E)×{0} in (46), the relevant Euclidean random variable is Φσ(ξ1;x∗);
(ii)
for τ≠σ, the relevant Euclidean random variable is Ψτ(ξ1,wτ;x∗) where, if l−l′>1, wτ is any chosen vector in int(Θτ,σ(x∗;μ)) if this set is not empty and, if l−l′=1 with Θτ,σ(x∗;μ)≠∅, wτ is its unique element;
(iii)
we take the zero random variable otherwise.
Note that, by Proposition 9, different choices of wτ in the case l−l′>1 of (ii) give random variables that are a.s. equal. Note also that, by Corollary 8, the random variables in the case (iii) play no role in the description of the limiting distribution so that they can be replaced by any other random variables. For simplicity, we denote the relevant random variable above in each case by Ψ~τ(ξ1;x∗). With this context and notation write, for each τ co-bounding σ,
[TABLE]
where Mx∗σ(x) is defined by (22), Uτ is the M×(m−l′) matrix whose entries are all zero except for those at (li,i) being one, and ul1A,⋯,ulm−l′A are the ordered axes that span (R(E∪F)). Note that, since Mx∗σ(x) is negative semi-definite, the above inverse is well defined when E[Mx∗σ(ξ1)] exists. Then, letting Zτ be a random variable in R(E∪F) with normal distribution N(0,Aσ,τ⊤VτAσ,τ), where Vτ=cov(Ψ~τ(ξ1;x∗)), we have the following result.
Theorem 4.
Let σ=O(E) be a stratum in Xm of co-dimension l(⩾0). Assume that
(i) the Fréchet mean x∗ of μ lies in σ;
(ii) μ(Dx∗)=0, where Dx∗ is given by Definition 11;
(iii) E[Mx∗σ(ξ1)] exists, where Mx∗σ(x) is given by (22);
(iv) for any stratum τ in Xm which co-bounds σ and has co-dimension l′⩽l−2, if Θτ,σ(x∗;μ)≠∅ then int(Θτ,σ(x∗;μ))≠∅.
Then, if there exists a random variable η on the tangent cone at x∗ such that
[TABLE]
then η has the following property: for any stratum τ=O(E∪F) of co-dimension l′(⩽l) co-bounding σ, if P(η∈R(E)×O(F))>0 then, for Zτ defined as above and Kμ by (47),
[TABLE]
for any Borel set B contained in
[TABLE]
Proof.
We assume that l′⩽l−2. The case l′=l−1 can be derived similarly by noting Corollary 7, whereas for l′=l the result can be derived directly by simplifying the arguments below.
Write Ξτ=O(E)×C(int(Θτ,σ(x∗;μ))). By Corollary 7, given $\hat{\xi}_n\in\tau$, we have $\hat{\xi}_n\in\Xi_\tau$ for sufficiently large n and, by Theorem 3, we also have
[TABLE]
For any x′∈τ and x∈Xm, denote the projection Pτ∖σ(Φτ(x;x′)) of Φτ(x;x′) by Φτ∖σ(x;x′). Define Ψ~τ∖σ(x;x∗) similarly. Then, Ψ~τ∖σ(x;x∗)=Ψ~τ(x;x∗)−Φσ(x;x∗) by Corollary 4(ii). Since $\hat{\xi}_n$ is in Ξτ and converges to x∗ a.s., the result of Proposition 9 and the argument for the proof of Theorem 2 together imply that, for any given x and all sufficiently large n, $\Phi_{\tau\setminus\sigma}(x;\hat{\xi}_n)=\tilde{\Psi}_{\tau\setminus\sigma}(x;x^*)$ a.s. Hence, in particular, for sufficiently large n,
[TABLE]
is a.s. the Euclidean mean of Ψ~τ∖σ(ξ1;x∗),⋯,Ψ~τ∖σ(ξn;x∗), so that
[TABLE]
Thus, the limiting distribution of $\sqrt{n}(\hat{\xi}_n-x^*)\,1_{\Xi_\tau}(\hat{\xi}_n)$ is the same as that of
[TABLE]
Since Ψ~σ(ξi;x∗)=Φσ(ξi;x∗), Proposition 5 implies that the limiting distribution of $\sqrt{n}(\hat{\xi}_n-x^*)\,1_{\Xi_\tau}(\hat{\xi}_n)$ is equal to that of
[TABLE]
Hence, by (37), the required result follows from a similar argument to that used in [2] and [3].
∎
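The proof rests on the fact that, for large n, the relevant quantity is a.s. a Euclidean sample mean, so the classical multivariate central limit theorem applies: √n(mean − E) is asymptotically normal with covariance V. As a purely illustrative check of that Euclidean step (not of the stratified argument), the sketch below uses a hypothetical stand-in for Ψ~τ(ξ1;x∗) with known mean and covariance and verifies by simulation that the scaled errors have empirical covariance close to V.

```python
import numpy as np

def scaled_mean_errors(sampler, mean, n, reps, rng):
    """Return `reps` independent draws of sqrt(n) * (Euclidean sample mean - true mean)."""
    samples = sampler(rng, (reps, n))             # shape (reps, n, d)
    return np.sqrt(n) * (samples.mean(axis=1) - mean)

def sampler(rng, size):
    """Hypothetical stand-in for Psi~_tau(xi_1; x*): the vector (X0, X0 + X1), X0, X1 iid Exp(1)."""
    x = rng.exponential(scale=1.0, size=size + (2,))
    return np.stack([x[..., 0], x[..., 0] + x[..., 1]], axis=-1)

rng = np.random.default_rng(0)
mean = np.array([1.0, 2.0])                       # E[(X0, X0 + X1)]
V = np.array([[1.0, 1.0],
              [1.0, 2.0]])                        # cov((X0, X0 + X1))
errs = scaled_mean_errors(sampler, mean, n=200, reps=4000, rng=rng)
emp_cov = np.cov(errs, rowvar=False)              # should be close to V
```

The sampler is arbitrary; any distribution with finite second moments would do, which is exactly the generality the Euclidean central limit theorem provides.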
As with Θσ(x∗;μ) defined by (35), the convexity in w of the directional derivative D(dx2)(w) implies that Kμ is a convex subset of the tangent cone to Xm at x∗. This, together with the structure of an orthant space, implies that Theorem 4 describes the limiting distribution only within the interior of Kμ. Its behaviour at the boundaries will depend on how these sets relate to each other and on the shape of the boundary ∂Kμ.
The assumption in Theorem 4 that μ(Dx∗)=0 ensures that we can employ the so-called delta method for the approximate probability distribution of a function of an asymptotically normal statistical estimator. In principle, it is possible to relax this assumption by using directional derivatives combined with the law of total probability. However, it is clear from the definition of Dx∗ that its structure, although conceptually straightforward, is generally too complex to admit a simple algebraic representation, and the ensuing results would consequently depend heavily on the behaviour of μ on Dx∗.
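For reference, the delta method states that if √n(Tn−θ) is asymptotically N(0,Σ) and g is differentiable at θ, then √n(g(Tn)−g(θ)) is asymptotically N(0,∇g(θ)⊤Σ∇g(θ)). The sketch below checks this numerically in the simplest scalar setting; the exponential data and the choice g(m)=m² are illustrative assumptions, not taken from the paper, where the role of g is played by the map to the Fréchet mean.

```python
import numpy as np

def delta_method_variance(grad, cov):
    """First-order (delta-method) asymptotic covariance of g(estimator): grad^T cov grad."""
    grad = np.atleast_1d(grad)
    return float(grad @ np.atleast_2d(cov) @ grad)

rng = np.random.default_rng(1)
n, reps = 400, 4000
# Sample means of Exp(1) data: true mean 1 and variance 1.
means = rng.exponential(1.0, size=(reps, n)).mean(axis=1)
g = lambda m: m ** 2                              # smooth function of the estimator
scaled = np.sqrt(n) * (g(means) - g(1.0))
predicted = delta_method_variance(grad=2.0, cov=1.0)   # g'(1) = 2, so variance 4
```

The agreement between `scaled.var()` and `predicted` relies on g being differentiable at the true value; without such smoothness only directional (one-sided) expansions are available.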
To illustrate special cases of Theorem 4, let σ=O(E) be a stratum in Xm of co-dimension l(⩾0) containing the Fréchet mean x∗ of μ, assume that the conditions of Theorem 4 are satisfied and write
[TABLE]
where we set l(μ)=l if there is no τ with co-dimension l′<l satisfying the above condition. We assume further that, for τ=O(E∪F) of co-dimension l(μ) co-bounding σ with, if l(μ)<l, Θτ,σ(x∗;μ)≠∅, the covariance matrix Vτ=cov(Ψ~τ(ξ1;x∗)) is of full rank m−l(μ). Then, it is clear from the proof of Theorem 4 that P(η∈R(E)×O(F))>0.
Case l(μ)=l: in this case, Kμ=R(E) and the support of the distribution of η is contained in the tangent space of σ. Then, Theorem 4 says that η is a normal random variable with mean zero and covariance matrix Aσ,σ⊤cov(Φσ(ξ1;x∗))Aσ,σ, where Aσ,τ is defined by (48). This generalises the limiting distribution of $\sqrt{n}(\hat{\xi}_n-x^*)$, obtained in [3], when x∗ lies in a top-dimensional stratum of a tree space.
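The matrix Uτ introduced before Theorem 4 is a coordinate-selection matrix, and the limiting covariance has the sandwich form A⊤VA. The sketch below builds Uτ from its stated definition and forms the covariance for given inputs. Since (48) is not reproduced in this excerpt, the matrix `A` standing in for Aσ,σ and the matrix `V` standing in for cov(Φσ(ξ1;x∗)) are hypothetical placeholders; axis indices are 0-based in the code, whereas the text numbers axes from 1.

```python
import numpy as np

def selection_matrix(axis_indices, M):
    """U_tau: an M x k zero matrix except U[l_i, i] = 1, for ordered axes l_1 < ... < l_k.

    Axis indices are 0-based here, whereas the text numbers axes from 1.
    """
    ordered = sorted(axis_indices)
    U = np.zeros((M, len(ordered)))
    for i, li in enumerate(ordered):
        U[li, i] = 1.0
    return U

def limiting_covariance(A, V):
    """Covariance A^T V A of the mean-zero normal limit, given A_{sigma,tau} and V."""
    return A.T @ V @ A

# Hypothetical example: M = 4 axes in total; the stratum is spanned by axes 0 and 2.
U = selection_matrix([2, 0], M=4)
x = np.array([5.0, 6.0, 7.0, 8.0])
coords = U.T @ x                                  # coordinates on axes 0 and 2

A = np.array([[1.0, 0.5],
              [0.0, 2.0]])                        # placeholder for A_{sigma,sigma} from (48)
V = np.array([[1.0, 0.2],
              [0.2, 0.5]])                        # placeholder for cov(Phi_sigma(xi_1; x*))
Sigma = limiting_covariance(A, V)
rng = np.random.default_rng(3)
eta_draws = rng.multivariate_normal(np.zeros(2), Sigma, size=1000)
```

Because A⊤VA is symmetric and positive semi-definite whenever V is, it is always a valid covariance matrix for the normal limit.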
Case l(μ)=l−1, so that l⩾1: if τ=O(E∪F) is a stratum of co-dimension l′=l−1 such that Θτ,σ(x∗;μ)≠∅, then F contains only one axis. Taking the Borel set B=R(E)×O(F), we see that P(η∈R(E)×O(F))=1/2, since the corresponding Zτ is a normal random variable in $\mathbb{R}^{m-l+1}$ with mean zero, so that its component along the single axis in F is symmetric about zero. Hence, there are at most two strata of co-dimension l(μ) co-bounding σ on which infinitely many of the $\hat{\xi}_n$ lie. Moreover, if there is only one such stratum, P(η∈σ)=1/2 and, if there are two such strata, P(η∈σ)=0.
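The value 1/2 uses only the fact that a mean-zero normal vector has a symmetric component along the single axis in F, whatever its covariance. A quick Monte Carlo check, assuming a hypothetical full-rank covariance for Zτ in R³ (so m−l+1=3) and taking the F-axis to be the last coordinate:

```python
import numpy as np

rng = np.random.default_rng(2)
# Hypothetical positive-definite covariance for Z_tau; the last coordinate is the axis in F.
Sigma = np.array([[2.0, 0.3, 0.5],
                  [0.3, 1.0, -0.4],
                  [0.5, -0.4, 1.5]])
Z = rng.multivariate_normal(np.zeros(3), Sigma, size=100_000)
frac_positive = np.mean(Z[:, -1] > 0)             # estimates P(Z_tau in R(E) x O(F))
```

Changing the off-diagonal entries of `Sigma` moves the shape of the distribution but not this probability.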
Case 0⩽l(μ)<l, where there is a single τ0=O(E∪F0) of co-dimension l(μ) and $\Theta_{\tau_0,\sigma}(x^*;\mu)=S^{l-l(\mu)}_{\tau_0\setminus\sigma}$: in this case, we have the following full description of the distribution of η in terms of ϕτ0, the probability density function of the random variable Zτ0 defined prior to Theorem 4. We first note that, since Kμ defined by (47) is convex and closed, the result of Proposition 9 implies that, in this case,
[TABLE]
Then, we extend the projection map P to R(E∪F0) in an obvious fashion and, for any τ=O(E∪F), where F⊆F0, and any z∈R(E∪F0), write zτ=Pτ(z) and zτ0∖τ=Pτ0∖τ(z)=z−zτ.
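In coordinates along the axes of E∪F0, the projections above amount to keeping some coordinates and zeroing the rest. A minimal sketch, assuming that this coordinate projection is the "obvious" extension meant in the text; the axis labels and the vector z are hypothetical.

```python
import numpy as np

def project(z, axes, keep):
    """P_tau: keep the coordinates of z whose axis label is in `keep`, zero the rest."""
    mask = np.array([a in keep for a in axes])
    return np.where(mask, z, 0.0)

axes = ["e1", "e2", "f1", "f2", "f3"]             # hypothetical: E = {e1, e2}, F0 = {f1, f2, f3}
E = {"e1", "e2"}
F = {"f1", "f3"}                                  # a subset of F0
z = np.array([0.4, -1.2, 0.7, -0.3, 0.2])         # a point of R(E u F0)
z_tau = project(z, axes, E | F)                   # z_tau = P_tau(z)
z_rest = z - z_tau                                # z_{tau0 minus tau} = z - z_tau
```

By construction z_τ and z_{τ0∖τ} have disjoint supports and sum to z.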
Proposition 11.
Under the above assumptions and notation, the limiting distribution of $\sqrt{n}(\hat{\xi}_n-x^*)$ is given as follows: for any τ=O(E∪F), where F⊆F0, and any Borel subset B⊆R(E)×O(F),
[TABLE]
where
[TABLE]
The special case l(μ)=l−1 of this proposition, together with the comments in the previous two paragraphs, generalises the limiting distribution of $\sqrt{n}(\hat{\xi}_n-x^*)$, obtained in [3], when Xm is a tree space and x∗ lies in a stratum of co-dimension one.
Proof.
By Theorem 4, we only need to consider the case where F≠F0. Assume that τ=O(E∪F) has co-dimension l′ and fix $w_\tau\in S^{l-l'}_{\tau\setminus\sigma}$. We first show that
[TABLE]
Recall, from the proof of Theorem 2, that the geodesics from x∗(λ,wτ0)=x∗+λwτ0 to x have the same support for all $w_{\tau_0}\in S^{l-l(\mu)}_{\tau_0\setminus\sigma}$ sufficiently close to wτ and all sufficiently small λ>0. For such wτ0 and λ, by Definition 13, Corollary 4(i) and Propositions 8 and 9, the sequence A=(A0,⋯,Ak) in the support (A,B) of the geodesics from x∗(λ,wτ0) to ξ1 a.s. has the following property: if i>0 and Ai∩E=∅, then Ai consists of a single axis in F0, so that PAi(x∗(λ,wτ0))/∥PAi(x∗(λ,wτ0))∥ is independent of the value of λ. This, together with the fact, implied by Corollary 4(ii), that PAi(x∗(λ,wτ0))→PAi(x∗) as λ→0 when Ai∩E≠∅, shows that, with probability one, each PAi(x∗(λ,wτ0))/∥PAi(x∗(λ,wτ0))∥ in the expression (11) for Φτ0(ξ1;x∗+λwτ0) is a continuous function at x∗ in the corresponding Euclidean space. It follows that
[TABLE]
exists a.s. and so, in particular,
[TABLE]
Thus, the definition of Ψ gives
[TABLE]
Since the limit on the right-hand side exists, to find it we may take a particular path along which wτ0 approaches wτ: wτ0=sinα w⊥+cosα wτ, where ⟨w⊥,wτ⟩=0 and ∥w⊥∥=1. Then, writing β=λsinα, we have
[TABLE]
where the second equality follows from Corollary 3(ii). Hence, it follows from Corollary 4(ii) that
Since $\hat{\xi}_n$ lies in Kμ for all sufficiently large n a.s., by Corollary 7, we may assume without loss of generality that this holds for all n. Let $\hat{\xi}^{\tau}_n$ denote the sample Euclidean mean of Ψ~τ(ξ1;x∗),⋯,Ψ~τ(ξn;x∗). Then $\hat{\xi}^{\tau}_n\in\mathbb{R}(E\cup F)$ and, by Corollary 6, $\hat{\xi}^{\tau}_n\to x^*$ a.s. Also, application of (49) gives $P_{\tau\setminus\sigma}(\hat{\xi}^{\tau}_n)=P_{\tau\setminus\sigma}(\hat{\xi}^{\tau_0}_n)$. On the other hand, the argument in the proof of Theorem 4 implies that, for all sufficiently large n,
[TABLE]
so that, for all sufficiently large n,
[TABLE]
However, given that $\hat{\xi}_n$ is in Kμ, since $P_{\sigma}(\hat{\xi}_n)=P_{\sigma}(\hat{\xi}^{\tau_0}_n)$ by Corollary 4(ii) and Corollary 6, the fact that $\hat{\xi}_n\in\tau$ is equivalent to the fact that $P_{\tau\setminus\sigma}(\hat{\xi}^{\tau_0}_n)$ lies in O(F) and $-P_{\tau_0\setminus\tau}(\hat{\xi}^{\tau_0}_n)\in\mathcal{O}(F_0\setminus F)$. Hence, we can re-express the above equality as
[TABLE]
The required result then follows by a slight modification to the proof of Theorem 4.
∎
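Two elementary ingredients of the proof above can be checked numerically: the path wτ0=sinα w⊥+cosα wτ stays on the unit sphere and tends to wτ as α→0, and, for a single axis of F0 on which x∗ vanishes, the normalised projection of x∗+λwτ0 does not depend on λ, whereas for an Ai meeting E it converges to the normalised projection of x∗. A sketch in hypothetical coordinates (two axes of E followed by three axes of F0); the specific vectors are illustrative only.

```python
import numpy as np

def unit_orthogonal(w, v):
    """Gram-Schmidt step: unit vector in span{w, v} orthogonal to w (v not parallel to w)."""
    u = v - (v @ w) / (w @ w) * w
    return u / np.linalg.norm(u)

def approach_path(w_tau, w_perp, alpha):
    """w_tau0 = sin(alpha) * w_perp + cos(alpha) * w_tau; equals w_tau at alpha = 0."""
    return np.sin(alpha) * w_perp + np.cos(alpha) * w_tau

def normalized_projection(x, idx):
    """P_{A_i}(x) / ||P_{A_i}(x)||, with A_i given by the coordinate indices idx."""
    p = x[idx]
    return p / np.linalg.norm(p)

w_tau = np.array([0.0, 0.6, 0.8])                 # unit vector on the F0 coordinates
w_perp = unit_orthogonal(w_tau, np.array([1.0, 0.0, 0.0]))
alpha = 0.1
w_tau0 = approach_path(w_tau, w_perp, alpha)      # remains a unit vector

x_star = np.array([1.5, 0.7, 0.0, 0.0, 0.0])      # x* in O(E): zero on the F0 axes
direction = np.concatenate([np.zeros(2), w_tau0])
# A_i a single F0 axis: the same normalised projection for every lambda > 0.
single_axis = [normalized_projection(x_star + lam * direction, [3])
               for lam in (1e-1, 1e-3, 1e-6)]
# A_i meeting E: close to the normalised projection of x* itself for small lambda.
mixed = normalized_projection(x_star + 1e-8 * direction, [0, 3])
```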
In fact, the argument for the proof of Proposition 11, in particular (49), also shows that, if τ=O(E∪F) has co-dimension greater than l(μ) and if R×Θτ,σ(x∗;μ) is contained in the interior of Kμ, then P(η∈R×O(F))=0.
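In the proof of Proposition 11, once $\hat{\xi}_n$ lies in Kμ, membership of $\hat{\xi}_n$ in τ=O(E∪F) reduces to a sign pattern: the coordinates of $\hat{\xi}^{\tau_0}_n$ on F are positive and those on F0∖F are negative. The sketch below encodes only that stated equivalence, with hypothetical axis labels; it does not compute $\hat{\xi}^{\tau_0}_n$ itself, and it rejects zero coordinates, which lie on stratum boundaries.

```python
import numpy as np

def stratum_axes(z_F0, F0_labels):
    """Return F such that the F-coordinates of z are positive and those on F0 minus F negative.

    Coordinates of z on F0 must be non-zero; a zero coordinate lies on a stratum boundary.
    """
    z_F0 = np.asarray(z_F0, dtype=float)
    if np.any(z_F0 == 0.0):
        raise ValueError("a zero coordinate on F0 lies on a stratum boundary")
    return {label for label, c in zip(F0_labels, z_F0) if c > 0}

# Hypothetical F0 = {f1, f2, f3}; the returned set F identifies tau = O(E u F).
F = stratum_axes([0.3, -0.2, 1.1], ["f1", "f2", "f3"])
```

An empty result corresponds to F=∅, that is, to $\hat{\xi}_n$ lying in σ itself.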
Acknowledgements. The authors are indebted to Megan Owen for her continuing helpful discussions, following her collaboration in [2] and [3]. We are indebted to the referees for helpful suggestions to improve the description and presentation of our results. The second author acknowledges funding from the Engineering and Physical Sciences Research Council.
Bibliography
[1] M. Bačák (2014). Computing medians and means in Hadamard spaces, SIAM J. Optimiz. 24, 1542-1566.
[2] D. Barden, H. Le and M. Owen (2013). Central limit theorems for Fréchet means in the space of phylogenetic trees, Electron. J. Probab. 18, no. 25.
[3] D. Barden, H. Le and M. Owen (2016). Limiting behaviour of Fréchet means in the space of phylogenetic trees. To appear in Annals of the Institute of Statistical Mathematics.
[4] R. Bhattacharya and V. Patrangenaru (2005). Large sample theory of intrinsic and extrinsic sample means on manifolds-II, Ann. Statist. 33, 1225-1259.
[5] L.J. Billera, S.P. Holmes and K. Vogtmann (2001). Geometry of the space of phylogenetic trees, Advances in Applied Mathematics 27, 733-767.
[6] M.R. Bridson and A. Haefliger (1999). Metric Spaces of Non-positive Curvature. Springer-Verlag, Berlin/New York.
[7] D. Burago, Y. Burago and S. Ivanov (2001). A Course in Metric Geometry. American Mathematical Society.
[8] M. Goresky and R. MacPherson (1988). Stratified Morse Theory. Springer-Verlag, Berlin/New York.