Intrinsic minimum average variance estimation for sufficient dimension reduction with symmetric positive definite matrices and beyond
B. Chen, S. Dai, Z. Yu

TL;DR
This paper introduces intrinsic methods for sufficient dimension reduction with symmetric positive definite matrices, leveraging Riemannian geometry to improve estimation accuracy and extend applicability.
Contribution
It develops the intrinsic minimum average variance estimation and outer product gradient methods using Riemannian metrics, extending to general manifolds and providing rigorous theoretical guarantees.
Findings
Methods outperform existing techniques in simulations.
Algorithms effectively estimate structural dimension with theoretical support.
Application to taxi network data demonstrates practical utility.
Abstract
In this paper, we target the problem of sufficient dimension reduction with symmetric positive definite matrices valued responses. We propose the intrinsic minimum average variance estimation method and the intrinsic outer product gradient method which fully exploit the geometric structure of the Riemannian manifold where responses lie. We present the algorithms for our newly developed methods under the log-Euclidean metric and the log-Cholesky metric. Each of the two metrics is linked to an abelian Lie group structure that transforms our model defined on a manifold into a Euclidean one. The proposed methods are then further extended to general Riemannian manifolds. We establish rigourous asymptotic results for the proposed estimators, including the rate of convergence and the asymptotic normality. We also develop a cross validation algorithm for the estimation of the structural…
| Model | WIRE | eu-iOPG | eu-iMAVE | ch-iOPG | ch-iMAVE | fOPG | fMAVE | |
|---|---|---|---|---|---|---|---|---|
| I-1 | (10,100) | 0.0869 | 0.0693 | 0.0693 | 0.0616 | 0.0612 | 0.0891 | 0.3913 |
| 0.0229 | 0.0170 | 0.0169 | 0.0185 | 0.0184 | 0.0255 | 0.2300 | ||
| (10,200) | 0.0617 | 0.0489 | 0.0488 | 0.0421 | 0.0422 | 0.0592 | 0.3803 | |
| 0.0163 | 0.0103 | 0.0100 | 0.0091 | 0.0092 | 0.0144 | 0.2093 | ||
| (20,100) | 0.1406 | 0.1118 | 0.1112 | 0.0973 | 0.0965 | 0.1443 | 0.3519 | |
| 0.0242 | 0.0193 | 0.0194 | 0.0184 | 0.0183 | 0.0322 | 0.1769 | ||
| (20,200) | 0.0953 | 0.0735 | 0.0735 | 0.0656 | 0.0654 | 0.0934 | 0.2748 | |
| 0.0167 | 0.0146 | 0.0146 | 0.0107 | 0.0107 | 0.0192 | 0.1483 | ||
| I-2 | (10,100) | 0.0577 | 0.0635 | 0.0605 | 0.0530 | 0.0532 | 0.2802 | 2.8665 |
| 0.0317 | 0.0281 | 0.0321 | 0.0226 | 0.0249 | 0.4609 | 0.2536 | ||
| (10,200) | 0.0277 | 0.0283 | 0.0277 | 0.0246 | 0.0248 | 0.0504 | 2.9728 | |
| 0.0229 | 0.0227 | 0.0222 | 0.0223 | 0.0215 | 0.0302 | 0.0854 | ||
| (20,100) | 0.1314 | 0.1656 | 0.1376 | 0.1192 | 0.1155 | 1.3775 | 3.0551 | |
| 0.0344 | 0.0660 | 0.0464 | 0.0324 | 0.0327 | 0.5156 | 0.3843 | ||
| (20,200) | 0.0578 | 0.0582 | 0.0560 | 0.0525 | 0.0517 | 0.2223 | 2.9806 | |
| 0.0128 | 0.183 | 0.0144 | 0.0148 | 0.0136 | 0.1651 | 0.1680 |
| Model | WIRE | eu-iOPG | eu-iMAVE | ch-iOPG | ch-iMAVE | fOPG | fMAVE | |
|---|---|---|---|---|---|---|---|---|
| II-1 | (5,100) | 1.2928 | 0.0872 | 0.0818 | 0.0871 | 0.0832 | 1.2666 | 1.2456 |
| 0.1478 | 0.2090 | 0.2084 | 0.2108 | 0.2164 | 0.1962 | 0.2002 | ||
| (5,200) | 1.2308 | 0.0280 | 0.0254 | 0.0291 | 0.0260 | 1.2124 | 1.2240 | |
| 0.1965 | 0.0112 | 0.0099 | 0.0114 | 0.0099 | 0.2326 | 0.2237 | ||
| (10,100) | 1.3491 | 0.7189 | 0.6827 | 0.6925 | 0.6789 | 1.3413 | 1.3400 | |
| 0.0728 | 0.6296 | 0.6479 | 0.6276 | 0.6438 | 0.0790 | 0.0791 | ||
| (10,200) | 1.3367 | 0.1641 | 0.1490 | 0.1500 | 0.1461 | 1.3320 | 1.3385 | |
| 0.1073 | 0.3779 | 0.3709 | 0.3556 | 0.3610 | 0.1062 | 0.0992 | ||
| II-2 | (5,100) | 1.2118 | 0.0604 | 0.0554 | 0.0648 | 0.0594 | 1.2912 | 1.5003 |
| 0.2560 | 0.0195 | 0.0169 | 0.0195 | 0.0176 | 0.2745 | 0.1855 | ||
| (5,200) | 1.1923 | 0.0338 | 0.0331 | 0.0360 | 0.0352 | 1.2266 | 1.4979 | |
| 0.2578 | 0.0093 | 0.0092 | 0.0099 | 0.0099 | 0.2354 | 0.1562 | ||
| (10,100) | 1.3954 | 0.3847 | 0.3651 | 0.3426 | 0.3246 | 1.6178 | 1.7211 | |
| 0.1070 | 0.5302 | 0.5309 | 0.4995 | 0.5047 | 0.1532 | 0.1396 | ||
| (10,200) | 1.3714 | 0.0637 | 0.0566 | 0.0675 | 0.0603 | 1.4808 | 1.7123 | |
| 0.1005 | 0.0121 | 0.0104 | 0.0126 | 0.0108 | 0.1526 | 0.1334 |
| Model | WIRE | iOPG | iMAVE | fOPG | fMAVE | |
|---|---|---|---|---|---|---|
| III | (10,100) | 0.3461 | 0.2555 | 0.2226 | 0.6743 | 1.5332 |
| 0.0803 | 0.0770 | 0.0643 | 0.2456 | 0.1610 | ||
| (10,200) | 0.2270 | 0.1545 | 0.1475 | 0.4065 | 1.5104 | |
| 0.0505 | 0.0372 | 0.0358 | 0.1644 | 0.0372 | ||
| (20,100) | 0.5395 | 0.4766 | 0.3534 | 1.1215 | 1.6307 | |
| 0.1012 | 0.0967 | 0.0699 | 0.2209 | 0.1392 | ||
| (20,200) | 0.3474 | 0.2481 | 0.2172 | 0.6990 | 1.6083 | |
| 0.0567 | 0.0409 | 0.0401 | 0.1657 | 0.1510 |
| Direction | Ave.Distance | Ave.Fare | Ave.Passengers | Ave.Tip | Cash | Credit | Dispute |
|---|---|---|---|---|---|---|---|
| 0.2417 | -0.4827 | 0.0927 | -0.0313 | -0.5720 | 0.5863 | 0.0074 | |
| 0.6592 | -0.4002 | 0.2242 | 0.0878 | 0.5755 | -0.0817 | -0.0101 | |
| -0.3931 | -0.6700 | -0.4348 | 0.0017 | 0.1952 | -0.0343 | -0.0577 | |
| Free | LateHour | Ave.Temp | Ave.Humid | Ave.Wind | Ave.Press | Precip | |
| 0.1277 | 0.0988 | -0.0339 | -0.0053 | -0.0025 | -0.2579 | -0.0033 | |
| -0.0174 | 0.0579 | -0.0511 | -0.0075 | -0.0062 | -0.0540 | 0.0064 | |
| -0.1134 | -0.3692 | 0.0789 | -0.0030 | -0.0163 | 0.0833 | 0.0470 |
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsBone and Joint Diseases · Statistical Methods and Inference · Point processes and geometric inequalities
Intrinsic Minimum Average Variance Estimation for Sufficient Dimension Reduction with Symmetric Positive
Definite Matrices and Beyond
Baiyu Chen
School of Statistics, East China Normal University
and
Shuang Dai
School of Statistics, East China Normal University
and
Zhou Yu
School of Statistics, East China Normal University
Abstract
In this paper, we target the problem of sufficient dimension reduction with symmetric positive definite matrices valued responses. We propose the intrinsic minimum average variance estimation method and the intrinsic outer product gradient method which fully exploit the geometric structure of the Riemannian manifold where responses lie. We present the algorithms for our newly developed methods under the log-Euclidean metric and the log-Cholesky metric. Each of the two metrics is linked to an abelian Lie group structure that transforms our model defined on a manifold into a Euclidean one. The proposed methods are then further extended to general Riemannian manifolds. We establish rigourous asymptotic results for the proposed estimators, including the rate of convergence and the asymptotic normality. We also develop a cross validation algorithm for the estimation of the structural dimension with theoretical guarantee Comprehensive simulation studies and an application to the New York taxi network data are performed to show the superiority of the proposed methods.
Keywords: Sufficient dimension reduction; Sliced inverse regression; Minimum average variance estimation; Outer product gradient; Symmetric positive definite matrix.
1 Introduction
Undergoing accelerated developments for more than 20 years, sufficient dimension reduction (SDR) has now become a powerful tool in statistics, partly thanks to the increasing demand for techniques to deal with high-dimensional circumstances. Multiple classes of SDR methods have evolved themselves to be well-established for high-dimensional data analysis with Euclidean responses and predictors. Typical SDR tools include the inverse regression estimation methods (e.g., Sliced Inverse Regression (Li, 1991), Sliced Average Variance Estimation (Cook and Weisberg, 1991) and directional regression (Li and Wang, 2007)), the nonparametric method like the outer product of gradients (OPG) and the minimum average variance estimation (MAVE) method (Xia et al., 2002), and the semiparametric approach (Ma and Zhu, 2012, 2013, 2019).
However, the prosperity of big data is accomplished by the abundance of non-Euclidean objects where traditional dimension reduction methods fail. For example, in an Alzheimer’s Disease Neuroimaging Initiative (ANDI) study (Lin et al., 2022), subjects were invited to a medical center to get their brain images and assessment of their behavior abilities. Then a preprocessing protocol is applied to turn brain images into the average hippocampal diffusion tensors which are symmetric positive definite (SPD) matrices characterizing diffusion of water molecules in tissues and conveying rich information about brain tissues. Finally researchers are faced with a data set where the response is a SPD matrix and are predictors standardized to the interval representing the scores of each subject’s memory, executive functioning, language ability and so on. Another example arises from the taxi services within a city. Researchers divide the city into several zones and take these zones as nodes in a network or graph. This graph is further weighted by the number of taxi pick-ups and drop-offs between zones in a time interval. Proper transformations can turn these graphs into SPD matrices describing the taxi movements in a city. After collecting potential predictors such as travel distance, fare amount, average daily temperature and total precipitation, one can analyze the relationship between the taxi movements and possible factors.
In above examples, responses are non-Euclidean and lie in which stands for a manifold consisting of SPD matrices. When the dimension of prediction variable is large, sufficient dimension reduction is necessary to avoid the curse of dimensionality but unfortunately, traditional Euclidean methods cannot work for responses being SPD matrices. As a consequence, there has been a growing need to carry out SDR with SPD matrices as responses.
Up to now there have been many works where traditional statistical methods in Euclidean spaces are generalized to manifolds or more general metric spaces such as local polynomial regression for SPD matrices (Yuan et al., 2012; Zhu et al., 2009; Cornea et al., 2016), Fréchet regression for random objects (Peterson and Müller, 2019a), intrinsic Riemannian functional principal component analysis and functional linear regression (Lin and Yao, 2019), additive model for SPD matrices (Lin et al., 2022), Fréchet sufficient dimension reduction for random objects (Ying and Yu, 2022; Zhang et al., 2021), intrinsic Wasserstein correlation analysis (Zhou et al., 2021), single index Fréchet regression (Bhattacharjee and Müller, 2021), autoregressive optimal transport model (Zhu and Müller, 2021) and so on.
Among these works, two recent papers are related to non-Euclidean SDR. Ying and Yu (2022) extended the traditional SIR model to the case where the predictors are Euclidean while the response takes values in a metric space. They borrowed strength from the martingale difference divergence to avoid the estimation of and to absorb information in by including the distance function in the metric space. The work of Zhang et al. (2021) turned almost all existing Euclidean SDR methods into ones for Euclidean and metric space-valued , which is very comprehensive and flexible.
In their proposal, the random object is first mapped into a real-valued random variable and then classic SDR methods can be applied to the transformed data. However, when the response lie in a manifold, even though the two methods aforementioned can be performed, they fail to fully exploit the intrinsic geometry of the manifold and thus some information contained in the response is inevitably lost.
In this paper, we consider the dimension reduction of the conditional mean (Cook and Li, 2002) with SPD matrices. The basic problem is to find a lower dimensional predictor such that
[TABLE]
where and is a matrix. To fully incorporate the information in the -valued response, we generalize the state-of-the-art sufficient mean dimension reduction method MAVE and OPG for the estimation of the column space spanned by . The basic idea of our method also stems from the local polynomial regression (ILPR) for SPD matrices introduced by Yuan et al. (2012), which replaced the square distance by the geodesic distance on and performed Taylor expansion after parallel transport to estimate an intrinsic conditional expectation of an SPD matrix response, given a covariate vector . Yuan et al. (2012) only considered the case where is a scalar. We in this paper take a step forward to handle the high-dimensional . We call our method intrinsic MAVE and intrinsic OPG since cannot be isometrically embedded into a Euclidean space and we deal with it in a totally intrinsic way.
The rest of this paper is organized as follows. Some preliminaries on manifolds are introduced in Section 2. Then we introduce our intrinsic dimension reduction proposals and algorithms with SPD matrices in Section 3 and Section 4. Our proposed methods for SPD matrices are extended to general manifolds in Section 5. Asymptotic results, including the rate of convergence and asymptotic normality are established in Section 6. A cross validation procedure to determine the structural dimension is presented in Section 7. Simulation studies are illustrated in Section 8 and a real data application is carried out in Section 9. Section 10 concludes this paper. Additional simulation results and proofs for theorems can be found in the supplementary material.
2 Preliminaries on Manifolds
We first introduce some basic notions for Riemannian manifolds and Lie groups (Tu, 2011; Lang, 1999). Let be a simply connected and smooth manifold and . For a small scalar , let be a continuously differential map from to passing through . A tangent vector at is the derivative of the curve at . All such tangent vectors at form a vector space named the tangent space at , which is denoted by . Each tangent space can be endowed with an inner product that varies smoothly with . The inner products are collectively denoted by , which is referred to as the Riemannian metric of . With a Riemannian metric, we can define a distance on that turns into a metric space. The length of a continuously differentiable curve is calculated as , where is the derivative of . And is the infimum of the length over all continuously differentiable curves joining and .
A geodesic is a curve defined on such that for each , is the shortest path connecting and for sufficiently small . The Riemannian exponential map at is a function mapping into and defined by with and . The inverse of , if exists, denoted by and called the Riemannian logarithm map at , can be defined as for such that .
A vector field is a function defined on such that . Given a curve on , for a real interval , a vector field along is a smooth map defined on such that . We say is parallel along if for all where is the Levi-Civita connection on . In this paper we only focus on parallel vector fields along geodesics. Let be a geodesic connecting and , and is a parallel vector field along such that and . Then the parallel transport of along is denoted as .
When is a group and the group operation and its inverse are both smooth, is called a Lie group. The tangent space at the identity element is called a Lie algebra denoted by . It consists of left-invariant vector fields which satisfies , where is the left translation at and is the differential of . A Riemannian metric is called left-invariant if for all and . Right invariance can be defined similarly. A metric is bi-invariant if it is both left-invariant and right-invariant. The Lie exponential map, denoted by is defined by where is the unique one-parameter subgroup such that . Its inverse, if exists, is denoted by . Please make a distinction between the Riemannian exponential map “”, the Lie exponential map “” and the common matrix exponential operation “” which appear frequently in later sections. When is bi-invariant, then coincides with .
3 Intrinsic MAVE and OPG for SPD Matrices
The classic MAVE in a Euclidean space adopts the following regression-type model for conditional mean dimension reduction:
[TABLE]
where and are respectively -valued and -valued random variables, is an unknown smooth link function, is a orthogonal matrix () with and almost surely. MAVE aims to estimate as captures all information about provided by .
MAVE targets by solving
[TABLE]
which is equivalent to
[TABLE]
Suppose is a sample from . Following the idea of local linear regression, the above formula can be approximated by
[TABLE]
which can be further approximated by
[TABLE]
where and for , with being the kernel function and being the bandwidth. Optimizing (4) gives the estimation of .
When it comes to the manifold case where but , model (2) should be modified. In this case, is a link function and . In order to ensure that , we assume a group structure on with the group operator , and replace by . To make our model more flexible, we further assume that is a commutative group (abelian group).
Let be an abelian group endowed with a Riemannian metric . Let be the link function and be the random noise whose Fréchet mean corresponds to the group identity element. Then conditional mean sufficient dimension reduction with but can be formulated as
[TABLE]
We first figure out the definition of conditional expectation when is an SPD matrix. According to Yuan et al. (2012), the intrinsic conditional expectation of at is defined as a SPD matrix such that
[TABLE]
where is an matrix with all elements 0 and the expectation is taken in a component-wise way. From now on we use instead of .
Starting from (3), we replace the square distance by the squared geodesic distance on the manifold and rewrite (3) as
[TABLE]
Next we want to similarly expand at . However, is in the curved space and Taylor expansion is infeasible. Instead we apply the Riemannian logarithm map to transform to . Since for different are in different tangent spaces, these tangent vectors are transported from to a same tangent space using parallel transport given by:
[TABLE]
Thus is a function in a vector space and can be expanded at using Taylor series expansion. Considering is an symmetric matrix and is a vector, we differentiate each component of with respect to and this leads to
[TABLE]
which gives
[TABLE]
where only up to first order approximation is considered, is the inverse map of and is the Kronecker product.
In (7), both and are parameters to estimate: serving as the 0-order approximation in Taylor expansion, being the derivative matrix in the first order term and possessing the structure
[TABLE]
where . The in parentheses indicate that is related to . We use to denote for simplicity here and hereafter.
Now we introduce three operators in matrix algebra. “” is the common matrix vec operator that vectorize an matrix by column into an vector. For an symmetric matrix , define . That is, “” vectorize the lower triangle part of a symmetric matrix by row. For in (8), define . We will frequently use , and in subsequent sections.
Finally combining (7) with (6), we arrive at what we call the intrinsic MAVE method (iMAVE):
[TABLE]
where is and is .
The only difference between the classic OPG and the classic MAVE is the absence of in the former. So the intrinsic OPG method (iOPG) can be formulated immediately as
[TABLE]
where the size of here is .
Only the Riemannian metric needs specifying to solve (9) and (10). Actually we do not require (5) to be true since the procedure of deriving iMAVE and iOPG has nothing to do with the group structure on . We only assume (5) when performing the theoretical analysis. Thus (9) and (10) are flexible SDR methods but the choice of the metric affects the complexity of optimization.
4 Algorithms under the Log-Euclidean Metric
The log-Euclidean metric is proposed by Arsigny et al. (2007). The key observation is that: is diffeomorphic to its tangent space at the identity matrix, . To be specific, and its inverse are both smooth and they are diffeomorphisms.
Let . Define an operation by
[TABLE]
Then is an abelian Lie group. The identity element is the identity matrix. Moreover, the Lie group exponential map is just the matrix exponential . That is, the matrix logarithm maps every SPD matrix in to the tangent space . Based on this fact, we may get the expression of iMAVE under the log-Euclidean metric in a simpler way.
We start from (6). Under the log-Euclidean metric, the geodesic distance . Here is the Frobenius norm. So
[TABLE]
Since and are both in , no parallel transportation is needed. Directly expand at , we get iMAVE under the log-Euclidean metric:
[TABLE]
and similarly iOPG under the log-Euclidean metric:
[TABLE]
Models (12) and (13) are optimized similarly to Xia et al. (2002) or Xia (2007). The main difference here is differentiating a matrix-valued function w.r.t. a vector. We in the following sketch out the algorithms. First some notations are introduced.
Write and let
[TABLE]
where are from (8).
Now we are ready for the algorithms.
Similarly the algorithm of iOPG is shown in Algorithm 2. Usually the result of OPG can be used as the initial value of in MAVE.
When is endowed the log-Cholesky metric, methods can be derived similarly as the key point is under the log-Cholesky metric, the geodesic distance between is . Here are Cholesky factors of (Lin, 2019) and where is the strict lower triangle part of and the diagonal part of . For any and its Cholesky factor , lies in a fixed vector space. Substituting in the log-Euclidean case for and keeping other things unchanged, we get iMAVE, iOPG under the log-Cholesky metric and details are omitted.
5 Extension to General Riemannian Manifolds
According to the lemma S1 in Lin (2022), if (,) is an abelian Lie group endowed with a bi-invariant metric that turns into a Hadamard manifold, for any , , . Here is the identity element of the group. Endowing with the log-Euclidean metric or the log-Cholesky metric can meet the conditions. So applying above equations to which is equivalent to our model (5) with denoting the Fréchet mean of , we have . This model can be rewritten as
[TABLE]
where and . The model (14) is completely a Euclidean one since is a vector-valued function defined in , which brings convenience for the theoretical analysis of iMAVE and iOPG. However if the chosen metric cannot turn into an abelian group with a bi-invariant metric, neither (5) nor (14) holds. In this case, we can directly assume model (14) for other metrics and furthermore general Riemannian manifolds.
Let and where is a general Riemannian manifold. We assume the relationship between and can be described by (14). We still aim at estimating and the estimating procedure is just the MAVE and OPG with multivariate response developed by Zhang (2021), which can also be derived by slightly modifying our proposed algorithms in Section 4.
For general Riemannian manifolds whose sectional curvature is positive, the Fréchet mean may not exist and therefore additional conditions are needed for (14). We assume
- (A1)
The minimizer of the Fréchet function exists and is unique.
This is automatically satisfied when is equipped with either the log-Euclidean metric or the log-Cholesky metric.
For a subset of , denotes the set where is the ball with center and radius in . We use to denote the set . In order to define at least with a dominant probability for a large sample, we assume
- (A2)
There is some constant such that =1.
The condition (A2) is only needed when is not a Hadamard manifold. If (A1) and (A2) are satisfied, (14) is well defined.
6 Asymptotic Results
We first establish the consistency and asymptotic normality of the iMAVE and iOPG estimators under the general manifolds case in model (14) and the results of endowed with either the log-Euclidean metric or the log-Cholesky metric is given as corollaries. We consider a manifold that satisfied one of the following conditions:
- (M1)
is a finite-dimensional Hadamard manifold having sectional curvature bounded from below by .
- (M2)
is a complete compact Riemannian manifold.
An example satisfying (M1) is endowed with the log-Euclidean metric, the log-Cholesky metric or the affine-invariant metric while the unit sphere serves as an example satisfying (M2).
We have to treat during our proof where is short for . The method in Lin and Yao (2019) is applied here to write as and the asymptotic normality of helps us control the discrepancy between and . Above and , acting on vector fields by . Here is a vector field with and “” denotes the Hessian matrix (Kendall and Le, 2011). To make above reasoning valid, following conditions are needed.
- (A3)
satisfies at least one of the conditions (M1) and (M2).
- (A4)
For all , .
- (A5)
For some constant , when is sufficiently small.
- (A6)
where is the smallest eigenvalue of an operator or a matrix.
Conditions (A3)-(A6) are standard assumptions also made by Lin (2022), Kedall and Le (2011) and Lin and Yao (2019). (A4) is analogous to the moment condition in the Euclidean case. (A5) is satisfied for Hadamard manifolds with according to the lemma S.7 of Lin and Müller (2021). (A6) is made to ensure is invertible.
We need additional conditions that are standard in the literature on MAVE and OPG methods such as Xia et al. (2002) and Xia (2007).
Some notations are listed here. Suppose the dimension of is and thus terms in (14) are -dimensional vectors. Let () denote the th component of and are defined similarly. Let , , , and , which will be frequently encountered in proofs. For any square matrix , and denote the inverse (if it exists) and the Moore-Penrose inverse matrix.
- (B1)
For , has bounded, continuous third derivatives and .
- (B2)
The density function of has bounded second order derivatives on and is bounded away from 0 in a neighborhood around 0; for some ; the functions and have bounded derivatives with respect to and for for some .
- (B3)
For every component () in , the density function has bounded derivative and is bounded away from 0 on a compact support; the conditional density functions and have bounded fourth order derivatives w.r.t. and for in a neighborhood of .
- (B4)
The matrix has full rank , where is the derivative matrix of .
- (B5)
is a symmetric univariate density function with bounded second order derivatives. All the moments of exist.
- (B6)
Bandwidths where , . For , where with , and are constants.
Define
[TABLE]
Theorem 6.1**.**
Under (A1)-(A6) and (B1)-(B6), the estimated from (14) satisfies
[TABLE]
in probability as , where . If , then
[TABLE]
Results in Theorem 6.1 are consistent with those in Xia et al (2002), Xia (2007) and Zhang (2021). The iMAVE shares the merit of classic MAVE that it can achieve a faster consistency rate even without undersmoothing the nonparametric link function estimator. Similar results of iOPG are shown below.
Theorem 6.2**.**
Under (A1)-(A6) and (B1)-(B6), the estimated from (14) satisfies
[TABLE]
in probability as , where . If , then
[TABLE]
When is endowed with the log-Euclidean metric or the log-Cholesky metric, the manifold-related conditions are automatically satisfied and thus only (B1)-(B6) are needed. We present theoretical results of iMAVE and iOPG with lying in endowed with the log-Euclidean metric in (12) and (13) by the following corollaries. The log-Cholesky case is almost the same and is omitted.
In this case, with defined in (11) is an abelian Lie group and the bi-invariant log-Euclidean metric turns into a Hadamard manifold. Our model (5) is valid and can be transformed into
[TABLE]
by the same reasoning in Section 5 (with replaced by ). We denote and . Terms in (15) are symmetric matrices and if we vectorize the lower triangle part of these matrices into -dimensional vectors, then (15) coincides with (14). Thus the main difference of the case is that () in (B1)-(B6) should be replaced by () and in (B4) should be .
Define
[TABLE]
Corollary 6.3**.**
Under(B1)-(B6), the estimated from (12) satisfies
[TABLE]
in probability as , where . If , then
[TABLE]
Corollary 6.4**.**
Under(B1)-(B6), the estimated from (13) satisfies
[TABLE]
in probability as , where . If , then
[TABLE]
In the proof of Corollary 6.3 and Corollary 6.4, we would not encounter . Actually even in the general manifold case does not have effects on the convergence rate and the asymptotic variance. As shown above, convergence rates in the general manifold case and the case are the same and asymptotic variances are consistent in form.
7 Determine the Structural Dimension
In this part, we discuss how to use a cross validation procedure to determine the structural dimension. We focus on the case and the method can be extended to general manifold similarly. Suppose is now the working dimension and is the true structural dimension. In the Euclidean case, Xia et al. (2002) defined
[TABLE]
where are scalars, and the suffix is used to indicate that the bandwidth depends on the working dimension . Actually is the N-W estimate of . And the CV value is
[TABLE]
In our case, are now SPD matrices. If we equip with the log-Euclidean metric, then are in . Similarly define
[TABLE]
where is the matrix Frobenius norm. We then estimate as
[TABLE]
Theorem 7.1**.**
Suppose assumptions (B1)-(B3) and (B5) hold. We have
[TABLE]
Theorem 7.1 shows that as , the probability of choosing the right dimension tends to 1. If we equip with the log-Cholesky metric, above arguments still hold by replacing with .
8 Simulation Studies
8.1 Study I for SPD Matrices
In the following studies the structural dimension is known unless otherwise specified. We test the performance of our proposed iMAVE with log-Euclidean metric (eu-iMAVE), iOPG with log-Euclidean metric (eu-iOPG), iMAVE with log-Cholesky metric (ch-iMAVE), iOPG with log-Cholesky metric (ch-iOPG), weighted inverse regression ensemble method (WIRE, Ying and Yu (2022)), Fréchet MAVE and Fréchet OPG (fMAVE and fOPG, Zhang et al. (2021)).
According to Schwartzman (2006), is said to obey the standard symmetric matrix variate Normal distribution if has independent diagonal elements and independent off-diagonal elements. is said to obey the symmetric matrix variate Normal distribution if where and . As a special case, we say if .
Let , . The predictors are independent random variables each from the uniform distribution on . We generate i.i.d samples . Let be matrices specified by the following models:
I-1: M(X)=\left(\begin{array}[]{cc}1&\rho(X)\\ \rho(X)&1\end{array}\right), where ;
I-2: M(X)=\left(\begin{array}[]{ccccc}1&\rho_{1}(X)&\rho_{1}(X)&\rho_{2}(X)&\rho_{2}(X)\\ \rho_{1}(X)&1&\rho_{2}(X)&\rho_{2}(X)&\rho_{2}(X)\\ \rho_{1}(X)&\rho_{2}(X)&1&\rho_{2}(X)&\rho_{1}(X)\\ \rho_{2}(X)&\rho_{2}(X)&\rho_{2}(X)&1&\rho_{1}(X)\\ \rho_{2}(X)&\rho_{2}(X)&\rho_{1}(X)&\rho_{1}(X)&1\end{array}\right),
where and .
We generate . That is, . In model I-1, , and . In model I-2, , and . In above settings is not necessarily the Fréchet mean of given , but still measures the concentration tendency of the conditional distribution . Model I-1,I-2 are also considered in Zhang et al. (2021). The kernel function in iOPG and iMAVE is . In WIRE, we adopt the distance function induced by the log-Euclidean metric to compute the distance matrix. We follow the same steps described in Zhang et al. (2021) to prepare fOPG and fMAVE for the following simulations. For each model, we take and . The experiments in each scenario was repeated 100 times and the means and standard deviations of the estimation errors are listed in Table 1. The results for are presented in the supplementary material.
It is obvious that the best performer is always iOPG or iMAVE with either the log-Euclidean or the log-Cholesky metric. This result is reasonable since WIRE and fOPG, fMAVE make use of the information hidden in by calculating the distance matrix or the kernel matrix , both of which fail to fully exploit the inner structure of . On the contrary, our methods are intrinsic and respect the geometric structure of , thus generating more satisfying results.
8.2 Study II for SPD Matrices
In this simulation study, we generate similar to Lin et al. (2022). Let the predictors be independently and identically sampled from the uniform distribution on . Fix to be the identity matrix. Set , where with the following two settings for :
II-1: , where is an matrix with -entry being ;
II-2: where is an matrix with -entry being .
The setting II-2 is the manifold additive model proposed by Lin et al. (2022) and II-1 is a modification. We set . The random noise is generated according to , where are independently sampled form and are an basis of the tangent space . Note that is identical with so is just the identity map. We adopt the log-Euclidean metric so that and . In model II-1, and ; in model II-2, and , where and . We take . Following Wang et al. (2013) , we in this study adopt the multi-dimensional Gaussian kernel with the bandwidth set to be and being the dimension of . The means and standard deviations of the estimation errors are summarized in Table 2.
Model II-1, II-2 are tough tasks, in each scenario all methods except ours fail to give reasonable estimates even when the dimension is not large at all. Our methods can give accurate estimates on most occasions. When the dimension is relatively large () and the sample size is not large enough (), our methods cannot always produce satisfying estimates and may fail. In Fig. 1, we draw the box plots of the estimation errors based on 100 replications of all methods for in II-1 and II-2. First we can see that WIRE, fOPG and fMAVE fail in all scenarios. When the sample size is not large enough (), iOPG or iMAVE still has a possibility to fail even if the median of estimation errors is small and stable. See the top left box plot in Fig. 1. The case II-2 is easier than II-1 for our models, with much less wrong estimates (the bottom two plots). When the sample size increases to 200, our methods improve themselves and give accurate estimates in every replication in II-2, while no obvious improvement is observed for other methods. It can be expected for our methods to produce more accurate estimates if the sample size is large enough.
8.3 Study III for Sphere Data
Since the proposed iMAVE and iOPG can be extended to general manifolds, we in this part test the performance of models derived from model (14). We generate according to the following model:
III: Let and the tangent vector at be
[TABLE]
We generate i.i.d. observations from the uniform distribution on and i.i.d. . Then is generated by
[TABLE]
where is the Euclidean norm.
The simulation results under several scenarios are listed in Table 3. The proposed iMAVE and iOPG always perform better than others, with iMAVE producing the smallest estimation errors.
8.4 Study VI: Determine the Structural Dimension
In this part we assume that we have no knowledge about the dimension of the mean dimension reduction space and need to estimate it. We generate data from the five models in Study I, II and III and use the CV procedure to estimate . In the CV procedure, we use iOPG to estimate . We set , and repeat 100 times for each model and list the counts of correct and false estimates in 100 times when and , which is shown in Fig. 2.
Except model II-1 with , our CV procedure always gives satisfying estimations, reaching an accuracy greater than and even approaching in some cases. And if we increase the sample size to 300, the result corresponding to model II-1 with becomes: , , . Such improvement validates Theorem 7.1.
9 Application to New York Taxi Network Data
In this section, we apply our proposed methods to the New York Taxi network data. We first estimate the structural dimension as and apply iMAVE equipped with the log-Euclidean metric to derive estimated on the training dataset. Then we feed our results to the manifold additive regression model (Lin et al., 2022) and get the prediction root mean squared error (RMSE) on the testing dataset. Small RMSE will justify the validity of our methods.
The New York City Taxi and Limousine Commission provides records on pick-up and drop-off dates and times, pick-up and drop-off locations, trip distances, itemized fares, payment types and other information for yellow taxis (Tucker et al., 2021). The data are available from
https://www1.nyc.gov/site/tlc/about/tlc-trip-record-data.page
Similar to Tucker et al. (2021), we transform raw data into network data (adjacent matrices), where zones are nodes and edges are weighted by the number of taxi rides which picked up in one zone and dropped off in another within a single hour. After proper mapping, these adjacent matrices lie in the space of SPD matrices. We do the following to collect SPD matrices together with several prediction variables:
-
We only choose the data of January and February, 2019 (59 days) due to resource restrictions.
-
We filter on observations with both pick-up and drop-off occurring in Manhattan (islands excluded).
-
We then group zones in Manhattan into 3 zones and label them similar to Dubey and Müller (2020). That is, each network has 3 nodes.
-
For each hour, we collected the number of pairwise connections between nodes based on pick-ups and drop-offs. These correspond to weights between nodes. We then further normalize the weights by the maximum edge weight in each hour so that they lie in .
By doing so, we collected 1416 (5924) weighted adjacent matrices of describing the taxi movements between zones in Manhattan. To ensure that they are SPD matrices, we apply to these symmetric matrices.
From the dataset we collect the following 9 potential predictors, with values averaged over each hour:
Ave.Distance: mean distance traveled, standardized
Ave.Fare: mean total fare, standardized
Ave.Passengers: mean number of passengers, standardized
Ave.tip: mean tip, standardized
Cash: sum of cash indicators for type of payment, standardized
Credit: sum of credit indicators for type of payment, standardized
Dispute: sum of dispute indicators for type of payment, standardized
Free: sum of free indicators for type of payment, standardized
LateHour: indicator for the hour being between 11pm and 5am
We also collect New York City weather data for January and February 2019 from
https://www.wunderground.com/history/daily/us/ny/new-york-city/KLGA/date
The following 5 weather variables are included as potential predictors:
Ave.temp: daily mean temperature, standardized
Ave.humid: daily mean humidity, standardized
Ave.wind: daily mean wind speed, standardized
Ave.press: daily mean barometric pressure, standardized
Precip: daily total precipitation, standardized
This then yields a total of 14 potential predictors. We can now write the data at hand as , where is an array of dimension , , and is a SPD matrix . Then we randomly divide the dataset into a train set (991 samples) and a test dataset (425 samples). On the train set, we respectively set , apply iMAVE with the log-Euclidean metric and calculate CV(). The results are: 0.0430, 0.0283, 0.0257, 0.0626, 0.0834, 0.0687, 0.0612. The CV procedure suggests that is a reasonable choice. So we apply iMAVE with again to the training dataset and get which is listed in Table 4.
The estimated results show that fare amount and type of payment are important covariates, which is consistent with the results of Tucker et al. (2021). Ave.Fare and Ave.Distance are closely related and both of them are significant in the first three directions. Cash and Credit are significant in the first direction, showing that most passengers tend to pay the fare by cash or credit. Another obvious observation is that all the 5 weather variables seem negligible since their coefficients are almost 0 in all of the first three directions. This is reasonable because as a global metropolitan, the New York City has established an advanced and robust public transportation system. And mild weather changes may have little compact on the function of taxi services. The weather condition during January and February 2019 is rather stationary, which accounts for the insignificance of weather variables.
To show our dimension reduction method is valid and has further statistical applications, we conduct the additive regression using the manifold additive model (MAM) introduced by Lin et al. (2022). The MAM is formulated as
[TABLE]
where is an SPD matrix, is the Fréchet mean of , each is function mapping into the SPD space, is random noise which has a Fréchet mean corresponding to the group identity element, are scalar variables and is the group operation.
We apply MAM to the train dataset after dimension reduction to get estimated and functions , and . Then we apply the trained MAM to the test dataset to get the estimates . The prediction RMSE on the test dataset is 0.3220, which is a relative small number as the prediction error of a SPD. That is to say, MAM generates good estimation after processing data with our intrinsic dimension reduction method, which indicates that our method is valid and possesses the potential for widely applications.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1Arsigny et al., (2007) Arsigny, V., Fillard, P., Pennec, X., and Ayache, N. (2007). Geometric means in a novel vector space structure on symmetric positive-definite matrices. SIAM Journal on Matrix Analysis and Applications , 29:328–347.
- 2Batchelor et al., (2004) Batchelor, P. G., Moakher, M., Atkinson, D., Calamante, F., and Connelly, A. (2004). A rigorous framework for diffusion tensor calculus. Magnetic Resonance in Medicine , 53:221–225.
- 3Bhattacharjee and Müller, (2021) Bhattacharjee, S. and Müller, H.-G. (2021). Single index Fréchet regression. ar Xiv:2108.05437 [stat.ME].
- 4Chen et al., (2020) Chen, Y., Lin, Z., and Müller, H.-G. (2020). Wasserstein regression. ar Xiv:2006.09660 [stat.ME].
- 5Cook and Li, (2002) Cook, R. D. and Li, B. (2002). Dimension reduction for conditional mean in regression. The Annals of Statistics , 30:455–474.
- 6Cook and Weisberg, (1991) Cook, R. D. and Weisberg, S. (1991). Sliced inverse regression for dimension reduction: Comment. Journal of the American Statistical Association , 86:328–332.
- 7Cornea et al., (2016) Cornea, E., Zhu, H., Kim, P., and Ibrahim, J. G. (2016). Regression models on Riemannian symmetric spaces. Journal of the Royal Statistical Society: Series B (Statistical Methodology) , 79:463–482.
- 8Dubey and Müller, (2020) Dubey, P. and Müller, H.-G. (2020). Functional models for time-varying random objects. Journal of the Royal Statistical Society: Series B (Statistical Methodology) , 82:275–327.
