Intrinsic minimum average variance estimation for sufficient dimension reduction with symmetric positive definite matrices and beyond

B. Chen; S. Dai; Z. Yu

arXiv:2302.13059·stat.ME·February 28, 2023

TL;DR

This paper introduces intrinsic methods for sufficient dimension reduction with symmetric positive definite matrices, leveraging Riemannian geometry to improve estimation accuracy and extend applicability.

Contribution

It develops the intrinsic minimum average variance estimation and outer product gradient methods using Riemannian metrics, extending to general manifolds and providing rigorous theoretical guarantees.

Findings

01. Methods outperform existing techniques in simulations.

02. Algorithms effectively estimate structural dimension with theoretical support.

03. Application to taxi network data demonstrates practical utility.

Abstract

In this paper, we target the problem of sufficient dimension reduction with symmetric positive definite matrix-valued responses. We propose the intrinsic minimum average variance estimation method and the intrinsic outer product gradient method, which fully exploit the geometric structure of the Riemannian manifold where the responses lie. We present the algorithms for our newly developed methods under the log-Euclidean metric and the log-Cholesky metric. Each of the two metrics is linked to an abelian Lie group structure that transforms our model defined on a manifold into a Euclidean one. The proposed methods are then further extended to general Riemannian manifolds. We establish rigorous asymptotic results for the proposed estimators, including the rate of convergence and the asymptotic normality. We also develop a cross validation algorithm for the estimation of the structural…

Tables

Table 1: Mean ($\pm$ standard deviation) of estimation error for different methods in study I.

Model	$(p, n)$	WIRE	eu-iOPG	eu-iMAVE	ch-iOPG	ch-iMAVE	fOPG	fMAVE
I-1	(10,100)	0.0869	0.0693	0.0693	0.0616	0.0612	0.0891	0.3913
		$\pm$ 0.0229	$\pm$ 0.0170	$\pm$ 0.0169	$\pm$ 0.0185	$\pm$ 0.0184	$\pm$ 0.0255	$\pm$ 0.2300
	(10,200)	0.0617	0.0489	0.0488	0.0421	0.0422	0.0592	0.3803
		$\pm$ 0.0163	$\pm$ 0.0103	$\pm$ 0.0100	$\pm$ 0.0091	$\pm$ 0.0092	$\pm$ 0.0144	$\pm$ 0.2093
	(20,100)	0.1406	0.1118	0.1112	0.0973	0.0965	0.1443	0.3519
		$\pm$ 0.0242	$\pm$ 0.0193	$\pm$ 0.0194	$\pm$ 0.0184	$\pm$ 0.0183	$\pm$ 0.0322	$\pm$ 0.1769
	(20,200)	0.0953	0.0735	0.0735	0.0656	0.0654	0.0934	0.2748
		$\pm$ 0.0167	$\pm$ 0.0146	$\pm$ 0.0146	$\pm$ 0.0107	$\pm$ 0.0107	$\pm$ 0.0192	$\pm$ 0.1483
I-2	(10,100)	0.0577	0.0635	0.0605	0.0530	0.0532	0.2802	2.8665
		$\pm$ 0.0317	$\pm$ 0.0281	$\pm$ 0.0321	$\pm$ 0.0226	$\pm$ 0.0249	$\pm$ 0.4609	$\pm$ 0.2536
	(10,200)	0.0277	0.0283	0.0277	0.0246	0.0248	0.0504	2.9728
		$\pm$ 0.0229	$\pm$ 0.0227	$\pm$ 0.0222	$\pm$ 0.0223	$\pm$ 0.0215	$\pm$ 0.0302	$\pm$ 0.0854
	(20,100)	0.1314	0.1656	0.1376	0.1192	0.1155	1.3775	3.0551
		$\pm$ 0.0344	$\pm$ 0.0660	$\pm$ 0.0464	$\pm$ 0.0324	$\pm$ 0.0327	$\pm$ 0.5156	$\pm$ 0.3843
	(20,200)	0.0578	0.0582	0.0560	0.0525	0.0517	0.2223	2.9806
		$\pm$ 0.0128	$\pm$ 0.183	$\pm$ 0.0144	$\pm$ 0.0148	$\pm$ 0.0136	$\pm$ 0.1651	$\pm$ 0.1680

Table 2: Mean ($\pm$ standard deviation) of estimation error for different methods in study II.

Model	$(p, n)$	WIRE	eu-iOPG	eu-iMAVE	ch-iOPG	ch-iMAVE	fOPG	fMAVE
II-1	(5,100)	1.2928	0.0872	0.0818	0.0871	0.0832	1.2666	1.2456
		$\pm$ 0.1478	$\pm$ 0.2090	$\pm$ 0.2084	$\pm$ 0.2108	$\pm$ 0.2164	$\pm$ 0.1962	$\pm$ 0.2002
	(5,200)	1.2308	0.0280	0.0254	0.0291	0.0260	1.2124	1.2240
		$\pm$ 0.1965	$\pm$ 0.0112	$\pm$ 0.0099	$\pm$ 0.0114	$\pm$ 0.0099	$\pm$ 0.2326	$\pm$ 0.2237
	(10,100)	1.3491	0.7189	0.6827	0.6925	0.6789	1.3413	1.3400
		$\pm$ 0.0728	$\pm$ 0.6296	$\pm$ 0.6479	$\pm$ 0.6276	$\pm$ 0.6438	$\pm$ 0.0790	$\pm$ 0.0791
	(10,200)	1.3367	0.1641	0.1490	0.1500	0.1461	1.3320	1.3385
		$\pm$ 0.1073	$\pm$ 0.3779	$\pm$ 0.3709	$\pm$ 0.3556	$\pm$ 0.3610	$\pm$ 0.1062	$\pm$ 0.0992
II-2	(5,100)	1.2118	0.0604	0.0554	0.0648	0.0594	1.2912	1.5003
		$\pm$ 0.2560	$\pm$ 0.0195	$\pm$ 0.0169	$\pm$ 0.0195	$\pm$ 0.0176	$\pm$ 0.2745	$\pm$ 0.1855
	(5,200)	1.1923	0.0338	0.0331	0.0360	0.0352	1.2266	1.4979
		$\pm$ 0.2578	$\pm$ 0.0093	$\pm$ 0.0092	$\pm$ 0.0099	$\pm$ 0.0099	$\pm$ 0.2354	$\pm$ 0.1562
	(10,100)	1.3954	0.3847	0.3651	0.3426	0.3246	1.6178	1.7211
		$\pm$ 0.1070	$\pm$ 0.5302	$\pm$ 0.5309	$\pm$ 0.4995	$\pm$ 0.5047	$\pm$ 0.1532	$\pm$ 0.1396
	(10,200)	1.3714	0.0637	0.0566	0.0675	0.0603	1.4808	1.7123
		$\pm$ 0.1005	$\pm$ 0.0121	$\pm$ 0.0104	$\pm$ 0.0126	$\pm$ 0.0108	$\pm$ 0.1526	$\pm$ 0.1334

Table 3: Mean ($\pm$ standard deviation) of estimation error for different methods in study III.

Model	$(p, n)$	WIRE	iOPG	iMAVE	fOPG	fMAVE
III	(10,100)	0.3461	0.2555	0.2226	0.6743	1.5332
		$\pm$ 0.0803	$\pm$ 0.0770	$\pm$ 0.0643	$\pm$ 0.2456	$\pm$ 0.1610
	(10,200)	0.2270	0.1545	0.1475	0.4065	1.5104
		$\pm$ 0.0505	$\pm$ 0.0372	$\pm$ 0.0358	$\pm$ 0.1644	$\pm$ 0.0372
	(20,100)	0.5395	0.4766	0.3534	1.1215	1.6307
		$\pm$ 0.1012	$\pm$ 0.0967	$\pm$ 0.0699	$\pm$ 0.2209	$\pm$ 0.1392
	(20,200)	0.3474	0.2481	0.2172	0.6990	1.6083
		$\pm$ 0.0567	$\pm$ 0.0409	$\pm$ 0.0401	$\pm$ 0.1657	$\pm$ 0.1510

Table 4: Estimated CS directions in the New York taxi network data.

Direction	Ave.Distance	Ave.Fare	Ave.Passengers	Ave.Tip	Cash	Credit	Dispute
$β_{1}$	0.2417	-0.4827	0.0927	-0.0313	-0.5720	0.5863	0.0074
$β_{2}$	0.6592	-0.4002	0.2242	0.0878	0.5755	-0.0817	-0.0101
$β_{3}$	-0.3931	-0.6700	-0.4348	0.0017	0.1952	-0.0343	-0.0577
	Free	LateHour	Ave.Temp	Ave.Humid	Ave.Wind	Ave.Press	Precip
$β_{1}$	0.1277	0.0988	-0.0339	-0.0053	-0.0025	-0.2579	-0.0033
$β_{2}$	-0.0174	0.0579	-0.0511	-0.0075	-0.0062	-0.0540	0.0064
$β_{3}$	-0.1134	-0.3692	0.0789	-0.0030	-0.0163	0.0833	0.0470

Equations

$$E(Y\mid X)=E(Y\mid B_{0}^{T}X),$$

$$Y=g(B_{0}^{T}X)+\varepsilon,$$

$$\min_{B:B^{T}B=I}E\{Y-E(Y\mid B^{T}X)\}^{2},$$

$$\min_{B:B^{T}B=I}E\big(E[\{Y-E(Y\mid B^{T}X)\}^{2}\mid B^{T}X]\big).$$

$$\min_{B:B^{T}B=I}\sum_{j=1}^{n}\sum_{i=1}^{n}w_{ij}\{Y_{i}-E(Y_{i}\mid B^{T}X_{i})\}^{2},$$

$$\min_{\substack{B:B^{T}B=I\\ a_{j},b_{j}}}\sum_{j=1}^{n}\sum_{i=1}^{n}w_{ij}[Y_{i}-\{a_{j}+b_{j}^{T}B^{T}(X_{i}-X_{j})\}]^{2},$$

$$Y=g(B_{0}^{T}X)\oplus\varepsilon.$$

$$E\{\mathrm{Log}_{D(B^{T}x)}Y\mid B^{T}x\}=O_{m},$$

$$\min_{B:B^{T}B=I}\sum_{j=1}^{n}\sum_{i=1}^{n}w_{ij}d^{2}\{Y_{i},D(B^{T}X_{i})\}.$$

$$\phi_{D(B^{T}X_{j})}:T_{D(B^{T}X_{j})}\mathrm{Sym}^{+}(m)\to T_{I_{m}}\mathrm{Sym}^{+}(m).$$

$$\mathrm{Log}_{D(B^{T}X_{j})}D(B^{T}x)=\phi_{D(B^{T}X_{j})}^{-1}\{f(B^{T}x)\}\approx\phi_{D(B^{T}X_{j})}^{-1}[b_{j}I_{m}\otimes\{B^{T}(x-X_{j})\}],$$

$$D(B^{T}x)\approx\mathrm{Exp}_{D(B^{T}X_{j})}(\phi_{D(B^{T}X_{j})}^{-1}[b_{j}I_{m}\otimes\{B^{T}(x-X_{j})\}]),$$

$$b_{j}=\left(\begin{array}{cccc}c_{11}^{T}(X_{j})&c_{12}^{T}(X_{j})&\cdots&c_{1m}^{T}(X_{j})\\ c_{21}^{T}(X_{j})&c_{22}^{T}(X_{j})&\cdots&c_{2m}^{T}(X_{j})\\ \vdots&\vdots&&\vdots\\ c_{m1}^{T}(X_{j})&c_{m2}^{T}(X_{j})&\cdots&c_{mm}^{T}(X_{j})\end{array}\right)_{m\times md}\quad(j=1,...,n),$$

$$\min_{\substack{B:B^{T}B=I\\ a_{j},b_{j}}}\sum_{j=1}^{n}\sum_{i=1}^{n}w_{ij}d^{2}\{Y_{i},\mathrm{Exp}_{a_{j}}(\phi_{a_{j}}^{-1}[b_{j}I_{m}\otimes\{B^{T}(X_{i}-X_{j})\}])\},$$

$$\min_{a_{j},b_{j}}\sum_{j=1}^{n}\sum_{i=1}^{n}w_{ij}d^{2}(Y_{i},\mathrm{Exp}_{a_{j}}[\phi_{a_{j}}^{-1}\{b_{j}I_{m}\otimes(X_{i}-X_{j})\}]),$$

$$S_{1}\oplus S_{2}=\mathrm{exp}\{\mathrm{log}(S_{1})+\mathrm{log}(S_{2})\}.$$

$$d\{Y_{i},D(B^{T}X_{i})\}=||\mathrm{log}\{D(B^{T}X_{i})\}-\mathrm{log}Y_{i}||_{F}.$$

$$\min_{\substack{B:B^{T}B=I\\ a_{j},b_{j}}}\sum_{j=1}^{n}\sum_{i=1}^{n}w_{ij}||a_{j}+b_{j}I_{m}\otimes\{B^{T}(X_{i}-X_{j})\}-\mathrm{log}Y_{i}||_{F}^{2},$$

$$\min_{a_{j},b_{j}}\sum_{j=1}^{n}\sum_{i=1}^{n}w_{ij}||a_{j}+b_{j}I_{m}\otimes(X_{i}-X_{j})-\mathrm{log}Y_{i}||_{F}^{2}.$$

$$\begin{aligned}&w_{ij}=\frac{K_{h}(B^{T}(X_{i}-X_{j}))}{\sum_{i=1}^{n}K_{h}(B^{T}(X_{i}-X_{j}))},\quad\alpha_{j}=\left(\begin{array}{c}\mathrm{vecs}(a_{j})\\ \mathrm{vecss}(b_{j})\end{array}\right),\\ &\chi_{i}(X_{j})=\Big(I_{q},I_{q}\otimes(X_{i}-X_{j})^{T}\Big)^{T},\quad\chi_{i}(B^{T}X_{j})=\Big(I_{q},I_{q}\otimes\big((X_{i}-X_{j})^{T}B\big)\Big)^{T},\\ &A_{ij}=\Big(c_{11}(X_{j}),c_{21}(X_{j}),c_{22}(X_{j}),...,c_{m1}(X_{j}),...,c_{mm}(X_{j})\Big)\otimes(X_{i}-X_{j}),\end{aligned}$$

$$\begin{aligned}\hat{\alpha}_{j}^{(t)}=&\Big\{\sum_{i=1}^{n}w_{ij}^{(t-1)}\chi_{i}(\hat{B}_{(t-1)}^{T}X_{j})\chi_{i}(\hat{B}_{(t-1)}^{T}X_{j})^{T}\Big\}^{-1}\\ &\times\sum_{i=1}^{n}w_{ij}^{(t-1)}\chi_{i}(\hat{B}_{(t-1)}^{T}X_{j})\mathrm{vecs}(\mathrm{log}Y_{i})\quad(j=1,...,n).\end{aligned}$$

$$\mathrm{vec}(\hat{B}_{(t)})=\Big\{\sum_{j=1}^{n}\sum_{i=1}^{n}w_{ij}^{(t-1)}A_{ij}^{(t)}(A_{ij}^{(t)})^{T}\Big\}^{-1}\sum_{j=1}^{n}\sum_{i=1}^{n}w_{ij}^{(t-1)}A_{ij}^{(t)}\mathrm{vecs}(\mathrm{log}Y_{i}-\hat{a}_{j}^{(t)}).$$

$$\hat{\alpha}_{j}^{(t)}=\Big\{\sum_{i=1}^{n}w_{ij}^{(t-1)}\chi_{i}(X_{j})\chi_{i}(X_{j})^{T}\Big\}^{-1}\sum_{i=1}^{n}w_{ij}^{(t-1)}\chi_{i}(X_{j})\mathrm{vecs}(\mathrm{log}Y_{i})\quad(j=1,...,n).$$

$$\hat{b}_{j}^{(t)}=\left(\begin{array}{cccc}c_{11}^{T}&&&\\ c_{21}^{T}&c_{22}^{T}&&\\ \vdots&\vdots&\ddots&\\ c_{m1}^{T}&c_{m2}^{T}&\cdots&c_{mm}^{T}\end{array}\right)\quad(j=1,...,n),$$

$$\hat{\beta}_{j}^{(t)}=(c_{11},c_{21},c_{22},...,c_{m1},...,c_{mm})^{T}\in R^{q\times p},\quad j=1,...,n.$$

$$\hat{\Lambda}^{(t)}=\frac{1}{n}\sum_{j=1}^{n}(\hat{\beta}_{j}^{(t)})^{T}\hat{\beta}_{j}^{(t)}.$$

$$\mathrm{Log}_{\mu}Y=h(B_{0}^{T}X)+\zeta,$$

$$\begin{aligned}W_{B_{0}}&=E[\{h^{(1)}(B_{0}^{T}X)^{T}h^{(1)}(B_{0}^{T}X)\}\otimes\{v_{B_{0}}(X)v_{B_{0}}^{T}(X)\}],\\ \Sigma_{0}&=\mathrm{var}[\{h^{(1)}(B_{0}^{T}X)^{T}\otimes v_{B_{0}}(X)\}\zeta],\\ W_{0}&=\mathrm{var}[\{M_{0}^{-1}h^{(1)}(B_{0}^{T}X)^{T}\zeta\}\otimes\{\bar{w}_{B_{0}}^{+}(X)v_{B_{0}}(X)\}].\end{aligned}$$

$$||\hat{B}_{\mathrm{iMAVE}}\hat{B}_{\mathrm{iMAVE}}^{T}-B_{0}B_{0}^{T}||_{F}=O(h^{3}+h\delta_{dh}+\delta_{dh}^{2}/h+n^{-1/2})$$

$$\sqrt{n}\{\mathrm{vec}(\hat{B}_{\mathrm{iMAVE}}\hat{B}_{\mathrm{iMAVE}}^{T}B_{0})-\mathrm{vec}(B_{0})\}\overset{d}{\to}N(0,W_{B_{0}}^{+}\Sigma_{0}W_{B_{0}}^{+}).$$

Taxonomy

Topics: Bone and Joint Diseases · Statistical Methods and Inference · Point processes and geometric inequalities

Full text

Intrinsic Minimum Average Variance Estimation for Sufficient Dimension Reduction with Symmetric Positive Definite Matrices and Beyond

Baiyu Chen

School of Statistics, East China Normal University

and

Shuang Dai

School of Statistics, East China Normal University

and

Zhou Yu

School of Statistics, East China Normal University

Abstract

In this paper, we target the problem of sufficient dimension reduction with symmetric positive definite matrix-valued responses. We propose the intrinsic minimum average variance estimation method and the intrinsic outer product gradient method, which fully exploit the geometric structure of the Riemannian manifold where the responses lie. We present the algorithms for our newly developed methods under the log-Euclidean metric and the log-Cholesky metric. Each of the two metrics is linked to an abelian Lie group structure that transforms our model defined on a manifold into a Euclidean one. The proposed methods are then further extended to general Riemannian manifolds. We establish rigorous asymptotic results for the proposed estimators, including the rate of convergence and the asymptotic normality. We also develop a cross validation algorithm for the estimation of the structural dimension with a theoretical guarantee. Comprehensive simulation studies and an application to the New York taxi network data are performed to show the superiority of the proposed methods.

Keywords: Sufficient dimension reduction; Sliced inverse regression; Minimum average variance estimation; Outer product gradient; Symmetric positive definite matrix.

1 Introduction

After more than 20 years of rapid development, sufficient dimension reduction (SDR) has become a powerful tool in statistics, partly thanks to the increasing demand for techniques that handle high-dimensional data. Several classes of SDR methods are now well established for high-dimensional data analysis with Euclidean responses and predictors. Typical SDR tools include inverse regression methods (e.g., sliced inverse regression (Li, 1991), sliced average variance estimation (Cook and Weisberg, 1991) and directional regression (Li and Wang, 2007)), nonparametric methods such as the outer product of gradients (OPG) and minimum average variance estimation (MAVE) (Xia et al., 2002), and the semiparametric approach (Ma and Zhu, 2012, 2013, 2019).

However, the prosperity of big data is accompanied by an abundance of non-Euclidean objects for which traditional dimension reduction methods fail. For example, in an Alzheimer’s Disease Neuroimaging Initiative (ADNI) study (Lin et al., 2022), subjects were invited to a medical center to have their brains imaged and their behavioral abilities assessed. A preprocessing protocol was then applied to turn the brain images into average hippocampal diffusion tensors, which are $3\times 3$ symmetric positive definite (SPD) matrices characterizing the diffusion of water molecules in tissues and conveying rich information about brain tissues. Finally, researchers are faced with a data set $(Y_{i},X_{1i},X_{2i},...,X_{pi})$ $(i=1,...,n)$ where the response $Y$ is a $3\times 3$ SPD matrix and $X_{1},...,X_{p}$ are predictors standardized to the interval $[0,1]$ representing the scores of each subject’s memory, executive functioning, language ability and so on. Another example arises from the taxi services within a city. Researchers divide the city into several zones and take these zones as nodes in a network or graph. This graph is further weighted by the number of taxi pick-ups and drop-offs between zones in a time interval. Proper transformations can turn these graphs into SPD matrices describing the taxi movements in a city. After collecting potential predictors such as travel distance, fare amount, average daily temperature and total precipitation, one can analyze the relationship between the taxi movements and possible factors.

In the above examples, the responses are non-Euclidean and lie in $\mathrm{Sym}^{+}(m)$, the manifold of $m\times m$ SPD matrices. When the dimension of the predictor is large, sufficient dimension reduction is necessary to avoid the curse of dimensionality, but traditional Euclidean methods cannot handle SPD matrix responses. As a consequence, there is a growing need to carry out SDR with SPD matrices as responses.

Up to now there have been many works where traditional statistical methods in Euclidean spaces are generalized to manifolds or more general metric spaces such as local polynomial regression for SPD matrices (Yuan et al., 2012; Zhu et al., 2009; Cornea et al., 2016), Fréchet regression for random objects (Peterson and Müller, 2019a), intrinsic Riemannian functional principal component analysis and functional linear regression (Lin and Yao, 2019), additive model for SPD matrices (Lin et al., 2022), Fréchet sufficient dimension reduction for random objects (Ying and Yu, 2022; Zhang et al., 2021), intrinsic Wasserstein correlation analysis (Zhou et al., 2021), single index Fréchet regression (Bhattacharjee and Müller, 2021), autoregressive optimal transport model (Zhu and Müller, 2021) and so on.

Among these works, two recent papers are related to non-Euclidean SDR. Ying and Yu (2022) extended the traditional SIR model to the case where the predictors are Euclidean while the response takes values in a metric space. They borrowed strength from the martingale difference divergence to avoid the estimation of $E(X\mid Y)$ and to absorb information in $Y$ by including the distance function in the metric space. The work of Zhang et al. (2021) turned almost all existing Euclidean SDR methods into ones for Euclidean $X$ and metric space-valued $Y$ , which is very comprehensive and flexible.

In their proposal, the random object $Y$ is first mapped into a real-valued random variable, and classic SDR methods are then applied to the transformed data. However, when the response lies in a manifold, the two aforementioned methods can still be applied, but they fail to fully exploit the intrinsic geometry of the manifold, so some information contained in the response is inevitably lost.

In this paper, we consider the dimension reduction of the conditional mean (Cook and Li, 2002) with SPD matrices. The basic problem is to find a lower dimensional predictor $B_{0}^{T}X$ such that

$$E(Y\mid X)=E(Y\mid B_{0}^{T}X),$$

where $Y\in\mathrm{Sym}^{+}(m)$ and $B_{0}$ is a $p\times d$ matrix. To fully incorporate the information in the $\mathrm{Sym}^{+}(m)$-valued response, we generalize the state-of-the-art sufficient mean dimension reduction methods MAVE and OPG to estimate the column space spanned by $B_{0}$. The basic idea of our method also stems from the intrinsic local polynomial regression (ILPR) for SPD matrices introduced by Yuan et al. (2012), which replaced the squared distance by the geodesic distance on $\mathrm{Sym}^{+}(m)$ and performed Taylor expansion after parallel transport to estimate an intrinsic conditional expectation of an SPD matrix response given a covariate vector $X$. Yuan et al. (2012) only considered the case where $X$ is a scalar. In this paper, we take a step forward and handle high-dimensional $X$. We call our methods intrinsic MAVE and intrinsic OPG since $\mathrm{Sym}^{+}(m)$ cannot be isometrically embedded into a Euclidean space and we deal with it in a totally intrinsic way.

The rest of this paper is organized as follows. Some preliminaries on manifolds are introduced in Section 2. We then introduce our intrinsic dimension reduction proposals and algorithms with SPD matrices in Section 3 and Section 4. Our proposed methods for SPD matrices are extended to general manifolds in Section 5. Asymptotic results, including the rate of convergence and asymptotic normality, are established in Section 6. A cross validation procedure to determine the structural dimension is presented in Section 7. Simulation studies are presented in Section 8 and a real data application is carried out in Section 9. Section 10 concludes this paper. Additional simulation results and proofs of the theorems can be found in the supplementary material.

2 Preliminaries on Manifolds

We first introduce some basic notions for Riemannian manifolds and Lie groups (Tu, 2011; Lang, 1999). Let $\mathcal{M}$ be a simply connected and smooth manifold and $p\in\mathcal{M}$. For a small scalar $\delta>0$, let $c(t)$ be a continuously differentiable map from $(-\delta,\delta)$ to $\mathcal{M}$ passing through $c(0)=p$. A tangent vector at $p$ is the derivative of the curve $c(t)$ at $t=0$. All such tangent vectors at $p$ form a vector space named the tangent space at $p$, which is denoted by $T_{p}\mathcal{M}$. Each tangent space $T_{p}\mathcal{M}$ can be endowed with an inner product $\langle\cdot,\cdot\rangle_{p}$ that varies smoothly with $p$. The inner products $\{\langle\cdot,\cdot\rangle_{p}:p\in\mathcal{M}\}$ are collectively denoted by $\langle\cdot,\cdot\rangle$, which is referred to as the Riemannian metric of $\mathcal{M}$. With a Riemannian metric, we can define a distance $d(\cdot,\cdot)$ on $\mathcal{M}$ that turns $\mathcal{M}$ into a metric space. The length of a continuously differentiable curve $c(t):[t_{0},t_{1}]\rightarrow\mathcal{M}$ is calculated as $\int_{t_{0}}^{t_{1}}\langle c^{\prime}(t),c^{\prime}(t)\rangle_{c(t)}^{1/2}\mathrm{d}t$, where $c^{\prime}(t)$ is the derivative of $c(t)$. The distance $d(p,q)$ is then the infimum of the lengths of all continuously differentiable curves joining $p$ and $q$.

A geodesic $\gamma$ is a curve defined on $[0,\infty)$ such that for each $t\in[0,\infty)$, $\gamma([t,t+\epsilon])$ is the shortest path connecting $\gamma(t)$ and $\gamma(t+\epsilon)$ for sufficiently small $\epsilon>0$. The Riemannian exponential map $\mathrm{Exp}_{p}$ at $p\in\mathcal{M}$ is a function mapping $T_{p}\mathcal{M}$ into $\mathcal{M}$, defined by $\mathrm{Exp}_{p}(u)=\gamma(1)$ where $\gamma$ is the geodesic with $\gamma(0)=p$ and $\gamma^{\prime}(0)=u\in T_{p}\mathcal{M}$. The inverse of $\mathrm{Exp}_{p}$, if it exists, is denoted by $\mathrm{Log}_{p}$ and called the Riemannian logarithm map at $p$; it is defined by $\mathrm{Log}_{p}q=u$ for $q\in\mathcal{M}$ such that $\mathrm{Exp}_{p}u=q$.

A vector field $U$ is a function defined on $\mathcal{M}$ such that $U(p)\in T_{p}\mathcal{M}$. Given a curve $\gamma(t)$ on $\mathcal{M}$, $t\in I$ for a real interval $I$, a vector field along $\gamma$ is a smooth map $U$ defined on $I$ such that $U(t)\in T_{\gamma(t)}\mathcal{M}$. We say $U$ is parallel along $\gamma$ if $\nabla_{\gamma^{\prime}(t)}U=0$ for all $t\in I$, where $\nabla$ is the Levi-Civita connection on $\mathcal{M}$. In this paper we only focus on parallel vector fields along geodesics. Let $\gamma:[0,1]\rightarrow\mathcal{M}$ be a geodesic connecting $p$ and $q$, and let $U$ be a parallel vector field along $\gamma$ such that $U(0)=u$ and $U(1)=v$. Then the parallel transport of $u$ along $\gamma$ is denoted by $\phi_{p}(u)=v$.

When $(\mathcal{M},\oplus)$ is a group and the group operation $\oplus$ and its inverse are both smooth, $(\mathcal{M},\oplus)$ is called a Lie group. The tangent space at the identity element $e$ is called the Lie algebra and is denoted by $\mathfrak{g}$. It consists of left-invariant vector fields $U$, which satisfy $U(p\oplus q)=(DL_{p})(U(q))$, where $L_{p}:q\rightarrow p\oplus q$ is the left translation at $p$ and $DL_{p}$ is the differential of $L_{p}$. A Riemannian metric $\langle\cdot,\cdot\rangle$ is called left-invariant if $\langle u,v\rangle_{q}=\langle DL_{p}(u),DL_{p}(v)\rangle_{p\oplus q}$ for all $p,q\in\mathcal{M}$ and $u,v\in T_{q}\mathcal{M}$. Right invariance can be defined similarly. A metric is bi-invariant if it is both left-invariant and right-invariant. The Lie exponential map, denoted by $\mathfrak{exp}$, is defined by $\mathfrak{exp}(u)=\gamma(1)$ where $\gamma:R\rightarrow\mathcal{M}$ is the unique one-parameter subgroup such that $\gamma^{\prime}(0)=u\in\mathfrak{g}$. Its inverse, if it exists, is denoted by $\mathfrak{log}$. Note the distinction between the Riemannian exponential map “$\mathrm{Exp}$”, the Lie exponential map “$\mathfrak{exp}$” and the common matrix exponential “$\mathrm{exp}$”, all of which appear frequently in later sections. When $\langle\cdot,\cdot\rangle$ is bi-invariant, $\mathfrak{exp}$ coincides with $\mathrm{Exp}_{e}$.

3 Intrinsic MAVE and OPG for SPD Matrices

The classic MAVE in a Euclidean space adopts the following regression-type model for conditional mean dimension reduction:

$$Y=g(B_{0}^{T}X)+\varepsilon,\tag{2}$$

where $Y$ and $X$ are respectively $R$ -valued and $R^{p}$ -valued random variables, $g$ is an unknown smooth link function, $B_{0}=(\beta_{1},...,\beta_{d})$ is a $p\times d$ orthogonal matrix ( $B_{0}^{T}B_{0}=I_{d\times d}$ ) with $d<p$ and $E(\varepsilon\mid X)=0$ almost surely. MAVE aims to estimate $B_{0}$ as $B_{0}^{T}X$ captures all information about $Y$ provided by $X$ .

MAVE targets $B_{0}$ by solving

$$\min_{B:B^{T}B=I}E\{Y-E(Y\mid B^{T}X)\}^{2},$$

which is equivalent to

$$\min_{B:B^{T}B=I}E\big(E[\{Y-E(Y\mid B^{T}X)\}^{2}\mid B^{T}X]\big).$$

Suppose $(Y_{i},X_{i})$ $(i=1,...,n)$ is a sample from $(Y,X)$ . Following the idea of local linear regression, the above formula can be approximated by

$$\min_{B:B^{T}B=I}\sum_{j=1}^{n}\sum_{i=1}^{n}w_{ij}\{Y_{i}-E(Y_{i}\mid B^{T}X_{i})\}^{2},\tag{3}$$

which can be further approximated by

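$$\min_{\substack{B:B^{T}B=I\\ a_{j},b_{j}}}\sum_{j=1}^{n}\sum_{i=1}^{n}w_{ij}[Y_{i}-\{a_{j}+b_{j}^{T}B^{T}(X_{i}-X_{j})\}]^{2},\tag{4}$$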

where $w_{ij}=K_{h}(X_{i}-X_{j})/\sum_{i=1}^{n}K_{h}(X_{i}-X_{j})$ and, for $u\in R^{p}$, $K_{h}(u)=K(u/h)/h^{p}$ with $K(\cdot)$ being the kernel function and $h\in R$ being the bandwidth. Optimizing (4) gives an estimate of $B_{0}$.
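As a rough illustration of the weights $w_{ij}$ and of objective (4), the Python sketch below evaluates the local linear criterion at a fixed $B$, with $a_{j}$ and $b_{j}$ profiled out by weighted least squares. The Gaussian kernel, the function names `kernel_weights` and `mave_objective`, and the use of NumPy are assumptions made for illustration; they are not taken from the paper.

```python
import numpy as np

def kernel_weights(X, h):
    """Normalized weights w[i, j] = K_h(X_i - X_j) / sum_i K_h(X_i - X_j).

    A Gaussian kernel is assumed; the factor 1/h^p cancels in the ratio.
    """
    diff = X[:, None, :] - X[None, :, :]           # diff[i, j] = X_i - X_j
    K = np.exp(-0.5 * np.sum(diff ** 2, axis=2) / h ** 2)
    return K / K.sum(axis=0, keepdims=True)        # each column j sums to 1 over i

def mave_objective(B, X, Y, h):
    """Value of the local linear criterion at a fixed B (scalar response Y).

    For each j, the intercept a_j and slope b_j are obtained by weighted
    least squares, so only B remains as a free argument.
    """
    n = X.shape[0]
    w = kernel_weights(X, h)
    Z = X @ B                                      # rows are B^T X_i
    total = 0.0
    for j in range(n):
        design = np.hstack([np.ones((n, 1)), Z - Z[j]])   # columns: 1, B^T (X_i - X_j)
        sw = np.sqrt(w[:, j])
        coef, *_ = np.linalg.lstsq(design * sw[:, None], Y * sw, rcond=None)
        resid = Y - design @ coef
        total += np.sum(w[:, j] * resid ** 2)
    return total
```

In a full MAVE implementation one would alternate between this inner least-squares step and an update of $B$; the sketch only covers the inner step.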

When it comes to the manifold case where $X\in R^{p}$ but $Y\in\mathrm{Sym}^{+}(m)$ , model (2) should be modified. In this case, $g:R^{d}\rightarrow\mathrm{Sym}^{+}(m)$ is a link function and $g(B_{0}^{T}X),\varepsilon\in\mathrm{Sym}^{+}(m)$ . In order to ensure that $Y\in\mathrm{Sym}^{+}(m)$ , we assume a group structure on $\mathrm{Sym}^{+}(m)$ with the group operator $\oplus$ , and replace $+$ by $\oplus$ . To make our model more flexible, we further assume that $(\mathrm{Sym}^{+}(m),\oplus)$ is a commutative group (abelian group).

Let $(\mathrm{Sym}^{+}(m),\oplus)$ be an abelian group endowed with a Riemannian metric $\langle\cdot,\cdot\rangle$ . Let $g:R^{d}\rightarrow\mathrm{Sym}^{+}(m)$ be the link function and $\varepsilon\in\mathrm{Sym}^{+}(m)$ be the random noise whose Fréchet mean corresponds to the group identity element. Then conditional mean sufficient dimension reduction with $X\in R^{p}$ but $Y\in\mathrm{Sym}^{+}(m)$ can be formulated as

$$Y=g(B_{0}^{T}X)\oplus\varepsilon.\tag{5}$$

We first define the conditional expectation $E(Y\mid B^{T}X)$ when $Y$ is an SPD matrix. According to Yuan et al. (2012), the intrinsic conditional expectation of $Y$ at $B^{T}X=B^{T}x$ is defined as an SPD matrix $D(B^{T}x)\in\mathrm{Sym}^{+}(m)$ such that

$$E\{\mathrm{Log}_{D(B^{T}x)}Y\mid B^{T}x\}=O_{m},$$

where $O_{m}$ is an $m\times m$ matrix with all elements 0 and the expectation is taken in a component-wise way. From now on we use $D(B^{T}x)$ instead of $E(Y\mid B^{T}x)$ .

Starting from (3), we replace the squared distance by the squared geodesic distance $d^{2}(\cdot,\cdot)$ on the manifold and rewrite (3) as

$$\min_{B:B^{T}B=I}\sum_{j=1}^{n}\sum_{i=1}^{n}w_{ij}d^{2}\{Y_{i},D(B^{T}X_{i})\}.\tag{6}$$

Next we want to expand $D(B^{T}x)$ at $B^{T}X_{j}$ in a similar way. However, $D(B^{T}x)$ lies in a curved space, where Taylor expansion is not directly available. Instead we apply the Riemannian logarithm map to transform $D(B^{T}x)$ to $\mathrm{Log}_{D(B^{T}X_{j})}D(B^{T}x)\in T_{D(B^{T}X_{j})}\mathrm{Sym}^{+}(m)$. Since $\mathrm{Log}_{D(B^{T}X_{j})}D(B^{T}x)$ for different $X_{j}$ lie in different tangent spaces, these tangent vectors are transported from $T_{D(B^{T}X_{j})}\mathrm{Sym}^{+}(m)$ to the common tangent space $T_{I_{m}}\mathrm{Sym}^{+}(m)$ using the parallel transport

$$\phi_{D(B^{T}X_{j})}:T_{D(B^{T}X_{j})}\mathrm{Sym}^{+}(m)\to T_{I_{m}}\mathrm{Sym}^{+}(m).$$

Thus $f(B^{T}x)=\phi_{D(B^{T}X_{j})}\mathrm{Log}_{D(B^{T}X_{j})}D(B^{T}x)$ is a function in a vector space and can be expanded at $B^{T}X_{j}$ using Taylor series expansion. Considering $f(B^{T}x)$ is an $m\times m$ symmetric matrix and $B^{T}X_{j}$ is a $d\times 1$ vector, we differentiate each component of $f(B^{T}x)$ with respect to $B^{T}X_{j}$ and this leads to

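$$\mathrm{Log}_{D(B^{T}X_{j})}D(B^{T}x)=\phi_{D(B^{T}X_{j})}^{-1}\{f(B^{T}x)\}\approx\phi_{D(B^{T}X_{j})}^{-1}[b_{j}I_{m}\otimes\{B^{T}(x-X_{j})\}],$$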

which gives

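$$D(B^{T}x)\approx\mathrm{Exp}_{D(B^{T}X_{j})}(\phi_{D(B^{T}X_{j})}^{-1}[b_{j}I_{m}\otimes\{B^{T}(x-X_{j})\}]),\tag{7}$$

where only the first-order approximation is considered, $\phi_{D(B^{T}X_{j})}^{-1}$ is the inverse map of $\phi_{D(B^{T}X_{j})}$ and $\otimes$ is the Kronecker product.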

In (7), both $D(B^{T}X_{j})$ and $b_{j}$ are parameters to estimate: $D(B^{T}X_{j})\in\mathrm{Sym}(m)$ serving as the 0-order approximation in Taylor expansion, $b_{j}$ being the derivative matrix in the first order term and possessing the structure

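$$b_{j}=\left(\begin{array}{cccc}c_{11}^{T}(X_{j})&c_{12}^{T}(X_{j})&\cdots&c_{1m}^{T}(X_{j})\\ c_{21}^{T}(X_{j})&c_{22}^{T}(X_{j})&\cdots&c_{2m}^{T}(X_{j})\\ \vdots&\vdots&&\vdots\\ c_{m1}^{T}(X_{j})&c_{m2}^{T}(X_{j})&\cdots&c_{mm}^{T}(X_{j})\end{array}\right)_{m\times md}\quad(j=1,...,n),\tag{8}$$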

where $c_{kl}(X_{j})=c_{lk}(X_{j})\in R^{d}$ $(k,l=1,...,m)$ . The $X_{j}$ in parentheses indicate that $b_{j}$ is related to $X_{j}$ . We use $a_{j}$ to denote $D(B^{T}X_{j})$ for simplicity here and hereafter.

Now we introduce three operators in matrix algebra. “$\mathrm{vec}(\cdot)$” is the common matrix vec operator that vectorizes an $m\times n$ matrix by column into an $mn\times 1$ vector. For an $m\times m$ symmetric matrix $A=(a_{ij})$, define $\mathrm{vecs}(A)=(a_{11},a_{21},a_{22},...,a_{m1},...,a_{mm})^{T}$. That is, “$\mathrm{vecs}(\cdot)$” vectorizes the lower triangular part of a symmetric matrix by row. For $b_{j}$ in (8), define $\mathrm{vecss}(b_{j})=(c_{11}^{T}(X_{j}),c_{21}^{T}(X_{j}),c_{22}^{T}(X_{j}),...,c_{m1}^{T}(X_{j}),...,c_{mm}^{T}(X_{j}))^{T}$. We will frequently use $\mathrm{vec}(B)$, $\mathrm{vecs}(a_{j})$ and $\mathrm{vecss}(b_{j})$ in subsequent sections.
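The two matrix operators $\mathrm{vec}$ and $\mathrm{vecs}$ can be sketched in a few lines of Python. The NumPy-based functions below, including the inverse helper `unvecs`, are hypothetical illustrations and not code from the paper.

```python
import numpy as np

def vec(A):
    """Stack the columns of a matrix into one vector (column-major order)."""
    return np.asarray(A).reshape(-1, order="F")

def vecs(A):
    """Stack the lower triangle of a symmetric matrix row by row:
    (a11, a21, a22, ..., am1, ..., amm)."""
    A = np.asarray(A)
    rows, cols = np.tril_indices(A.shape[0])   # row-major walk over the lower triangle
    return A[rows, cols]

def unvecs(v, m):
    """Hypothetical inverse helper: rebuild the symmetric m x m matrix from vecs(A)."""
    v = np.asarray(v)
    A = np.zeros((m, m), dtype=v.dtype)
    rows, cols = np.tril_indices(m)
    A[rows, cols] = v
    A[cols, rows] = v                          # mirror into the upper triangle
    return A
```

For a symmetric $3\times 3$ matrix, `vecs` returns six entries, so it keeps each distinct element exactly once.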

Finally combining (7) with (6), we arrive at what we call the intrinsic MAVE method (iMAVE):

[TABLE]

where $a_{j}$ is $m\times m$ and $b_{j}$ is $m\times md$ .

The only difference between the classic OPG and the classic MAVE is the absence of $B$ in the former. So the intrinsic OPG method (iOPG) can be formulated immediately as

[TABLE]

where the size of $b_{j}$ here is $m\times mp$ .

Only the Riemannian metric needs specifying to solve (9) and (10). Actually we do not require (5) to be true since the procedure of deriving iMAVE and iOPG has nothing to do with the group structure on $\mathrm{Sym}^{+}(m)$ . We only assume (5) when performing the theoretical analysis. Thus (9) and (10) are flexible SDR methods but the choice of the metric affects the complexity of optimization.

4 Algorithms under the Log-Euclidean Metric

The log-Euclidean metric was proposed by Arsigny et al. (2007). The key observation is that $\mathrm{Sym}^{+}(m)$ is diffeomorphic to its tangent space at the identity matrix, $\mathrm{Sym}(m)$ . To be specific, $\mathrm{exp}:\mathrm{Sym}(m)\rightarrow\mathrm{Sym}^{+}(m)$ and its inverse $\mathrm{log}$ are both smooth, so each is a diffeomorphism.

Let $S_{1},S_{2}\in\mathrm{Sym}^{+}(m)$ . Define an operation $\oplus$ by

[TABLE]

Then $(\mathrm{Sym}^{+}(m),\oplus)$ is an abelian Lie group. The identity element is the identity matrix. Moreover, the Lie group exponential map $\mathfrak{exp}$ is just the matrix exponential $\mathrm{exp}$ . That is, the matrix logarithm $\mathrm{log}$ maps every SPD matrix in $\mathrm{Sym}^{+}(m)$ to the tangent space $T_{I_{m}}\mathrm{Sym}^{+}(m)$ . Based on this fact, we may get the expression of iMAVE under the log-Euclidean metric in a simpler way.
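As a numerical illustration of this group structure, the sketch below assumes the standard log-Euclidean operation of Arsigny et al. (2007), $S_{1}\oplus S_{2}=\mathrm{exp}(\mathrm{log}S_{1}+\mathrm{log}S_{2})$ , which is what the definition of $\oplus$ is taken to be here. The helper names `spd_log`, `sym_exp` and `oplus` are illustrative, and the matrix logarithm and exponential are computed through an eigendecomposition, which is valid for symmetric inputs.

```python
import numpy as np

def spd_log(S):
    """Matrix logarithm of an SPD matrix via its eigendecomposition."""
    w, U = np.linalg.eigh(S)
    return (U * np.log(w)) @ U.T

def sym_exp(A):
    """Matrix exponential of a symmetric matrix via its eigendecomposition."""
    w, U = np.linalg.eigh(A)
    return (U * np.exp(w)) @ U.T

def oplus(S1, S2):
    """Assumed log-Euclidean group operation: exp(log S1 + log S2)."""
    return sym_exp(spd_log(S1) + spd_log(S2))

S1 = np.array([[2.0, 0.5], [0.5, 1.0]])
S2 = np.array([[1.0, 0.2], [0.2, 3.0]])

# The operation is commutative and the identity matrix is the neutral element.
print(np.allclose(oplus(S1, S2), oplus(S2, S1)))
print(np.allclose(oplus(S1, np.eye(2)), S1))
```

Because the operation is addition in the log domain, the logarithm of a product under `oplus` is the sum of the logarithms, which is the property that lets the model be rewritten in a vector space.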

We start from (6). Under the log-Euclidean metric, the geodesic distance $d(S_{1},S_{2})=||\mathrm{log}S_{1}-\mathrm{log}S_{2}||_{F}$ . Here $||\cdot||_{F}$ is the Frobenius norm. So

[TABLE]

Since $\mathrm{log}\{D(B^{T}X_{i})\}$ and $\mathrm{log}Y_{i}$ are both in $T_{I_{m}}\mathrm{Sym}^{+}(m)$ , no parallel transport is needed. Directly expanding $\mathrm{log}\{D(B^{T}X_{i})\}$ at $B^{T}X_{j}$ , we get iMAVE under the log-Euclidean metric:

[TABLE]

and similarly iOPG under the log-Euclidean metric:

[TABLE]

Models (12) and (13) are optimized similarly to Xia et al. (2002) or Xia (2007). The main difference here is that a matrix-valued function is differentiated with respect to a vector. We sketch the algorithms below, after introducing some notation.

Write $q=m(m+1)/2$ and let

[TABLE]

where $c_{kl}(X_{j})$ $(1\leq l\leq k\leq m)$ are from (8).

Now we are ready for the algorithms.

Similarly, the algorithm for iOPG is shown in Algorithm 2. In practice, the result of OPG can be used as the initial value of $B$ in MAVE.
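To give a rough idea of what an outer product of gradients estimator looks like for a vectorized response, the following is a minimal sketch and not a reproduction of Algorithm 2. It assumes that each response has already been mapped to a vector (for example $\mathrm{vecs}(\mathrm{log}Y_{i})$ ), uses a Gaussian kernel with a single fixed bandwidth `h`, and adds a tiny ridge term for numerical stability; the function name `opg_directions` and all of these choices are assumptions made for illustration.

```python
import numpy as np

def opg_directions(X, V, d, h):
    """Outer-product-of-gradients sketch for a vector-valued response.

    X : (n, p) predictors.
    V : (n, q) vectorized responses, e.g. vecs(log Y_i) (assumed precomputed).
    d : number of directions to return.
    h : kernel bandwidth (a single fixed value in this sketch).
    Returns a (p, d) matrix with orthonormal columns.
    """
    n, p = X.shape
    M = np.zeros((p, p))
    for j in range(n):
        diff = X - X[j]                                  # X_i - X_j, shape (n, p)
        w = np.exp(-0.5 * np.sum(diff ** 2, axis=1) / h ** 2)  # Gaussian weights
        Z = np.hstack([np.ones((n, 1)), diff])           # local linear design
        ZW = Z * w[:, None]
        # Weighted least squares for all q response components at once;
        # the small ridge term is an assumption added for stability.
        coef = np.linalg.solve(Z.T @ ZW + 1e-8 * np.eye(p + 1), ZW.T @ V)
        G = coef[1:]                                     # (p, q) local gradients
        M += G @ G.T                                     # sum of outer products
    eigval, eigvec = np.linalg.eigh(M / n)
    return eigvec[:, ::-1][:, :d]                        # leading d eigenvectors

# Toy usage: the response depends on X only through one direction.
rng = np.random.default_rng(0)
X_demo = rng.uniform(0.0, 1.0, size=(200, 4))
beta_demo = np.array([1.0, 1.0, 0.0, 0.0]) / np.sqrt(2)
u_demo = X_demo @ beta_demo
V_demo = np.column_stack([np.sin(u_demo), u_demo ** 2, u_demo])
V_demo = V_demo + 0.01 * rng.normal(size=V_demo.shape)
B_demo = opg_directions(X_demo, V_demo, d=1, h=0.5)
```

The leading eigenvectors of the accumulated matrix span the estimated directions; such an estimate could then serve as a starting value for an iterative MAVE-type refinement.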

When $\mathrm{Sym}^{+}(m)$ is endowed with the log-Cholesky metric, the methods can be derived similarly. The key point is that, under the log-Cholesky metric, the geodesic distance between $S_{1},S_{2}\in\mathrm{Sym}^{+}(m)$ is $d(S_{1},S_{2})=||\mathrm{chol}(L_{1})-\mathrm{chol}(L_{2})||_{F}$ . Here $L_{1},L_{2}$ are the Cholesky factors of $S_{1},S_{2}$ (Lin, 2019) and $\mathrm{chol}(L)=\lfloor L\rfloor+\mathrm{log}\mathbb{D}(L)$ , where $\lfloor L\rfloor$ is the strictly lower triangular part of $L$ and $\mathbb{D}(L)$ is the diagonal part of $L$ . For any $S\in\mathrm{Sym}^{+}(m)$ and its Cholesky factor $L$ , $\mathrm{chol}(L)$ lies in a fixed vector space. Replacing $\mathrm{log}(\cdot)$ in the log-Euclidean case with $\mathrm{chol}(\cdot)$ and keeping everything else unchanged, we obtain iMAVE and iOPG under the log-Cholesky metric; details are omitted.
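The map $\mathrm{chol}(\cdot)$ and the resulting distance can be computed directly. The sketch below follows the formula stated above; the function names `chol_map` and `log_cholesky_distance` are illustrative, and the lower-triangular Cholesky factor is the one returned by `numpy.linalg.cholesky`.

```python
import numpy as np

def chol_map(S):
    """Strictly lower part of the Cholesky factor plus the log of its diagonal."""
    L = np.linalg.cholesky(S)            # lower triangular, S = L @ L.T
    return np.tril(L, k=-1) + np.diag(np.log(np.diag(L)))

def log_cholesky_distance(S1, S2):
    """Frobenius distance between the transformed Cholesky factors."""
    return np.linalg.norm(chol_map(S1) - chol_map(S2), ord="fro")

# For diagonal SPD matrices the distance reduces to differences of log-diagonals:
# diag(e^2, 1) has Cholesky factor diag(e, 1), whose log-diagonal is (1, 0).
S_a = np.eye(2)
S_b = np.diag([np.exp(2.0), 1.0])
print(log_cholesky_distance(S_a, S_b))
```

In this example the printed distance is 1 up to floating point error, since the identity matrix is mapped to the zero matrix.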

5 Extension to General Riemannian Manifolds

According to Lemma S1 in Lin (2022), if ( $\mathcal{M}$ , $\oplus$ ) is an abelian Lie group endowed with a bi-invariant metric $\langle\cdot,\cdot\rangle$ that turns $\mathcal{M}$ into a Hadamard manifold, then for any $y,z,u,v\in\mathcal{M}$ , $\mathrm{Log}_{y}(y\oplus z)=\phi_{e,y}(\mathfrak{log}z)$ and $\mathfrak{log}(u\oplus v)=\mathfrak{log}u+\mathfrak{log}v$ . Here $e$ is the identity element of the group. Endowing $\mathrm{Sym}^{+}(m)$ with the log-Euclidean metric or the log-Cholesky metric meets these conditions. Applying the above equations to $Y=\mu\oplus g(B_{0}^{T}X)\oplus\varepsilon$ , which is equivalent to our model (5) with $\mu$ denoting the Fréchet mean of $Y$ , we have $\mathrm{Log}_{\mu}Y=\phi_{e,\mu}\mathfrak{log}g(B_{0}^{T}X)+\phi_{e,\mu}\mathfrak{log}\varepsilon$ . This model can be rewritten as

[TABLE]

where $h(\cdot)=\phi_{e,\mu}\mathfrak{log}g(\cdot)$ and $\zeta=\phi_{e,\mu}\mathfrak{log}\varepsilon$ . The model (14) is completely a Euclidean one since $h:R^{d}\rightarrow\mathrm{Sym}(m)$ is a vector-valued function defined on $R^{d}$ , which is convenient for the theoretical analysis of iMAVE and iOPG. However, if the chosen metric cannot turn $\mathrm{Sym}^{+}(m)$ into an abelian Lie group with a bi-invariant metric, neither (5) nor (14) holds. In this case, we can directly assume model (14) for other metrics and, furthermore, for general Riemannian manifolds.

Let $X\in R^{p}$ and $Y\in\mathcal{M}$ where $(\mathcal{M},\langle\cdot,\cdot\rangle)$ is a general Riemannian manifold. We assume the relationship between $X$ and $Y$ can be described by (14). We still aim at estimating $B_{0}$ and the estimating procedure is just the MAVE and OPG with multivariate response developed by Zhang (2021), which can also be derived by slightly modifying our proposed algorithms in Section 4.

For general Riemannian manifolds whose sectional curvature is positive, the Fréchet mean may not exist and therefore additional conditions are needed for (14). We assume

(A1)

The minimizer of the Fréchet function $F(\cdot)=Ed^{2}(\cdot,Y)$ exists and is unique.

This is automatically satisfied when $\mathcal{M}$ is $\mathrm{Sym}^{+}(m)$ equipped with either the log-Euclidean metric or the log-Cholesky metric.

For a subset $A$ of $\mathcal{M}$ , $A^{\epsilon}$ denotes the set $\cup_{p\in A}B(p;\epsilon)$ where $B(p;\epsilon)$ is the ball with center $p$ and radius $\epsilon$ in $\mathcal{M}$ . We use $\mathrm{Im}^{-\epsilon}(\mathrm{Exp}_{\mu})$ to denote the set $\mathcal{M}\setminus\{\mathcal{M}\setminus\mathrm{Im}(\mathrm{Exp}_{\mu})\}^{\epsilon}$ . In order to define $\mathrm{Log}_{\hat{\mu}}Y_{i}$ at least with a dominant probability for a large sample, we assume

(A2)

There is some constant $\epsilon_{0}>0$ such that $\mathrm{pr}\{Y\in\mathrm{Im}^{-\epsilon_{0}}(\mathrm{Exp}_{\mu})\}=1$ .

The condition (A2) is only needed when $\mathcal{M}$ is not a Hadamard manifold. If (A1) and (A2) are satisfied, (14) is well defined.

6 Asymptotic Results

We first establish the consistency and asymptotic normality of the iMAVE and iOPG estimators for general manifolds under model (14); the results for $\mathrm{Sym}^{+}(m)$ endowed with either the log-Euclidean metric or the log-Cholesky metric are given as corollaries. We consider a manifold $\mathcal{M}$ that satisfies one of the following conditions:

(M1)

$\mathcal{M}$ is a finite-dimensional Hadamard manifold having sectional curvature bounded from below by $\mathfrak{c}_{0}<0$ .

(M2)

$\mathcal{M}$ is a complete compact Riemannian manifold.

An example satisfying (M1) is $\mathrm{Sym}^{+}(m)$ endowed with the log-Euclidean metric, the log-Cholesky metric or the affine-invariant metric while the unit sphere serves as an example satisfying (M2).

We have to treat $\phi\mathrm{Log}_{\hat{\mu}}Y_{i}-\mathrm{Log}_{\mu}Y_{i}$ in our proof, where $\phi$ is short for $\phi_{\hat{\mu},\mu}$ . The method in Lin and Yao (2019) is applied here to write $\phi\mathrm{Log}_{\hat{\mu}}Y_{i}-\mathrm{Log}_{\mu}Y_{i}$ as $\{-H_{i}(\mu)+\Delta_{i}(\hat{\mu})\}\mathrm{Log}_{\mu}\hat{\mu}$ , and the asymptotic normality of $\mathrm{Log}_{\mu}\hat{\mu}$ helps us control the discrepancy between $\mathrm{Log}_{\hat{\mu}}Y_{i}$ and $\mathrm{Log}_{\mu}Y_{i}$ . Here $\Delta_{i}(\hat{\mu})=o_{P}(1)$ and $H_{i}(y)=-(\triangledown Z_{i})(y)$ , acting on vector fields $U,V$ by $\langle H_{i}U,V\rangle(y)=\langle-\triangledown_{U}Z_{i},V\rangle(y)=\mathrm{Hess}_{y}\{d^{2}(y,Y_{i})/2\}(U,V)$ , where $Z_{i}$ is a vector field with $Z_{i}(y)=\mathrm{Log}_{y}Y_{i}$ and “ $\mathrm{Hess}$ ” denotes the Hessian matrix (Kendall and Le, 2011). For the above reasoning to be valid, the following conditions are needed.

(A3)

$\mathcal{M}$ satisfies at least one of the conditions (M1) and (M2).

(A4)

For all $y\in\mathcal{M}$ , $E\{d^{2}(y,Y)\}<\infty$ .

(A5)

For some constant $\mathfrak{c}_{1}>0$ , $F(y)-F(\mu)\geq\mathfrak{c}_{1}d^{2}(y,\mu)$ when $d(y,\mu)$ is sufficiently small.

(A6)

$\lambda_{\mathrm{min}}\{E(H_{t})\}>0$ where $\lambda_{\mathrm{min}}(\cdot)$ is the smallest eigenvalue of an operator or a matrix.

Conditions (A3)-(A6) are standard assumptions also made by Lin (2022), Kendall and Le (2011) and Lin and Yao (2019). (A4) is analogous to the moment condition in the Euclidean case. (A5) is satisfied for Hadamard manifolds with $\mathfrak{c}_{1}=1$ according to Lemma S.7 of Lin and Müller (2021). (A6) is made to ensure $H_{i}$ is invertible.

We need additional conditions that are standard in the literature on MAVE and OPG methods such as Xia et al. (2002) and Xia (2007).

We first introduce some notation. Suppose the dimension of $\mathcal{M}$ is $s$ , so that the terms in (14) are $s$ -dimensional vectors. Let $h_{k}(B_{0}^{T}X)$ ( $k=1,...,s$ ) denote the $k$ th component of $h(B_{0}^{T}X)$ ; $\zeta_{k}$ is defined similarly. Let $\mu_{B}(u)=E(X\mid B^{T}X=u)$ , $w_{B}(u)=E(XX^{T}\mid B^{T}X=u)$ , $v_{B}(u)=\mu_{B}(B^{T}u)-u$ , and $\bar{w}_{B}(u)=w_{B}(B^{T}u)-\mu_{B}(B^{T}u)\mu_{B}^{T}(B^{T}u)$ , which will be frequently encountered in the proofs. For any square matrix $A$ , $A^{-1}$ and $A^{+}$ denote the inverse (if it exists) and the Moore-Penrose inverse, respectively.

(B1)

For $k=1,...,s$ , $h_{k}(\cdot)$ has bounded, continuous third derivatives and $E(\zeta_{k}\mid X)=0$ .

(B2)

The density function $f(x)$ of $X$ has bounded second order derivatives on $R^{p}$ and is bounded away from 0 in a neighborhood around 0; $E|X|^{r}<\infty$ for some $r>8$ ; the functions $\mu_{B}(u)$ and $w_{B}(u)$ have bounded derivatives with respect to $u$ and $B$ for $B\in\{|B-B_{0}|<\delta\}$ for some $\delta>0$ .

(B3)

For every component $y_{k}$ ( $k=1,...,s$ ) in $\mathrm{log}Y$ , the density function $f_{y_{k}}$ has bounded derivative and is bounded away from 0 on a compact support; the conditional density functions $f_{y_{k}\mid X}(y\mid x)$ and $f_{y_{k}\mid B^{T}X}(y\mid u)$ have bounded fourth order derivatives w.r.t. $x,u$ and $B$ for $B$ in a neighborhood of $B_{0}$ .

(B4)

The matrix $M_{0}=E\left\{h^{(1)}(B_{0}^{T}X)^{T}h^{(1)}(B_{0}^{T}X)\right\}$ has full rank $d$ , where $h^{(1)}(\cdot)\in R^{s\times d}$ is the derivative matrix of $h(\cdot)$ .

(B5)

$K(\cdot)$ is a symmetric univariate density function with bounded second order derivatives. All the moments of $K(\cdot)$ exist.

(B6)

Bandwidths $h_{0}=c_{1}n^{-r_{h}}$ where $0<r_{h}\leq 1/(p_{0}+6)$ , $p_{0}=\max(p,3)$ . For $t\geq 1$ , $h_{t}=\max(r_{n}h_{t-1},h)$ where $r_{n}=n^{-r_{h}/2},h=c_{2}n^{-r_{h}^{\prime}}$ with $0<r_{h}^{\prime}\leq 1/(d+3)$ , and $c_{1},c_{2}$ are constants.
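The bandwidth recursion in (B6) can be traced numerically. The sketch below takes the rates at their upper bounds, $r_{h}=1/(p_{0}+6)$ and $r_{h}^{\prime}=1/(d+3)$ , and sets $c_{1}=c_{2}=1$ ; these specific values, the function name `bandwidth_schedule` and the iteration cap are assumptions for illustration, since (B6) only constrains the rates.

```python
def bandwidth_schedule(n, p, d, c1=1.0, c2=1.0, max_iter=50):
    """Bandwidths h_0, h_1, ... following h_t = max(r_n * h_{t-1}, h)."""
    p0 = max(p, 3)
    r_h = 1.0 / (p0 + 6)          # assumed: upper bound of the allowed range
    r_h_prime = 1.0 / (d + 3)     # assumed: upper bound of the allowed range
    h_final = c2 * n ** (-r_h_prime)
    r_n = n ** (-r_h / 2)
    hs = [c1 * n ** (-r_h)]       # initial bandwidth h_0
    for _ in range(max_iter):
        nxt = max(r_n * hs[-1], h_final)
        hs.append(nxt)
        if nxt == h_final:        # the floor h has been reached
            break
    return hs

schedule = bandwidth_schedule(n=200, p=10, d=2)
print([round(h, 3) for h in schedule])
```

The sequence starts from a relatively wide bandwidth suited to the $p$ -dimensional predictor and shrinks geometrically until it reaches the floor $h$ , which depends only on the reduced dimension $d$ .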

Define

[TABLE]

Theorem 6.1.

Under (A1)-(A6) and (B1)-(B6), the estimated $\hat{B}_{\mathrm{iMAVE}}$ from (14) satisfies

[TABLE]

in probability as $n\rightarrow\infty$ , where $\delta_{dh}=(nh^{d}/\mathrm{log}n)^{-1/2}$ . If $h^{3}+h\delta_{dh}+\delta_{dh}^{2}/h=o(n^{-1/2})$ , then

[TABLE]

The results in Theorem 6.1 are consistent with those in Xia et al. (2002), Xia (2007) and Zhang (2021). The iMAVE shares the merit of the classic MAVE that it can achieve a faster consistency rate even without undersmoothing the nonparametric link function estimator. Similar results for iOPG are shown below.

Theorem 6.2.

Under (A1)-(A6) and (B1)-(B6), the estimated $\hat{B}_{\mathrm{iOPG}}$ from (14) satisfies

[TABLE]

in probability as $n\rightarrow\infty$ , where $\delta_{dh}=(nh^{d}/\mathrm{log}n)^{-1/2}$ . If $h^{3}+h\delta_{dh}=o(n^{-1/2})$ , then

[TABLE]

When $\mathrm{Sym}^{+}(m)$ is endowed with the log-Euclidean metric or the log-Cholesky metric, the manifold-related conditions are automatically satisfied and thus only (B1)-(B6) are needed. We present theoretical results of iMAVE and iOPG with $Y$ lying in $\mathrm{Sym}^{+}(m)$ endowed with the log-Euclidean metric in (12) and (13) by the following corollaries. The log-Cholesky case is almost the same and is omitted.

In this case, $(\mathcal{M},\oplus)$ with $\oplus$ defined in (11) is an abelian Lie group and the bi-invariant log-Euclidean metric turns $\mathrm{Sym}^{+}(m)$ into a Hadamard manifold. Our model (5) is valid and can be transformed into

[TABLE]

by the same reasoning in Section 5 (with $\mu$ replaced by $e$ ). We denote $h(B_{0}^{T}X)=\mathrm{log}(g(B_{0}^{T}X))$ and $\zeta=\mathrm{log}\varepsilon$ . Terms in (15) are $m\times m$ symmetric matrices and if we vectorize the lower triangle part of these matrices into $m(m+1)/2$ -dimensional vectors, then (15) coincides with (14). Thus the main difference of the $\mathrm{Sym}^{+}(m)$ case is that $y_{k},h_{k},\zeta_{k}$ ( $k=1,...,s$ ) in (B1)-(B6) should be replaced by $y_{kl},h_{kl},\zeta_{kl}$ ( $1\leq l\leq k\leq m$ ) and $M_{0}$ in (B4) should be $M_{\mathrm{SPD}}=E\{\sum_{k=1}^{m}\sum_{l=1}^{k}h_{kl}^{(1)}(B_{0}^{T}X)h_{kl}^{(1)}(B_{0}^{T}X)^{T}\}$ .

Define

[TABLE]

Corollary 6.3.

Under (B1)-(B6), the estimated $\hat{B}_{\mathrm{iMAVE}}$ from (12) satisfies

[TABLE]

in probability as $n\rightarrow\infty$ , where $\delta_{dh}=(nh^{d}/\mathrm{log}n)^{-1/2}$ . If $h^{3}+h\delta_{dh}+\delta_{dh}^{2}/h=o(n^{-1/2})$ , then

[TABLE]

Corollary 6.4.

Under (B1)-(B6), the estimated $\hat{B}_{\mathrm{iOPG}}$ from (13) satisfies

[TABLE]

in probability as $n\rightarrow\infty$ , where $\delta_{dh}=(nh^{d}/\mathrm{log}n)^{-1/2}$ . If $h^{3}+h\delta_{dh}=o(n^{-1/2})$ , then

[TABLE]

In the proofs of Corollary 6.3 and Corollary 6.4, we do not encounter $\phi\mathrm{Log}_{\hat{\mu}}Y_{i}-\mathrm{Log}_{\mu}Y_{i}$ . In fact, even in the general manifold case, $\phi\mathrm{Log}_{\hat{\mu}}Y_{i}-\mathrm{Log}_{\mu}Y_{i}$ does not affect the convergence rate or the asymptotic variance. As shown above, the convergence rates in the general manifold case and the $\mathrm{Sym}^{+}(m)$ case are the same and the asymptotic variances are consistent in form.

7 Determine the Structural Dimension

In this part, we discuss how to use a cross validation procedure to determine the structural dimension. We focus on the $\mathrm{Sym}^{+}(m)$ case; the method can be extended to general manifolds similarly. Suppose $l$ is the working dimension and $d$ is the true structural dimension. In the Euclidean case, Xia et al. (2002) defined

[TABLE]

where $y_{i}$ $(i=1,...,n)$ are scalars, $K_{h_{l}}^{(i,j)}=K_{h_{l}}(\hat{B}^{T}(X_{i}-X_{j}))$ and the suffix $l$ is used to indicate that the bandwidth depends on the working dimension $l$ . Actually $\hat{a}_{l0,j}$ is the N-W estimate of $y_{j}$ . And the CV value is

[TABLE]

In our case, $Y_{i}$ $(i=1,...,n)$ are now SPD matrices. If we equip $\mathrm{Sym}^{+}(m)$ with the log-Euclidean metric, then $\mathrm{log}Y_{i}$ $(i=1,...,n)$ are in $T_{I_{m}}\mathrm{Sym}^{+}(m)$ . Similarly define

[TABLE]

where $||\cdot||_{F}$ is the matrix Frobenius norm. We then estimate $d$ as

[TABLE]

Theorem 7.1.

Suppose assumptions (B1)-(B3) and (B5) hold. We have

[TABLE]

Theorem 7.1 shows that as $n\rightarrow\infty$ , the probability of choosing the right dimension tends to 1. If we equip $\mathrm{Sym}^{+}(m)$ with the log-Cholesky metric, the above arguments still hold after replacing $\mathrm{log}Y_{i}$ with $\mathrm{chol}(Y_{i})$ .
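The cross validation criterion can be sketched as a leave-one-out Nadaraya-Watson fit in the reduced predictor space. The code below is an illustration under several assumptions: the reduced predictors $\hat{B}^{T}X_{i}$ are supplied as the rows of `U`, each $\mathrm{log}Y_{i}$ is flattened to a vector of length $m^{2}$ in the rows of `V` (so that the squared Euclidean norm equals the squared Frobenius norm), and a Gaussian kernel with a user-supplied bandwidth is used. The function name `loo_nw_cv` is hypothetical.

```python
import numpy as np

def loo_nw_cv(U, V, h):
    """Leave-one-out Nadaraya-Watson cross validation value.

    U : (n, l) reduced predictors, rows are B_hat^T X_i.
    V : (n, q) flattened log Y_i (assumed precomputed).
    h : bandwidth for the working dimension l.
    """
    sq = np.sum((U[:, None, :] - U[None, :, :]) ** 2, axis=2)
    K = np.exp(-0.5 * sq / h ** 2)          # Gaussian kernel weights
    np.fill_diagonal(K, 0.0)                # exclude observation j from its own fit
    fitted = (K @ V) / K.sum(axis=1, keepdims=True)
    return np.mean(np.sum((V - fitted) ** 2, axis=1))

# Toy check: projecting onto the informative coordinate gives a smaller CV value.
rng = np.random.default_rng(0)
x_demo = rng.uniform(0.0, 1.0, size=(200, 2))
v_demo = np.sin(2 * np.pi * x_demo[:, :1]) + 0.05 * rng.normal(size=(200, 1))
cv_right = loo_nw_cv(x_demo[:, :1], v_demo, h=0.1)
cv_wrong = loo_nw_cv(x_demo[:, 1:], v_demo, h=0.1)
```

In a full procedure one would compute this value for each working dimension $l$ , with $\hat{B}$ re-estimated for that $l$ , and take the minimizer as $\hat{d}$ .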

8 Simulation Studies

8.1 Study I for SPD Matrices

In the following studies the structural dimension $d$ is known unless otherwise specified. We test the performance of our proposed iMAVE with log-Euclidean metric (eu-iMAVE), iOPG with log-Euclidean metric (eu-iOPG), iMAVE with log-Cholesky metric (ch-iMAVE), iOPG with log-Cholesky metric (ch-iOPG), weighted inverse regression ensemble method (WIRE, Ying and Yu (2022)), Fréchet MAVE and Fréchet OPG (fMAVE and fOPG, Zhang et al. (2021)).

According to Schwartzman (2006), $Z\in\mathrm{Sym}(m)$ is said to obey the standard symmetric matrix variate Normal distribution $N_{mm}(0,I_{m})$ if $Z$ has independent $N(0,1)$ diagonal elements and independent $N(0,1/2)$ off-diagonal elements. $Y\in\mathrm{Sym}(m)$ is said to obey the symmetric matrix variate Normal distribution $N_{mm}(M,\Sigma)$ if $Y=M+GZG^{T}$ where $M\in\mathrm{Sym}(m)$ and $\Sigma=G^{T}G$ . As a special case, we say $Y\in\mathrm{Sym}(m)\sim N_{mm}(M,\sigma^{2})$ if $Y=M+\sigma Z$ .

Let $\beta_{1}^{T}=(1,1,0,...,0)/\sqrt{2}$ , $\beta_{2}^{T}=(0,...,0,1,1)/\sqrt{2}$ . The predictors $X_{1},X_{2},...,X_{p}$ are independent random variables, each from the uniform distribution on $[0,1]$ . We generate $n$ i.i.d. samples $(X_{1i},X_{2i},...,X_{pi})$ $(i=1,...,n)$ . Let $M(X)$ be matrices specified by the following models:

I-1: $M(X)=\left(\begin{array}[]{cc}1&\rho(X)\\ \rho(X)&1\end{array}\right)$ , where $\rho(X)=\{\mathrm{exp}(\beta_{1}^{T}X)-1\}/\{\mathrm{exp}(\beta_{1}^{T}X)+1\}$ ;

I-2: $M(X)=\left(\begin{array}[]{ccccc}1&\rho_{1}(X)&\rho_{1}(X)&\rho_{2}(X)&\rho_{2}(X)\\ \rho_{1}(X)&1&\rho_{2}(X)&\rho_{2}(X)&\rho_{2}(X)\\ \rho_{1}(X)&\rho_{2}(X)&1&\rho_{2}(X)&\rho_{1}(X)\\ \rho_{2}(X)&\rho_{2}(X)&\rho_{2}(X)&1&\rho_{1}(X)\\ \rho_{2}(X)&\rho_{2}(X)&\rho_{1}(X)&\rho_{1}(X)&1\end{array}\right)$ ,

where $\rho_{1}(X)=0.2\{\mathrm{exp}(\beta_{1}^{T}X)-1\}/\{\mathrm{exp}(\beta_{1}^{T}X)+1\}$ and $\rho_{2}(X)=0.2\sin(\beta_{2}^{T}X)$ .

We generate $\mathrm{log}(Y)\sim N_{mm}(\mathrm{log}\{M(X)\},\sigma^{2})$ . That is, $Y=\mathrm{exp}[\mathrm{log}\{M(X)\}+\sigma Z]$ . In model I-1, $m=2$ , $B_{0}=\beta_{1}$ and $d=1$ . In model I-2, $m=5$ , $B_{0}=(\beta_{1},\beta_{2})$ and $d=2$ . In the above settings $M(X)$ is not necessarily the Fréchet mean of $Y$ given $X$ , but it still measures the central tendency of the conditional distribution $Y\mid X$ . Models I-1 and I-2 are also considered in Zhang et al. (2021). The kernel function in iOPG and iMAVE is $K(v^{2})=15/16(1-v^{2})^{2}I(v^{2}<1)$ . In WIRE, we adopt the distance function induced by the log-Euclidean metric to compute the distance matrix. We follow the same steps described in Zhang et al. (2021) to prepare fOPG and fMAVE for the following simulations. For each model, we take $\sigma=0.2$ and $(p,n)=(10,100),(10,200),(20,100),(20,200)$ . The experiments in each scenario were repeated 100 times, and the means and standard deviations of the estimation errors are listed in Table 1. The results for $\sigma=0.1$ are presented in the supplementary material.
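The data-generating mechanism of model I-1 can be sketched as follows. The code reads $N(0,1/2)$ as a normal distribution with variance $1/2$ for the off-diagonal entries of $Z$ , and computes the matrix logarithm and exponential by eigendecomposition; the function names and the seed are illustrative assumptions.

```python
import numpy as np

def sym_log(S):
    """Matrix logarithm of an SPD matrix."""
    w, U = np.linalg.eigh(S)
    return (U * np.log(w)) @ U.T

def sym_exp(A):
    """Matrix exponential of a symmetric matrix."""
    w, U = np.linalg.eigh(A)
    return (U * np.exp(w)) @ U.T

def standard_sym_normal(m, rng):
    """Symmetric Z with N(0, 1) diagonal and N(0, 1/2) off-diagonal entries."""
    Z = np.zeros((m, m))
    iu = np.triu_indices(m, k=1)
    Z[iu] = rng.normal(0.0, np.sqrt(0.5), size=len(iu[0]))  # variance 1/2 (assumed reading)
    Z = Z + Z.T
    Z[np.diag_indices(m)] = rng.normal(0.0, 1.0, size=m)
    return Z

def generate_model_I1(n, p, sigma, rng):
    """Draw (X, Y) from model I-1 with Y = exp[log M(X) + sigma * Z]."""
    beta1 = np.zeros(p)
    beta1[:2] = 1.0 / np.sqrt(2)
    X = rng.uniform(0.0, 1.0, size=(n, p))
    u = X @ beta1
    rho = (np.exp(u) - 1.0) / (np.exp(u) + 1.0)
    Y = np.empty((n, 2, 2))
    for i in range(n):
        M = np.array([[1.0, rho[i]], [rho[i], 1.0]])
        Y[i] = sym_exp(sym_log(M) + sigma * standard_sym_normal(2, rng))
    return X, Y, beta1

rng = np.random.default_rng(1)
X_sim, Y_sim, beta1_sim = generate_model_I1(n=50, p=10, sigma=0.2, rng=rng)
```

Every generated $Y_{i}$ is SPD by construction, since it is the matrix exponential of a symmetric matrix.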

The best performer is always iOPG or iMAVE with either the log-Euclidean or the log-Cholesky metric. This result is reasonable since WIRE, fOPG and fMAVE use the information in $Y$ only through the distance matrix $(d(Y_{i},Y_{j}))_{ij}$ or the kernel matrix $(k(Y_{i},Y_{j}))_{ij}$ , neither of which fully exploits the inner structure of $Y$ . In contrast, our methods are intrinsic and respect the geometric structure of $Y$ , thus producing better results.

8.2 Study II for SPD Matrices

In this simulation study, we generate $Y$ similar to Lin et al. (2022). Let the predictors $X_{1},X_{2},...,X_{p}$ be independently and identically sampled from the uniform distribution on $[0,1]$ . Fix $\mu$ to be the identity matrix. Set $Y=\mu\oplus w(X_{1},...,X_{p})\oplus\zeta$ , where $w(X_{1},...,X_{p})=\mathfrak{exp}\phi_{\mu,e}f(X_{1},...,X_{p})$ with the following two settings for $f$ :

II-1: $f(X_{1},...,X_{p})=f_{12}(X_{1},X_{2})$ , where $f_{12}(X_{1},X_{2})$ is an $m\times m$ matrix with $(j,l)$ -entry being $\mathrm{exp}\{-1/|j-l|\}\sin[2\pi\{X_{1}+X_{2}-1/(j+l)\}]$ ;

II-2: $f(X_{1},...,X_{p})=\sum_{k=1}^{2}f_{k}(X_{k})$ where $f_{k}(X_{k})$ is an $m\times m$ matrix with $(j,l)$ -entry being $\mathrm{exp}\{-1/|j-l|\}\sin[2\pi\{X_{k}-1/(j+l)\}]$ .

The setting II-2 is the manifold additive model proposed by Lin et al. (2022) and II-1 is a modification. We set $m=3$ . The random noise $\zeta$ is generated according to $\mathfrak{log}\zeta=\sum_{j=1}^{6}Z_{j}v_{j}$ , where $Z_{1},...,Z_{6}$ are independently sampled from $N(0,0.1^{2})$ and $v_{1},...,v_{6}$ are a basis of the tangent space $T_{e}\mathrm{Sym}^{+}(m)$ . Note that $\mu$ is identical to $e$ , so $\phi_{\mu,e}$ is just the identity map. We adopt the log-Euclidean metric so that $\mathfrak{exp}=\mathrm{exp}$ and $\mathfrak{log}=\mathrm{log}$ . In model II-1, $d=1$ and $B_{0}=(1,1,0,...,0)^{T}$ ; in model II-2, $d=2$ and $B_{0}=(\beta_{1},\beta_{2})^{T}$ , where $\beta_{1}=(1,0,...,0)^{T}$ and $\beta_{2}=(0,1,0,...,0)^{T}$ . We take $(p,n)=(5,100),(5,200),(10,100),(10,200)$ . Following Wang et al. (2013), in this study we adopt the multi-dimensional Gaussian kernel $k(u)=\mathrm{exp}(-||u||^{2}/2)$ with the bandwidth $h$ set to be $h=\{4/(p+2)\}^{1/(p+4)}n^{-1/(d+4)}$ and $p$ being the dimension of $u$ . The means and standard deviations of the estimation errors are summarized in Table 2.

Models II-1 and II-2 are difficult: in each scenario, all methods except ours fail to give reasonable estimates even when the dimension $p=5$ is small. Our methods give accurate estimates in most cases. When the dimension is relatively large ( $p=10$ ) and the sample size is small ( $n=100$ ), our methods do not always produce satisfactory estimates and may fail. In Fig. 1, we draw box plots of the estimation errors of all methods, based on 100 replications, for $(p,n)=(10,100),(10,200)$ in II-1 and II-2. First, WIRE, fOPG and fMAVE fail in all scenarios. When the sample size is not large enough ( $n=100$ ), iOPG or iMAVE may still fail in some replications even though the median estimation error is small and stable; see the top left box plot in Fig. 1. Case II-2 is easier than II-1 for our methods, with far fewer wrong estimates (the bottom two plots). When the sample size increases to 200, our methods improve and give accurate estimates in every replication in II-2, while no obvious improvement is observed for the other methods. Our methods can be expected to produce more accurate estimates when the sample size is large enough.

8.3 Study III for Sphere Data

Since the proposed iMAVE and iOPG can be extended to general manifolds, we in this part test the performance of models derived from model (14). We generate $Y\in S^{2}$ according to the following model:

III: Let $p_{0}=(0,0,1)^{T}$ and the tangent vector at $p_{0}$ be

[TABLE]

We generate i.i.d. observations $X_{1},...,X_{n}$ from the uniform distribution on $[-1,1]$ and i.i.d. $\epsilon_{i1},\epsilon_{i2}\sim N(0,0.1^{2})$ . Then $Y_{i}$ is generated by

[TABLE]

where $||\cdot||$ is the Euclidean norm.

The simulation results under several scenarios are listed in Table 3. The proposed iMAVE and iOPG always perform better than others, with iMAVE producing the smallest estimation errors.

8.4 Study IV: Determine the Structural Dimension

In this part we assume that the dimension of the mean dimension reduction space is unknown and needs to be estimated. We generate data from the five models in Studies I, II and III and use the CV procedure to estimate $d$ . In the CV procedure, we use iOPG to estimate $B$ . We set $p=10$ and $n=200$ , repeat the experiment 100 times for each model, and record the counts of correct and incorrect estimates over the 100 replications for $\sigma=0.1$ and $0.2$ ; the results are shown in Fig. 2.

Except for model II-1 with $\sigma=0.2$ , our CV procedure always gives satisfactory estimates, reaching an accuracy greater than $80\%$ and even approaching $100\%$ in some cases. If we increase the sample size to 300, the result for model II-1 with $\sigma=0.2$ becomes: $(\hat{d}<d):0$ , $(\hat{d}=d):92$ , $(\hat{d}>d):8$ . This improvement is consistent with Theorem 7.1.

9 Application to New York Taxi Network Data

In this section, we apply our proposed methods to the New York Taxi network data. We first estimate the structural dimension as $\hat{d}$ and apply iMAVE equipped with the log-Euclidean metric to obtain the estimate $\hat{B}=(\hat{\beta}_{1},...,\hat{\beta}_{\hat{d}})$ on the training dataset. Then we feed our results to the manifold additive regression model (Lin et al., 2022) and compute the prediction root mean squared error (RMSE) on the testing dataset. A small RMSE supports the validity of our methods.

The New York City Taxi and Limousine Commission provides records on pick-up and drop-off dates and times, pick-up and drop-off locations, trip distances, itemized fares, payment types and other information for yellow taxis (Tucker et al., 2021). The data are available from

https://www1.nyc.gov/site/tlc/about/tlc-trip-record-data.page

Similar to Tucker et al. (2021), we transform the raw data into network data (adjacency matrices), where zones are nodes and edges are weighted by the number of taxi rides that picked up in one zone and dropped off in another within a single hour. After a suitable mapping, these adjacency matrices lie in the space of SPD matrices. We take the following steps to collect SPD matrices together with several predictors:

We only choose the data of January and February, 2019 (59 days) due to resource restrictions.
We filter on observations with both pick-up and drop-off occurring in Manhattan (islands excluded).
We then group zones in Manhattan into 3 zones and label them similar to Dubey and Müller (2020). That is, each network has 3 nodes.
For each hour, we collected the number of pairwise connections between nodes based on pick-ups and drop-offs. These correspond to weights between nodes. We then further normalize the weights by the maximum edge weight in each hour so that they lie in $[0,1]$ .

By doing so, we collected 1416 (59 $\times$ 24) weighted $3\times 3$ adjacency matrices describing the taxi movements between zones in Manhattan. To ensure that they are SPD matrices, we apply $\mathrm{exp}(\cdot)$ to these symmetric matrices.
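The normalization and the exponential mapping of one hourly network can be sketched as below. The input is assumed to be an already symmetric $3\times 3$ matrix of pairwise connection counts, and the maximum is taken over all entries including the diagonal; both points, the function name `counts_to_spd` and the example counts are assumptions made for illustration and are not taken from the taxi data.

```python
import numpy as np

def counts_to_spd(counts):
    """Normalize a symmetric count matrix to [0, 1] and map it to an SPD matrix."""
    W = np.asarray(counts, dtype=float)
    if not np.allclose(W, W.T):
        raise ValueError("expected a symmetric matrix of pairwise counts")
    max_w = W.max()                    # maximum edge weight in this hour (assumed over all entries)
    if max_w > 0:
        W = W / max_w
    w, U = np.linalg.eigh(W)           # matrix exponential via eigendecomposition
    return (U * np.exp(w)) @ U.T

# Hypothetical counts for one hour between three zones.
counts_demo = np.array([[120, 40, 10],
                        [40, 300, 25],
                        [10, 25, 80]])
S_demo = counts_to_spd(counts_demo)
print(np.linalg.eigvalsh(S_demo) > 0)
```

The matrix exponential of any symmetric matrix has strictly positive eigenvalues, so the output is SPD regardless of the counts, and under the log-Euclidean metric its matrix logarithm recovers the normalized adjacency matrix.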

From the dataset we collect the following 9 potential predictors, with values averaged over each hour:

Ave.Distance: mean distance traveled, standardized

Ave.Fare: mean total fare, standardized

Ave.Passengers: mean number of passengers, standardized

Ave.tip: mean tip, standardized

Cash: sum of cash indicators for type of payment, standardized

Credit: sum of credit indicators for type of payment, standardized

Dispute: sum of dispute indicators for type of payment, standardized

Free: sum of free indicators for type of payment, standardized

LateHour: indicator for the hour being between 11pm and 5am

We also collect New York City weather data for January and February 2019 from

https://www.wunderground.com/history/daily/us/ny/new-york-city/KLGA/date

The following 5 weather variables are included as potential predictors:

Ave.temp: daily mean temperature, standardized

Ave.humid: daily mean humidity, standardized

Ave.wind: daily mean wind speed, standardized

Ave.press: daily mean barometric pressure, standardized

Precip: daily total precipitation, standardized

This then yields a total of 14 potential predictors. We can now write the data at hand as $\{Y,X_{n\times p}\}$ , where $Y$ is an array of dimension $3\times 3\times n$ , $n=1416$ , $p=14$ and $Y[,,i]$ is a $3\times 3$ SPD matrix $(i=1,...,n)$ . Then we randomly divide the dataset into a train set (991 samples) and a test dataset (425 samples). On the train set, we respectively set $d=1,...,7$ , apply iMAVE with the log-Euclidean metric and calculate CV( $d$ ). The results are: 0.0430, 0.0283, 0.0257, 0.0626, 0.0834, 0.0687, 0.0612. The CV procedure suggests that $\hat{d}=3$ is a reasonable choice. So we apply iMAVE with $d=3$ again to the training dataset and get $\hat{B}$ which is listed in Table 4.

The estimated results show that fare amount and type of payment are important covariates, which is consistent with the results of Tucker et al. (2021). Ave.Fare and Ave.Distance are closely related and both of them are significant in the first three directions. Cash and Credit are significant in the first direction, showing that most passengers tend to pay the fare by cash or credit. Another clear observation is that all 5 weather variables seem negligible since their coefficients are almost 0 in all of the first three directions. This is reasonable because, as a global metropolis, New York City has an advanced and robust public transportation system, and mild weather changes may have little impact on the functioning of taxi services. The weather during January and February 2019 was also rather stable, which may account for the insignificance of the weather variables.

To show our dimension reduction method is valid and has further statistical applications, we conduct the additive regression using the manifold additive model (MAM) introduced by Lin et al. (2022). The MAM is formulated as

[TABLE]

where $Y$ is an SPD matrix, $\mu$ is the Fréchet mean of $Y$ , each $w_{k}$ is a function mapping $X_{k}$ into the SPD space, $\zeta$ is random noise whose Fréchet mean is the group identity element, $X_{i}$ $(i=1,...,q)$ are scalar variables and $\oplus$ is the group operation.

We apply MAM to the training dataset after dimension reduction, $\{Y^{\mathrm{train}},X^{\mathrm{train}}\hat{B}\}$ , to get the estimated $\hat{\mu}$ and functions $\hat{w}_{1}$ , $\hat{w}_{2}$ and $\hat{w}_{3}$ . Then we apply the trained MAM to the test dataset $\{Y^{\mathrm{test}},X^{\mathrm{test}}\hat{B}\}$ to get the estimates $\hat{Y}^{\mathrm{test}}$ . The prediction RMSE on the test dataset is 0.3220, which is a relatively small number for the prediction error of a $3\times 3$ SPD matrix. That is to say, MAM produces good predictions after the data are processed with our intrinsic dimension reduction method, which indicates that our method is valid and has potential for wide application.

Bibliography

Arsigny, V., Fillard, P., Pennec, X., and Ayache, N. (2007). Geometric means in a novel vector space structure on symmetric positive-definite matrices. SIAM Journal on Matrix Analysis and Applications, 29:328–347.
Batchelor, P. G., Moakher, M., Atkinson, D., Calamante, F., and Connelly, A. (2004). A rigorous framework for diffusion tensor calculus. Magnetic Resonance in Medicine, 53:221–225.
Bhattacharjee, S. and Müller, H.-G. (2021). Single index Fréchet regression. arXiv:2108.05437 [stat.ME].
Chen, Y., Lin, Z., and Müller, H.-G. (2020). Wasserstein regression. arXiv:2006.09660 [stat.ME].
Cook, R. D. and Li, B. (2002). Dimension reduction for conditional mean in regression. The Annals of Statistics, 30:455–474.
Cook, R. D. and Weisberg, S. (1991). Sliced inverse regression for dimension reduction: Comment. Journal of the American Statistical Association, 86:328–332.
Cornea, E., Zhu, H., Kim, P., and Ibrahim, J. G. (2016). Regression models on Riemannian symmetric spaces. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 79:463–482.
Dubey, P. and Müller, H.-G. (2020). Functional models for time-varying random objects. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 82:275–327.
