Significance testing for canonical correlation analysis in high dimensions
Ian W. McKeague, Xin Zhang

TL;DR
This paper develops a post-selection inference method, based on canonical correlation analysis (CCA), for testing whether linear relationships exist between two large sets of random variables. The method adjusts for the selection of the variable subsets whose linear combinations have maximal sample correlation.
Contribution
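The paper introduces a stabilized one-step estimator of the Euclidean norm of the canonical correlations, maximized over variable subsets of pre-specified cardinality, together with a greedy search algorithm for computing it. The result is a computationally tractable omnibus test in high dimensions that advances post-selection inference for CCA.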
Findings
The estimator is consistent for its target parameter and asymptotically normal, provided the dimensions of the variables do not grow too quickly with sample size.
The proposed test accurately detects linear relationships in high-dimensional settings.
The confidence intervals account for variable selection, improving the reliability of inference.
Abstract
We consider the problem of testing for the presence of linear relationships between large sets of random variables based on a post-selection inference approach to canonical correlation analysis. The challenge is to adjust for the selection of subsets of variables having linear combinations with maximal sample correlation. To this end, we construct a stabilized one-step estimator of the Euclidean norm of the canonical correlations maximized over subsets of variables of pre-specified cardinality. This estimator is shown to be consistent for its target parameter and asymptotically normal provided the dimensions of the variables do not grow too quickly with sample size. We also develop a greedy search algorithm to accurately compute the estimator, leading to a computationally tractable omnibus test for the global null hypothesis that there are no linear relationships between any subsets of…
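To make the quantity in the abstract concrete, the following is a minimal Python sketch of a plug-in version of it: the Euclidean norm of the sample canonical correlations, approximately maximized over variable subsets of pre-specified sizes by a simple forward greedy search. It is an illustration under stated assumptions, not the authors' algorithm or their stabilized one-step estimator. The function names `cancorr_norm` and `greedy_subsets`, the forward-selection rule, and the use of NumPy are all assumptions introduced here.

```python
import numpy as np


def cancorr_norm(X, Y):
    """Euclidean norm of the sample canonical correlations between X and Y.

    Assumes both matrices have full column rank after centering.
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Orthonormal bases for the centered column spaces.
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)
    # Singular values of Qx' Qy are the sample canonical correlations.
    rho = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return float(np.sqrt(np.sum(rho ** 2)))


def greedy_subsets(X, Y, sx, sy):
    """Hypothetical forward greedy search over subsets of sizes sx and sy.

    Starts from the most correlated single pair, then repeatedly adds the
    one variable (from either set) that most increases cancorr_norm.
    """
    p, q = X.shape[1], Y.shape[1]
    if not (1 <= sx <= p and 1 <= sy <= q):
        raise ValueError("subset sizes must lie between 1 and the dimension")

    # Initial step: best single pair (j, k).
    best_val, best_j, best_k = -1.0, 0, 0
    for j in range(p):
        for k in range(q):
            val = cancorr_norm(X[:, [j]], Y[:, [k]])
            if val > best_val:
                best_val, best_j, best_k = val, j, k
    A, B = [best_j], [best_k]

    # Growth steps: add one variable at a time until both sizes are reached.
    while len(A) < sx or len(B) < sy:
        candidates = []
        if len(A) < sx:
            for j in range(p):
                if j not in A:
                    candidates.append((cancorr_norm(X[:, A + [j]], Y[:, B]), "x", j))
        if len(B) < sy:
            for k in range(q):
                if k not in B:
                    candidates.append((cancorr_norm(X[:, A], Y[:, B + [k]]), "y", k))
        _, side, idx = max(candidates)
        (A if side == "x" else B).append(idx)

    return sorted(A), sorted(B), cancorr_norm(X[:, A], Y[:, B])
```

Calling `greedy_subsets(X, Y, sx, sy)` on two data matrices with the same number of rows returns the selected column indices and the maximized sample norm. Because the subsets are chosen to maximize this value, the naive statistic is biased upward under the null, which is the selection effect the paper's estimator and test are designed to adjust for; this sketch performs no such adjustment and yields no p-value.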
Taxonomy
Topics: Statistical Methods and Inference · Statistical Methods in Clinical Trials · Statistical Methods and Bayesian Inference
