High-Dimensional Canonical Correlation Analysis
Anna Bykhovskaya, Vadim Gorin

TL;DR
This paper examines classical canonical correlation analysis (CCA) when both data dimensions grow large and proportionally, shows that it cannot consistently estimate the vectors defining the canonical variables, and derives the size of the resulting estimation error.
Contribution
It provides the first result on the impossibility of identifying canonical variables in CCA when all dimensions are large, and derives the magnitude of the estimation error so that the precision of CCA estimates can be assessed in practice.
Findings
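Classical CCA fails to consistently estimate canonical vectors when the two data dimensions grow to infinity jointly and proportionally.
The magnitude of the estimation error is derived and can be used to assess the precision of CCA estimates.
The results are applied to cyclical vs. non-cyclical stocks and to a limestone grassland data set.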
Abstract
This paper studies high-dimensional canonical correlation analysis (CCA) with an emphasis on the vectors that define canonical variables. The paper shows that when two dimensions of data grow to infinity jointly and proportionally, the classical CCA procedure for estimating those vectors fails to deliver a consistent estimate. This provides the first result on the impossibility of identification of canonical variables in the CCA procedure when all dimensions are large. As a countermeasure, the paper derives the magnitude of the estimation error, which can be used in practice to assess the precision of CCA estimates. Applications of the results to cyclical vs. non-cyclical stocks and to a limestone grassland data set are provided.
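The inconsistency described in the abstract can be illustrated with a small simulation. The sketch below is not the paper's procedure or its error formula. It is a minimal example under assumed settings: Gaussian data with identity covariance, a single correlated pair of coordinates with correlation `rho`, and a standard QR-plus-SVD computation of sample CCA. The function names `top_canonical_pair` and `simulate_alignment`, the sample sizes, and the value of `rho` are all hypothetical choices made for illustration. Because the population covariance is the identity here, the true canonical vector for X is the first coordinate axis, and the plain cosine between it and the estimate is used as a rough measure of accuracy.

```python
import numpy as np


def top_canonical_pair(X, Y):
    """Return the largest sample canonical correlation and the estimated
    canonical vectors (a, b), so that X @ a and Y @ b are maximally correlated."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Reduced QR: the columns of Qx form an orthonormal basis of the column space of Xc.
    Qx, Rx = np.linalg.qr(Xc)
    Qy, Ry = np.linalg.qr(Yc)
    # The singular values of Qx^T Qy are the sample canonical correlations.
    U, s, Vt = np.linalg.svd(Qx.T @ Qy)
    # Map the leading singular vectors back to the original coordinates.
    a = np.linalg.solve(Rx, U[:, 0])
    b = np.linalg.solve(Ry, Vt[0, :])
    return s[0], a, b


def simulate_alignment(n, p, q, rho, seed=0):
    """Simulate n observations of independent standard normal X (p columns)
    and Y (q columns), except that X[:, 0] and Y[:, 0] have correlation rho.
    Returns the top sample canonical correlation and the absolute cosine
    between the estimated X-vector and the true one (the first axis)."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))
    Y = rng.standard_normal((n, q))
    # Assumption of this sketch: only the first coordinates are correlated.
    Y[:, 0] = rho * X[:, 0] + np.sqrt(1.0 - rho**2) * Y[:, 0]
    corr, a, _ = top_canonical_pair(X, Y)
    cosine = abs(a[0]) / np.linalg.norm(a)
    return corr, cosine


if __name__ == "__main__":
    # Low-dimensional regime: dimensions are tiny relative to the sample size.
    print("low-dim :", simulate_alignment(n=2000, p=2, q=2, rho=0.7))
    # Proportional regime: dimensions are a fixed fraction of the sample size.
    print("high-dim:", simulate_alignment(n=200, p=50, q=50, rho=0.7))
```

In the low-dimensional setting the estimated vector should nearly coincide with the true one, while in the proportional setting the sample canonical correlation is inflated above `rho` and the estimated vector is visibly tilted away from the truth. Increasing `n` with `p/n` and `q/n` held fixed is the regime in which the paper shows this misalignment does not vanish.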
Taxonomy
Topics: Spectroscopy and Chemometric Analyses · Genetic and phenotypic traits in livestock · Genetics and Plant Breeding
