Stochastic Canonical Correlation Analysis
Chao Gao, Dan Garber, Nathan Srebro, Jialei Wang, Weiran Wang

TL;DR
This paper analyzes the sample complexity of canonical correlation analysis (CCA), providing bounds on the number of samples needed for accurate estimation and proposing efficient algorithms with theoretical guarantees.
Contribution
It introduces sample complexity bounds for CCA under mild assumptions and develops stochastic and streaming algorithms with provable convergence guarantees.
Findings
Solving the empirical objective exactly achieves ε-suboptimality with N(ε, Δ, γ) samples
Stochastic optimization achieves the same accuracy with O(log(1/ε)) passes
Streaming algorithms match sample complexity with single-pass data processing
Abstract
We study the sample complexity of canonical correlation analysis (CCA), i.e., the number of samples needed to estimate the population canonical correlation and directions up to arbitrarily small error. With mild assumptions on the data distribution, we show that in order to achieve ε-suboptimality in a properly defined measure of alignment between the estimated canonical directions and the population solution, we can solve the empirical objective exactly with N(ε, Δ, γ) samples, where Δ is the singular value gap of the whitened cross-covariance matrix and 1/γ is an upper bound of the condition number of auto-covariance matrices. Moreover, we can achieve the same learning accuracy by drawing the same level of samples and solving the empirical objective approximately with a stochastic optimization algorithm; this algorithm is based on the…
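To make the quantities in the abstract concrete, the following is a minimal NumPy sketch of solving the empirical CCA objective exactly for the top canonical pair, via an SVD of the whitened empirical cross-covariance matrix. It is an illustration under stated assumptions, not the paper's stochastic or streaming algorithm: the function name `empirical_cca`, the small ridge term `reg`, and the synthetic data are all hypothetical choices made for this example.

```python
import numpy as np

def empirical_cca(X, Y, reg=1e-8):
    """Top canonical pair from samples X (n x dx) and Y (n x dy).

    `reg` is an assumed small ridge term that keeps the empirical
    auto-covariance matrices invertible; it is not taken from the paper.
    """
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)

    # Empirical auto-covariance and cross-covariance matrices.
    Sxx = Xc.T @ Xc / n + reg * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / n + reg * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / n

    def inv_sqrt(S):
        # Inverse square root of a symmetric positive definite matrix.
        w, V = np.linalg.eigh(S)
        return V @ np.diag(w ** -0.5) @ V.T

    Wx, Wy = inv_sqrt(Sxx), inv_sqrt(Syy)

    # Whitened cross-covariance; its singular values are the
    # empirical canonical correlations.
    T = Wx @ Sxy @ Wy
    U, s, Vt = np.linalg.svd(T)

    u = Wx @ U[:, 0]    # canonical direction for X
    v = Wy @ Vt[0]      # canonical direction for Y
    gap = s[0] - s[1]   # empirical counterpart of the singular value gap
    return u, v, s[0], gap

# Synthetic two-view data sharing one latent signal (illustrative only).
rng = np.random.default_rng(0)
n, dx, dy = 2000, 5, 4
z = rng.standard_normal(n)
X = np.outer(z, rng.standard_normal(dx)) + rng.standard_normal((n, dx))
Y = np.outer(z, rng.standard_normal(dy)) + rng.standard_normal((n, dy))

u, v, rho, gap = empirical_cca(X, Y)
print("top canonical correlation:", rho)
print("singular value gap:", gap)
```

In this sketch, `gap` plays the role of an empirical estimate of Δ, and the eigenvalues of `Sxx` and `Syy` are what a condition number bound on the auto-covariance matrices would constrain. A smaller gap makes the top pair harder to separate from the second, which is the intuition behind the sample count depending on Δ.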
Taxonomy
Topics: Random Matrices and Applications · Sparse and Compressive Sensing Techniques · Stochastic Gradient Optimization Techniques
