Finding Linear Structure in Large Datasets with Scalable Canonical Correlation Analysis
Zhuang Ma, Yichao Lu, Dean Foster

TL;DR
This paper introduces a scalable, memory-efficient stochastic algorithm for large-scale canonical correlation analysis (CCA), enabling effective analysis of multi-view datasets with reduced computational and storage costs.
Contribution
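It proposes the Augmented Approximate Gradient (AppGrad) scheme, a stochastic method for CCA that is scalable, memory-efficient, and suitable for streaming data. It avoids the huge matrix products, large matrix decompositions, and dense whitening matrices that classical algorithms require.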
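Streaming suitability depends on the kind of gradient the method works with. Assume, as a simplification, that each update uses the least-squares gradient $X^\top(X\tilde\phi - Y\psi)/n$ for two views $X$ and $Y$ of the same $n$ samples. That gradient is a sum over samples, so it can be accumulated one chunk at a time while only a $p_1 \times k$ accumulator stays in memory. The sketch below checks this with NumPy. `streamed_gradient` and `chunk_pairs` are hypothetical helpers, and the chunk size is arbitrary. This is a one-pass batch gradient, not the paper's stochastic update.

```python
import numpy as np


def streamed_gradient(chunks, phi_tilde, psi):
    """Accumulate X^T (X phi_tilde - Y psi) / n over (x_chunk, y_chunk) pairs.

    Only the p1 x k accumulator and the current chunk are held at a time.
    """
    grad = np.zeros_like(phi_tilde)
    n = 0
    for x_chunk, y_chunk in chunks:
        residual = x_chunk @ phi_tilde - y_chunk @ psi  # rows x k
        grad += x_chunk.T @ residual                    # p1 x k
        n += x_chunk.shape[0]
    return grad / n


def chunk_pairs(x, y, size):
    """Yield aligned row chunks, standing in for a data stream (hypothetical)."""
    for start in range(0, x.shape[0], size):
        yield x[start:start + size], y[start:start + size]


rng = np.random.default_rng(0)
x = rng.standard_normal((1000, 20))
y = rng.standard_normal((1000, 15))
phi_tilde = rng.standard_normal((20, 3))
psi = rng.standard_normal((15, 3))

# Full-batch gradient (needs all rows at once) versus the chunked version.
full = x.T @ (x @ phi_tilde - y @ psi) / x.shape[0]
streamed = streamed_gradient(chunk_pairs(x, y, 128), phi_tilde, psi)
print(np.allclose(full, streamed))  # True
```

If the full sum is replaced with a single chunk's average, the result is a minibatch gradient. That is the usual starting point for a stochastic variant.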
Findings
AppGrad achieves optimal storage complexity, growing linearly rather than quadratically with the feature dimensions of the two views.
The method is effective on large real datasets.
To the authors' knowledge, it is the first stochastic algorithm for CCA.
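To make the storage gap concrete, suppose AppGrad keeps only thin $p_1 \times k$ and $p_2 \times k$ iterates while a classical method holds dense $p_1 \times p_1$ and $p_2 \times p_2$ whitening matrices. The float counts then scale as $k(p_1+p_2)$ versus $p_1^2+p_2^2$. The sketch below plugs in hypothetical sizes. It counts leading-order terms only and ignores constant factors, such as keeping both normalized and unnormalized iterates.

```python
def appgrad_storage_floats(p1, p2, k):
    """Leading-order float count for thin p1 x k and p2 x k iterates."""
    return k * (p1 + p2)


def whitening_storage_floats(p1, p2):
    """Float count for dense p1 x p1 and p2 x p2 whitening matrices."""
    return p1 * p1 + p2 * p2


BYTES_PER_FLOAT64 = 8
p1 = p2 = 100_000  # hypothetical feature dimensions of the two views
k = 10             # hypothetical number of canonical pairs

thin = appgrad_storage_floats(p1, p2, k)   # 2,000,000 floats
dense = whitening_storage_floats(p1, p2)   # 20,000,000,000 floats
print(thin * BYTES_PER_FLOAT64 / 1e6, "MB")   # 16.0 MB
print(dense * BYTES_PER_FLOAT64 / 1e9, "GB")  # 160.0 GB
```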
Abstract
Canonical Correlation Analysis (CCA) is a widely used spectral technique for finding correlation structures in multi-view datasets. In this paper, we tackle the problem of large-scale CCA, where classical algorithms, which usually require computing the product of two huge matrices and a huge matrix decomposition, are computationally and storage expensive. We recast CCA from a novel perspective and propose a scalable and memory-efficient Augmented Approximate Gradient (AppGrad) scheme for finding the top-$k$ dimensional canonical subspace, which only involves multiplying a large matrix by a thin matrix of width $k$ and decomposing a small matrix of dimension $k \times k$. Further, AppGrad achieves optimal storage complexity $O(k(p_1+p_2))$, where $p_1$ and $p_2$ are the dimensions of the two views, compared with classical algorithms, which usually require $O(p_1^2+p_2^2)$ space to store two dense whitening matrices. The proposed scheme naturally generalizes to…
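Here is a minimal batch sketch of the idea. It assumes the AppGrad iteration alternates two steps: (i) a gradient step on the least-squares objective $\|X\tilde\phi - Y\psi\|^2/(2n)$ for an unnormalized iterate $\tilde\phi$ (and symmetrically for $\tilde\psi$), and (ii) a $k \times k$ rescaling that enforces $\phi^\top (X^\top X/n)\phi = I$. This is a reconstruction for illustration, not the authors' code. The step-size rule, iteration count, function names, and synthetic data are all assumptions. Inside `appgrad`, only $n \times p$ by $p \times k$ products and $k \times k$ eigendecompositions appear. The dense-whitening `classical_cca_correlations` is included only to check the result.

```python
import numpy as np


def inv_sqrt_psd(m):
    """Inverse square root of a symmetric positive-definite matrix via eigh."""
    vals, vecs = np.linalg.eigh(m)
    return (vecs / np.sqrt(vals)) @ vecs.T


def normalize(data, w_tilde):
    """Rescale a thin p x k iterate so that w^T (data^T data / n) w = I.

    Only a k x k Gram matrix is formed and decomposed, never a p x p one.
    """
    proj = data @ w_tilde                  # n x k
    gram = proj.T @ proj / data.shape[0]   # k x k
    return w_tilde @ inv_sqrt_psd(gram)


def appgrad(x, y, k, n_iter=2000, seed=0):
    """AppGrad-style batch iteration (hypothetical sketch, see assumptions).

    x: (n, p1) and y: (n, p2) are column-centered views of the same samples.
    Returns phi (p1, k) and psi (p2, k) spanning the estimated top-k
    canonical subspaces.
    """
    n, p1 = x.shape
    p2 = y.shape[1]
    # Conservative step sizes: trace(X^T X / n) bounds the largest eigenvalue
    # and needs no p x p matrix (an assumption, not the paper's rule).
    eta_x = n / np.sum(x * x)
    eta_y = n / np.sum(y * y)
    rng = np.random.default_rng(seed)
    phi_tilde = rng.standard_normal((p1, k))  # unnormalized iterates
    psi_tilde = rng.standard_normal((p2, k))
    phi, psi = normalize(x, phi_tilde), normalize(y, psi_tilde)
    for _ in range(n_iter):
        # Gradient of ||X phi_tilde - Y psi||^2 / (2n), and the symmetric
        # one for psi_tilde: only (n x p) times (p x k) products.
        grad_x = x.T @ (x @ phi_tilde - y @ psi) / n
        grad_y = y.T @ (y @ psi_tilde - x @ phi) / n
        phi_tilde = phi_tilde - eta_x * grad_x
        psi_tilde = psi_tilde - eta_y * grad_y
        # k x k rescaling yields the constrained (normalized) iterates.
        phi, psi = normalize(x, phi_tilde), normalize(y, psi_tilde)
    return phi, psi


def classical_cca_correlations(x, y, k):
    """Classical baseline with dense p x p whitening matrices (check only)."""
    n = x.shape[0]
    wx = inv_sqrt_psd(x.T @ x / n)  # p1 x p1
    wy = inv_sqrt_psd(y.T @ y / n)  # p2 x p2
    return np.linalg.svd(wx @ (x.T @ y / n) @ wy, compute_uv=False)[:k]


# Synthetic two-view data sharing a 2-dimensional latent signal z.
rng = np.random.default_rng(1)
n = 5000
z = rng.standard_normal((n, 2))
a = np.array([[2.0, 1.0, 0.0, 0.0, 0.0, 0.0],
              [0.0, 0.0, 1.5, 1.0, 0.0, 0.0]])
b = np.array([[1.5, 0.0, 0.0, 0.0, 0.0],
              [0.0, 2.0, 1.0, 0.0, 0.0]])
x = z @ a + rng.standard_normal((n, 6))
y = z @ b + rng.standard_normal((n, 5))
x -= x.mean(axis=0)
y -= y.mean(axis=0)

phi, psi = appgrad(x, y, k=2)
rho_appgrad = np.linalg.svd((x @ phi).T @ (y @ psi) / n, compute_uv=False)
rho_classical = classical_cca_correlations(x, y, k=2)
print(rho_appgrad, rho_classical)
```

The top-$k$ canonical subspace is identified only up to a rotation. For that reason, the check compares the singular values of the $k \times k$ cross-covariance of the projected views, which are rotation-invariant, rather than the bases themselves.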
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Face and Expression Recognition
