Finding Linear Structure in Large Datasets with Scalable Canonical Correlation Analysis

Zhuang Ma; Yichao Lu; Dean Foster

arXiv:1506.08170 · stat.ML · June 29, 2015 · ICML · 36 citations

Open Access

TL;DR

This paper introduces a scalable, memory-efficient stochastic algorithm for large-scale canonical correlation analysis (CCA), enabling effective analysis of multi-view datasets with reduced computational and storage costs.

Contribution

It proposes the AppGrad scheme, a novel stochastic method for CCA that is scalable, memory-efficient, and suitable for streaming data, addressing limitations of classical algorithms.

Findings

01. AppGrad achieves optimal storage complexity, $O(k(p_{1} + p_{2}))$.

02. The method is effective on large real datasets.

03. It is presented as the first stochastic algorithm for CCA.

Abstract

Canonical Correlation Analysis (CCA) is a widely used spectral technique for finding correlation structures in multi-view datasets. In this paper, we tackle the problem of large-scale CCA, where classical algorithms, which usually require computing the product of two huge matrices and a huge matrix decomposition, are expensive in both computation and storage. We recast CCA from a novel perspective and propose a scalable and memory-efficient Augmented Approximate Gradient (AppGrad) scheme for finding the top $k$-dimensional canonical subspace, which only involves multiplying a large matrix by a thin matrix of width $k$ and a small matrix decomposition of dimension $k \times k$. Further, AppGrad achieves optimal storage complexity $O(k(p_{1} + p_{2}))$, compared with classical algorithms, which usually require $O(p_{1}^{2} + p_{2}^{2})$ space to store two dense whitening matrices. The proposed scheme naturally generalizes to…
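The computational pattern the abstract describes (large matrix times a thin matrix of width $k$, plus a $k \times k$ decomposition) can be illustrated with a minimal Python sketch. This is not the authors' reference implementation: the function names `normalize` and `appgrad_sketch`, the alternating gradient-style update, the step size, the iteration count, and the random initialization are all assumptions made for illustration, and the exact update rule in the paper may differ. The sketch assumes `X` and `Y` are centered, column-standardized data matrices of shape $n \times p_1$ and $n \times p_2$.

```python
import numpy as np


def normalize(W_tilde, Z):
    """Rescale W_tilde so that (Z W)^T (Z W) / n = I.

    Only a k x k eigendecomposition is needed; the p x p covariance
    matrix of Z is never formed.
    """
    n = Z.shape[0]
    ZW = Z @ W_tilde                      # large matrix times thin matrix: n x k
    G = ZW.T @ ZW / n                     # small k x k Gram matrix
    evals, evecs = np.linalg.eigh(G)      # k x k decomposition
    G_inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    return W_tilde @ G_inv_sqrt


def appgrad_sketch(X, Y, k, step=0.1, iters=200, seed=0):
    """Hypothetical AppGrad-style iteration (illustrative assumptions only)."""
    n, p1 = X.shape
    p2 = Y.shape[1]
    rng = np.random.default_rng(seed)
    # Unnormalized ("augmented") iterates; storage is O(k (p1 + p2)).
    Phi_t = rng.standard_normal((p1, k))
    Psi_t = rng.standard_normal((p2, k))
    Phi = normalize(Phi_t, X)
    Psi = normalize(Psi_t, Y)
    for _ in range(iters):
        # Gradient-style steps on the least-squares residuals, each using
        # the normalized iterate of the other view from the previous round.
        Phi_t = Phi_t - (step / n) * (X.T @ (X @ Phi_t - Y @ Psi))
        Psi_t = Psi_t - (step / n) * (Y.T @ (Y @ Psi_t - X @ Phi))
        # Re-impose the CCA constraints with k x k decompositions.
        Phi = normalize(Phi_t, X)
        Psi = normalize(Psi_t, Y)
    return Phi, Psi
```

Calling `appgrad_sketch(X, Y, k=2)` returns two thin matrices of shapes $p_1 \times k$ and $p_2 \times k$ whose projections `X @ Phi` and `Y @ Psi` have identity sample covariance by construction. Every intermediate array has at most $k$ columns, apart from the data itself, which is the property the storage bound $O(k(p_{1} + p_{2}))$ refers to.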

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Face and Expression Recognition