On the Nyström and Column-Sampling Methods for the Approximate Principal Components Analysis of Large Data Sets

Darren Homrighausen; Daniel J. McDonald

arXiv:1602.01120·stat.ML·August 16, 2017

TL;DR

This paper evaluates Nyström and column-sampling methods for approximate PCA on large datasets, analyzing their theoretical accuracy and practical efficiency through simulations and real data experiments.

Contribution

It provides a theoretical comparison and empirical assessment of these methods' effectiveness for large-scale PCA, clarifying their utility in statistical applications.

Findings

01. Theoretical bounds on subspace approximation error.

02. Trade-offs between accuracy and computational efficiency.

03. Empirical validation on real-world email data.

Abstract

In this paper we analyze approximate methods for undertaking a principal components analysis (PCA) on large data sets. PCA is a classical dimension reduction method that involves the projection of the data onto the subspace spanned by the leading eigenvectors of the covariance matrix. This projection can be used either for exploratory purposes or as an input for further analysis, e.g. regression. If the data have billions of entries or more, the computational and storage requirements for saving and manipulating the design matrix in fast memory are prohibitive. Recently, the Nyström and column-sampling methods have appeared in the numerical linear algebra community for the randomized approximation of the singular value decomposition of large matrices. However, their utility for statistical applications remains unclear. We compare these approximations theoretically by bounding the…
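
As a rough illustration of the two approximations named in the abstract, the following Python sketch applies them to a symmetric positive semidefinite matrix, such as a covariance or Gram matrix. It is a minimal sketch under stated assumptions, not the authors' implementation: the function name `nystrom_and_column_sampling`, the uniform sampling of columns, and the scaling constants (taken from the common convention in the Nyström literature) are all assumptions that do not appear in the text above.

```python
import numpy as np


def nystrom_and_column_sampling(A, idx, k):
    """Approximate the top-k eigenpairs of a symmetric PSD matrix A
    from the sampled columns idx (hypothetical helper, for illustration)."""
    n = A.shape[0]
    l = len(idx)
    C = A[:, idx]   # n x l block of sampled columns
    W = C[idx, :]   # l x l intersection block

    # Nystrom: eigendecompose the small block W, then extend to n rows.
    evals_w, evecs_w = np.linalg.eigh(W)
    order = np.argsort(evals_w)[::-1][:k]   # indices of the k largest eigenvalues
    lam_w = evals_w[order]
    U_w = evecs_w[:, order]
    U_nys = np.sqrt(l / n) * C @ U_w / lam_w   # assumed scaling; not orthonormal
    lam_nys = (n / l) * lam_w

    # Column sampling: use the left singular vectors of C directly.
    U_c, s_c, _ = np.linalg.svd(C, full_matrices=False)
    U_cs = U_c[:, :k]                          # orthonormal columns
    lam_cs = np.sqrt(n / l) * s_c[:k]          # assumed scaling

    return (lam_nys, U_nys), (lam_cs, U_cs)


# Small demonstration on a synthetic rank-3 PSD matrix (made-up data).
rng = np.random.default_rng(0)
n, r, l = 50, 3, 10
X = rng.standard_normal((r, n))
A = X.T @ X                                    # n x n, rank r
idx = rng.choice(n, size=l, replace=False)     # uniform sampling without replacement
(lam_nys, U_nys), (lam_cs, U_cs) = nystrom_and_column_sampling(A, idx, k=r)
```

Only the `l` sampled columns of `A` are touched, which is the source of the computational savings. The two methods differ in what they decompose: Nyström works with the small `l x l` block and extends its eigenvectors, while column sampling takes the singular value decomposition of the full `n x l` column block.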

Taxonomy

Methods: Principal Components Analysis