Combining Structured and Unstructured Randomness in Large Scale PCA
Nikos Karampatziakis, Paul Mineiro

TL;DR
This paper introduces an efficient PCA algorithm that combines structured and unstructured random projections to handle datasets with many rows and columns, demonstrated on the winning submission to the KDD 2010 Cup.
Contribution
It presents a PCA method that combines structured and unstructured random projections to compute the top principal components efficiently while retaining good accuracy on large-scale data.
Findings
Effective in large-scale PCA tasks
Retains accuracy with computational efficiency
Successfully applied to KDD 2010 Cup data
Abstract
Principal Component Analysis (PCA) is a ubiquitous tool with many applications in machine learning including feature construction, subspace embedding, and outlier detection. In this paper, we present an algorithm for computing the top principal components of a dataset with a large number of rows (examples) and columns (features). Our algorithm leverages both structured and unstructured random projections to retain good accuracy while being computationally efficient. We demonstrate the technique on the winning submission to the KDD 2010 Cup.
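The abstract does not say which projections are used, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes the structured projection is a sign-hashing (CountSketch-style) map over the feature columns and the unstructured projection is a dense Gaussian matrix, composed inside a standard randomized range finder. The names `count_sketch`, `randomized_pca`, `d_hash`, and `oversample` are hypothetical and chosen for illustration.

```python
import numpy as np


def count_sketch(X, d_hash, rng):
    """Structured projection (assumed): hash each column of X into one of
    d_hash buckets with a random sign, summing columns that collide."""
    n, d = X.shape
    buckets = rng.integers(0, d_hash, size=d)   # bucket index per feature
    signs = rng.choice([-1.0, 1.0], size=d)     # random sign per feature
    out_t = np.zeros((d_hash, n))
    # np.add.at accumulates correctly when several features share a bucket.
    np.add.at(out_t, buckets, (X * signs).T)
    return out_t.T                              # shape (n, d_hash)


def randomized_pca(X, k, d_hash, oversample=10, seed=0):
    """Approximate the top-k principal components of X (rows = examples)."""
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=0)                     # center the features

    # Stage 1 (structured): cheap hashing from d features down to d_hash.
    Z = count_sketch(Xc, d_hash, rng)

    # Stage 2 (unstructured): dense Gaussian projection to k + oversample.
    omega = rng.standard_normal((d_hash, k + oversample))
    Y = Z @ omega

    # Orthonormal basis for the approximate column space of Xc.
    Q, _ = np.linalg.qr(Y)

    # Project the data onto that basis and solve a small SVD.
    B = Q.T @ Xc                                # shape (k + oversample, d)
    _, s, Vt = np.linalg.svd(B, full_matrices=False)
    return Vt[:k], s[:k]                        # components, singular values


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Synthetic data of exact rank 3, so the top 3 components are recoverable.
    A = rng.standard_normal((300, 3)) @ rng.standard_normal((3, 200))
    components, singular_values = randomized_pca(A, k=3, d_hash=64)
    print(components.shape, singular_values.shape)
```

In this sketch the hashing stage touches each entry of the data once, and only the much smaller hashed matrix is multiplied by the dense Gaussian matrix, which is where the efficiency of combining the two kinds of randomness would come from under these assumptions.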
Taxonomy
Topics: Face and Expression Recognition · Sparse and Compressive Sensing Techniques · Blind Source Separation Techniques
