Combining Structured and Unstructured Randomness in Large Scale PCA

Nikos Karampatziakis; Paul Mineiro

arXiv:1310.6304·cs.LG·October 25, 2013·2 cites

Open Access

TL;DR

This paper introduces an efficient PCA algorithm that combines structured and unstructured random projections to handle datasets with many rows and columns, demonstrated on the winning submission of the KDD 2010 Cup.

Contribution

It presents a PCA method that combines structured and unstructured random projections to remain computationally efficient while retaining good accuracy on large-scale data.

Findings

01 Effective in large-scale PCA tasks

02 Retains accuracy with computational efficiency

03 Successfully applied to KDD 2010 Cup data

Abstract

Principal Component Analysis (PCA) is a ubiquitous tool with many applications in machine learning, including feature construction, subspace embedding, and outlier detection. In this paper, we present an algorithm for computing the top principal components of a dataset with a large number of rows (examples) and columns (features). Our algorithm leverages both structured and unstructured random projections to retain good accuracy while being computationally efficient. We demonstrate the technique on the winning submission of the KDD 2010 Cup.
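To make the idea in the abstract concrete, here is a minimal sketch of a randomized PCA whose test matrix is the product of a structured projection and an unstructured one. It is an illustration under stated assumptions, not the authors' algorithm: the abstract does not say which projections are used, so this sketch assumes a count-sketch style hashing matrix for the structured part and a dense Gaussian matrix for the unstructured part, followed by a standard randomized range finder. The function name `sketched_pca` and the parameters `hashed_dim` and `oversample` are hypothetical, and the input is assumed to be mean-centered already.

```python
import numpy as np

def sketched_pca(X, k, hashed_dim=64, oversample=10, seed=0):
    """Approximate the top-k singular values and right singular vectors of X.

    Assumptions (not taken from the paper): X is already mean-centered,
    the structured projection is a count-sketch, and the unstructured
    projection is dense Gaussian.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape

    # Structured randomness: each feature is hashed to one bucket with a
    # random sign. A dense matrix is built here only for readability; a
    # large-scale version would apply the hash without materializing S.
    buckets = rng.integers(0, hashed_dim, size=d)
    signs = rng.choice([-1.0, 1.0], size=d)
    S = np.zeros((d, hashed_dim))
    S[np.arange(d), buckets] = signs

    # Unstructured randomness: a small dense Gaussian matrix.
    G = rng.standard_normal((hashed_dim, k + oversample))

    # Randomized range finder with the composite test matrix S @ G.
    Y = X @ (S @ G)              # n x (k + oversample)
    Q, _ = np.linalg.qr(Y)       # orthonormal basis containing range(Y)

    # Project X onto that basis and take an exact SVD of the small matrix.
    B = Q.T @ X                  # (k + oversample) x d
    _, s, Vt = np.linalg.svd(B, full_matrices=False)
    return s[:k], Vt[:k]


# Usage on synthetic data of exact rank 3 (hypothetical example data).
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 3)) @ rng.standard_normal((3, 500))
s, Vt = sketched_pca(X, k=3)
```

The hashing step is cheap to apply because each feature touches a single bucket, while the Gaussian step only has to act on the `hashed_dim` hashed coordinates; that split is the design choice this sketch is meant to show.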

Taxonomy

Topics: Face and Expression Recognition · Sparse and Compressive Sensing Techniques · Blind Source Separation Techniques