Quantifying the Estimation Error of Principal Components

Raphael Hauser; Raul Kangro; Jüri Lember; Heinrich Matzinger

arXiv:1710.10124 · math.ST · October 30, 2017 · 1 cites

Open Access

TL;DR

This paper sharpens bounds on the estimation error of individual principal components in PCA, showing that eigenvectors can often be reconstructed to a required accuracy from fewer samples than earlier bounds require.

Contribution

It sharpens existing bounds on PCA eigenvector estimation error and demonstrates that accurate reconstruction is possible with smaller sample sizes.

Findings

01

Sharper bounds on eigenvector estimation error

02

Eigenvectors can be reconstructed with fewer samples

03

Improved understanding of PCA sample complexity

Abstract

Principal component analysis is an important pattern recognition and dimensionality reduction tool in many applications. Principal components are computed as eigenvectors of a maximum likelihood covariance $\widehat{\Sigma}$ that approximates a population covariance $\Sigma$, and these eigenvectors are often used to extract structural information about the variables (or attributes) of the studied population. Since PCA is based on the eigendecomposition of the proxy covariance $\widehat{\Sigma}$ rather than the ground-truth $\Sigma$, it is important to understand the approximation error in each individual eigenvector as a function of the number of available samples. The recent results of Koltchinskii and Lounici yield such bounds. In the present paper we sharpen these bounds and show that eigenvectors can often be reconstructed to a required accuracy from a sample of strictly smaller size…
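The setting described in the abstract can be illustrated with a small simulation. The sketch below is not taken from the paper: it assumes zero-mean Gaussian data with a diagonal population covariance (so the true leading eigenvector is the first coordinate axis), uses the sine of the angle between the estimated and true eigenvectors as the error measure, and the names `leading_eigvec`, `sin_angle`, and `mean_error` are hypothetical helpers introduced only for illustration. It shows how the error in the leading sample eigenvector shrinks as the number of samples grows; it does not reproduce any bound from the paper.

```python
import numpy as np


def leading_eigvec(cov):
    """Unit eigenvector belonging to the largest eigenvalue of a symmetric matrix."""
    _, eigvecs = np.linalg.eigh(cov)  # eigh returns eigenvalues in ascending order
    return eigvecs[:, -1]


def sin_angle(u, v):
    """Sine of the angle between unit vectors u and v, ignoring sign."""
    c = min(1.0, abs(float(u @ v)))
    return float(np.sqrt(1.0 - c * c))


def mean_error(n, eigenvalues, trials, rng):
    """Average sin-angle error of the leading sample eigenvector over several trials."""
    d = len(eigenvalues)
    true_vec = np.zeros(d)
    true_vec[0] = 1.0  # assumption: diagonal population covariance, largest entry first
    scale = np.sqrt(np.asarray(eigenvalues, dtype=float))
    errors = []
    for _ in range(trials):
        X = rng.standard_normal((n, d)) * scale  # rows ~ N(0, diag(eigenvalues))
        sigma_hat = X.T @ X / n  # maximum likelihood covariance for a known zero mean
        errors.append(sin_angle(leading_eigvec(sigma_hat), true_vec))
    return float(np.mean(errors))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    eigenvalues = [4.0, 1.0, 0.5, 0.25]  # illustrative spectrum, chosen arbitrarily
    for n in (20, 200, 2000):
        print(n, mean_error(n, eigenvalues, trials=20, rng=rng))
```

Averaging over several trials is a design choice to smooth out sampling noise; the spectrum and sample sizes are arbitrary, and a smaller gap between the first two eigenvalues would be expected to make the leading eigenvector harder to estimate.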

Taxonomy

Topics: Statistical Methods and Inference · Gene expression and cancer classification · Blind Source Separation Techniques