# Asymptotic performance of PCA for high-dimensional heteroscedastic data

**Authors:** David Hong, Laura Balzano, Jeffrey A. Fessler

arXiv: 1703.06610 · 2019-06-14

## TL;DR

This paper analyzes the asymptotic performance of PCA on high-dimensional heteroscedastic data and shows that, for a fixed average noise variance, heteroscedastic noise always degrades PCA recovery relative to homoscedastic noise.

## Contribution

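The paper derives simplified asymptotic expressions for how well PCA recovers the underlying subspace, subspace amplitudes, and subspace coefficients from heteroscedastic data. It then uses the structure of these expressions to show that, for a fixed average noise variance, heteroscedastic noise always worsens PCA recovery relative to homoscedastic noise.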
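As a rough reference, the single-component subspace recovery expression can be sketched as follows. This is our reading of the paper's main theorem rather than a quotation, so check it against the full text. Assume n/d → c, a fraction p_ℓ of the samples have noise variance σ_ℓ² (ℓ = 1, ..., L), and component i has amplitude θ_i:

$$
\begin{aligned}
A(x) &= 1 - c \sum_{\ell=1}^{L} \frac{p_\ell\,\sigma_\ell^4}{(x - \sigma_\ell^2)^2},
\qquad
B_i(x) = 1 - c\,\theta_i^2 \sum_{\ell=1}^{L} \frac{p_\ell}{x - \sigma_\ell^2}, \\
\beta_i &= \text{largest real root of } B_i(x), \\
|\langle \hat{u}_i, u_i \rangle|^2 &\to \frac{A(\beta_i)}{\beta_i\, B_i'(\beta_i)} \quad \text{when } A(\beta_i) > 0 .
\end{aligned}
$$

In the homoscedastic case (L = 1, σ_1² = σ²) the expression collapses to the familiar spiked-model result, with the phase transition at θ_i² = σ²/√c:

$$
\beta_i = \sigma^2 + c\,\theta_i^2,\quad
B_i'(\beta_i) = \frac{1}{c\,\theta_i^2},\quad
A(\beta_i) = 1 - \frac{\sigma^4}{c\,\theta_i^4}
\;\Longrightarrow\;
|\langle \hat{u}_i, u_i \rangle|^2 \to \frac{c\,\theta_i^4 - \sigma^4}{\theta_i^2\,(c\,\theta_i^2 + \sigma^2)}.
$$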

## Key findings

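- Heteroscedastic noise reduces PCA recovery compared to homoscedastic noise with the same average noise variance.
- Simplified asymptotic expressions for subspace, amplitude, and coefficient recovery make PCA performance easy and efficient to compute and reason about in high-dimensional settings.
- Average noise variance gives an overly optimistic estimate of PCA performance on heteroscedastic data.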
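To make the "easy assessment" point concrete, here is a minimal Python sketch that evaluates the asymptotic subspace recovery |⟨û, u⟩|² for a single component. It assumes the expression as we read it from the paper: β is the largest real root of B(x) = 1 - cθ² Σ_ℓ p_ℓ/(x - σ_ℓ²), A(x) = 1 - c Σ_ℓ p_ℓ σ_ℓ⁴/(x - σ_ℓ²)², and the recovery is A(β)/(β B'(β)) when A(β) > 0, else 0. The function name `asymptotic_subspace_recovery` and its parameters are hypothetical, and the formula should be checked against the full text.

```python
import numpy as np


def asymptotic_subspace_recovery(c, theta, p, sigma2, iters=200):
    """Hypothetical helper: asymptotic |<u_hat, u>|^2 for one component.

    c      : sample-to-dimension ratio n/d
    theta  : signal amplitude of the component
    p      : proportion of samples at each noise level (sums to 1)
    sigma2 : noise variance sigma_l^2 of each level
    """
    p = np.asarray(p, dtype=float)
    sigma2 = np.asarray(sigma2, dtype=float)

    def B(x):
        return 1.0 - c * theta**2 * np.sum(p / (x - sigma2))

    # B increases from -inf to 1 on (max sigma2, inf), so its largest real
    # root lies there; B(max sigma2 + c*theta^2) >= 0 gives an upper bracket.
    lo = sigma2.max()
    hi = sigma2.max() + c * theta**2
    for _ in range(iters):  # bisection never evaluates B at the pole lo
        mid = 0.5 * (lo + hi)
        if B(mid) < 0:
            lo = mid
        else:
            hi = mid
    beta = hi

    A = 1.0 - c * np.sum(p * sigma2**2 / (beta - sigma2) ** 2)
    dB = c * theta**2 * np.sum(p / (beta - sigma2) ** 2)  # B'(beta)
    if A <= 0:
        # Below the phase transition: assume no asymptotic recovery.
        return 0.0
    return float(A / (beta * dB))


# Same average noise variance (1.0), c = n/d = 2, theta = 1.
hom = asymptotic_subspace_recovery(c=2.0, theta=1.0, p=[1.0], sigma2=[1.0])
het = asymptotic_subspace_recovery(c=2.0, theta=1.0, p=[0.5, 0.5], sigma2=[0.5, 1.5])
print(f"homoscedastic: {hom:.4f}, heteroscedastic: {het:.4f}")
# prints: homoscedastic: 0.3333, heteroscedastic: 0.0632
```

Splitting the samples evenly between variances 0.5 and 1.5 keeps the average at 1 but drops the predicted recovery from about 0.33 to about 0.06. Plain bisection suffices here because B is monotone to the right of the largest σ_ℓ², so the bracket always contains the largest root.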

## Abstract

Principal Component Analysis (PCA) is a classical method for reducing the dimensionality of data by projecting them onto a subspace that captures most of their variation. Effective use of PCA in modern applications requires understanding its performance for data that are both high-dimensional and heteroscedastic. This paper analyzes the statistical performance of PCA in this setting, i.e., for high-dimensional data drawn from a low-dimensional subspace and degraded by heteroscedastic noise. We provide simplified expressions for the asymptotic PCA recovery of the underlying subspace, subspace amplitudes and subspace coefficients; the expressions enable both easy and efficient calculation and reasoning about the performance of PCA. We exploit the structure of these expressions to show that, for a fixed average noise variance, the asymptotic recovery of PCA for heteroscedastic data is always worse than that for homoscedastic data (i.e., for noise variances that are equal across samples). Hence, while average noise variance is often a practically convenient measure for the overall quality of data, it gives an overly optimistic estimate of the performance of PCA for heteroscedastic data.
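The data model described in the abstract can also be simulated directly to see the effect at finite size. The sketch below assumes a rank-one version of that model, y_i = θ u z_i + η_i ε_i with Gaussian z_i and ε_i (the distributions are our assumption), and compares PCA's squared subspace recovery |⟨û, u⟩|² under homoscedastic noise and under heteroscedastic noise with the same average variance. The helper `pca_recovery` is hypothetical; this is an illustration, not the paper's experimental code.

```python
import numpy as np


def pca_recovery(noise_vars, d, theta=1.0, seed=0):
    """Hypothetical helper: simulate y_i = theta*u*z_i + eta_i*eps_i
    (one column per sample) and return |<u_hat, u>|^2 for PCA."""
    rng = np.random.default_rng(seed)
    n = len(noise_vars)
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)                  # random unit-norm true subspace
    z = rng.standard_normal(n)              # unit-variance coefficients
    eps = rng.standard_normal((d, n))       # unit-variance noise entries
    eta = np.sqrt(np.asarray(noise_vars, dtype=float))  # per-sample std dev
    Y = theta * np.outer(u, z) + eps * eta  # broadcasting scales column i by eta_i
    # PCA estimate of u: top left singular vector of the d x n data matrix.
    U_hat, _, _ = np.linalg.svd(Y, full_matrices=False)
    return float(np.dot(U_hat[:, 0], u) ** 2)


d, n = 400, 800                              # c = n/d = 2
hom_vars = np.full(n, 1.0)                   # homoscedastic, variance 1
het_vars = np.repeat([0.5, 1.5], n // 2)     # heteroscedastic, same average
hom = np.mean([pca_recovery(hom_vars, d, seed=s) for s in range(5)])
het = np.mean([pca_recovery(het_vars, d, seed=s) for s in range(5)])
print(f"homoscedastic: {hom:.3f}, heteroscedastic: {het:.3f}")
```

Reusing each seed for both noise profiles means the paired runs share u, z_i, and ε_i, so the comparison isolates how the noise variance is distributed across samples rather than random-draw differences.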

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1703.06610/full.md

## Figures

52 figures with captions in the complete paper: https://tomesphere.com/paper/1703.06610/full.md

## References

52 references — full list in the complete paper: https://tomesphere.com/paper/1703.06610/full.md

---
Source: https://tomesphere.com/paper/1703.06610