Number of relevant directions in Principal Component Analysis and Wishart random matrices
Satya N. Majumdar, Pierpaolo Vivo

TL;DR
This paper analytically computes the probability distribution of the number of eigenvalues exceeding a threshold in Wishart matrices, revealing phase transitions and universal variance growth, which are crucial for understanding PCA in large datasets.
Contribution
It provides explicit formulas for the large deviation probabilities and for the variance of the number of relevant eigenvalues in Wishart matrices, and traces their behavior to a phase transition in an associated Coulomb gas.
Findings
The probability that a given number of eigenvalues exceeds the threshold follows a large deviation form.
The variance of the number of relevant eigenvalues grows logarithmically with matrix size.
A phase transition in the associated Coulomb gas explains the logarithmic singularity of the rate function near its minimum.
Abstract
We compute analytically, for large $N$, the probability $\mathcal{P}(N_+,N)$ that an $N\times N$ Wishart random matrix has $N_+$ eigenvalues exceeding a threshold $N\zeta$, including its large deviation tails. This probability plays a benchmark role when performing the Principal Component Analysis of a large empirical dataset. We find that $\mathcal{P}(N_+,N)\approx\exp(-\beta N^2 \psi_\zeta(N_+/N))$, where $\beta$ is the Dyson index of the ensemble and $\psi_\zeta(\kappa)$ is a rate function that we compute explicitly in the full range $0\leq\kappa\leq 1$ and for any $\zeta$. The rate function $\psi_\zeta(\kappa)$ displays a quadratic behavior modulated by a logarithmic singularity close to its minimum $\kappa^\star(\zeta)$. This is shown to be a consequence of a phase transition in an associated Coulomb gas problem. The variance of the number of relevant components is also…
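The quantity described in the abstract can be explored numerically. The following is a minimal Monte Carlo sketch, not the paper's analytical method: it draws real Wishart matrices (Dyson index 1), counts the eigenvalues above a threshold that scales linearly with the matrix size, and reports the empirical mean fraction and variance of that count. The function names `count_relevant` and `sample_counts`, the unit-variance Gaussian entries, and the threshold convention (matrix size times `zeta`) are assumptions made for illustration.

```python
import numpy as np


def count_relevant(n, m, zeta, rng):
    """Count eigenvalues of W = X^T X that exceed the threshold n * zeta.

    X is an m x n matrix (m >= n assumed) of i.i.d. standard normal
    entries, i.e. the real Wishart case. The scaling of the threshold
    with n is an assumption of this sketch.
    """
    x = rng.standard_normal((m, n))
    # The eigenvalues of X^T X are the squared singular values of X.
    eigenvalues = np.linalg.svd(x, compute_uv=False) ** 2
    return int(np.sum(eigenvalues > n * zeta))


def sample_counts(n, m, zeta, trials, seed=0):
    """Return an array with the count of relevant eigenvalues per trial."""
    rng = np.random.default_rng(seed)
    return np.array([count_relevant(n, m, zeta, rng) for _ in range(trials)])


if __name__ == "__main__":
    # Square matrices (m = n) with a fixed rescaled threshold zeta = 1.
    for n in (20, 40, 80):
        counts = sample_counts(n, n, zeta=1.0, trials=200)
        print(n, counts.mean() / n, counts.var())
```

Running the loop for increasing sizes gives a rough feel for how slowly the variance of the count changes with matrix size, although a few hundred trials at these sizes are far too few to probe the large deviation tails.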
Taxonomy
Topics: Random Matrices and Applications · Molecular spectroscopy and chirality · Blind Source Separation Techniques
