Spectral analysis of high-dimensional sample covariance matrices with missing observations
Kamil Jurczak, Angelika Rohde

TL;DR
This paper analyzes the spectral properties of high-dimensional sample covariance matrices with missing data, deriving the limiting spectral distribution and conditions for positive definiteness under various missingness mechanisms.
Contribution
It provides a new spectral analysis framework for covariance matrices with missing observations, including explicit formulas for the limiting distribution and eigenvalue behavior.
Findings
In the null case under missing at random, with every component observed with the same probability, the limiting spectral distribution is a Marčenko-Pastur law shifted to the left.
The extremal eigenvalues converge almost surely to the boundary points of the support of the limiting spectral distribution as dimension and sample size grow.
Sample covariance matrix is positive definite if observation probability exceeds a specific threshold.
Abstract
We study high-dimensional sample covariance matrices based on independent random vectors with missing coordinates. The presence of missing observations is common in modern applications such as climate studies or gene expression micro-arrays. A weak approximation on the spectral distribution in the "large dimension d and large sample size n" asymptotics is derived for possibly different observation probabilities in the coordinates. The spectral distribution turns out to be strongly influenced by the missingness mechanism. In the null case under the missing at random scenario where each component is observed with the same probability p, the limiting spectral distribution is a Marčenko-Pastur law shifted by (1-p)/p to the left. As d/n → y < 1, the almost sure convergence of the extremal eigenvalues to the respective boundary points of the support of the limiting…
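The null-case statement can be checked numerically. The following is a minimal sketch, not the authors' code. It assumes mean-zero standard Gaussian data with identity covariance, entries observed independently with a common probability p, and an inverse-probability-weighted estimator (cross products of the zero-filled data divided by p^2 off the diagonal and by p on the diagonal). Under those assumptions the limit would be a Marčenko-Pastur law with ratio y = d/n, scaled by 1/p and shifted left by (1-p)/p, which gives the support edges ((1 ∓ √y)^2 - (1-p))/p used in `predicted_edges`. That closed form and the function names `masked_covariance`, `predicted_edges`, and `extremal_eigenvalues` are assumptions of this sketch, not formulas or identifiers quoted from the paper.

```python
import numpy as np


def masked_covariance(x, mask, p):
    """Inverse-probability-weighted covariance estimate for mean-zero data.

    ASSUMPTION: this estimator is chosen for illustration only.
    x    : (n, d) data matrix
    mask : (n, d) boolean array, True where an entry is observed
    p    : common observation probability, assumed known
    """
    n = x.shape[0]
    filled = np.where(mask, x, 0.0)   # zero-fill the missing entries
    w = filled.T @ filled / n         # covariance of the zero-filled data
    sigma = w / p**2                  # two distinct coordinates are both observed w.p. p^2
    np.fill_diagonal(sigma, np.diag(w) / p)  # one coordinate is observed w.p. p
    return sigma


def predicted_edges(y, p):
    """Support edges of a Marchenko-Pastur law with ratio y, scaled by 1/p
    and shifted left by (1 - p) / p (ASSUMED form of the limit)."""
    lower = ((1 - np.sqrt(y)) ** 2 - (1 - p)) / p
    upper = ((1 + np.sqrt(y)) ** 2 - (1 - p)) / p
    return lower, upper


def extremal_eigenvalues(n, d, p, seed=0):
    """Simulate one masked sample and return (smallest, largest) eigenvalue."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n, d))   # identity covariance (null case)
    mask = rng.random((n, d)) < p     # each entry observed independently w.p. p
    eig = np.linalg.eigvalsh(masked_covariance(x, mask, p))  # ascending order
    return eig[0], eig[-1]


if __name__ == "__main__":
    n, d = 2000, 500                  # y = d / n = 0.25
    for p in (0.9, 0.5):
        lo, hi = extremal_eigenvalues(n, d, p)
        plo, phi = predicted_edges(d / n, p)
        print(f"p={p}: simulated [{lo:.3f}, {hi:.3f}], predicted [{plo:.3f}, {phi:.3f}]")
```

With y = 0.25, the assumed lower edge is positive for p = 0.9 and negative for p = 0.5, so the smallest simulated eigenvalue should change sign between the two settings, in line with the finding that positive definiteness depends on the observation probability.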
