Fast robust correlation for high-dimensional data
Jakob Raymaekers, Peter J. Rousseeuw

TL;DR
This paper introduces a fast, robust method for estimating covariance and correlation in high-dimensional data. It addresses the outlier sensitivity of the product moment covariance and the computational cost of existing robust alternatives, with applications to genomic and video data.
Contribution
The authors develop a simple, scalable approach for robust covariance estimation using data transformations, suitable for ultrahigh-dimensional data, and improve outlier detection methods.
Findings
The method is robust to outliers and statistically accurate (low variability)
Applicable to data with thousands to hundreds of thousands of variables
Demonstrated on genomic and video datasets
Abstract
The product moment covariance is a cornerstone of multivariate data analysis, from which one can derive correlations, principal components, Mahalanobis distances and many other results. Unfortunately the product moment covariance and the corresponding Pearson correlation are very susceptible to outliers (anomalies) in the data. Several robust measures of covariance have been developed, but few are suitable for the ultrahigh dimensional data that are becoming more prevalent nowadays. For that one needs methods whose computation scales well with the dimension, are guaranteed to yield a positive semidefinite covariance matrix, and are sufficiently robust to outliers as well as sufficiently accurate in the statistical sense of low variability. We construct such methods using data transformations. The resulting approach is simple, fast and widely applicable. We study its robustness by…
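The idea of building a robust correlation from a data transformation can be illustrated with a minimal sketch. The abstract is cut off before it names the transformation, so the code below makes its own assumptions: it standardizes each variable with the median and MAD and then clips the result to a bounded range. Clipping is a stand-in chosen for illustration, not the authors' transformation, and the function names and the `clip` value are hypothetical. Because the final step is an ordinary product moment correlation of the transformed data, the result is positive semidefinite by construction and costs about as much to compute as a Pearson correlation matrix.

```python
import numpy as np

def robust_standardize(X):
    """Center each column by its median and scale by its MAD.

    The factor 1.4826 makes the MAD consistent for the standard
    deviation at the normal distribution. Assumes no column has MAD 0.
    """
    med = np.median(X, axis=0)
    mad = 1.4826 * np.median(np.abs(X - med), axis=0)
    return (X - med) / mad

def transformed_correlation(X, clip=2.0):
    """Pearson correlation of robustly standardized, clipped data.

    Clipping is an illustrative bounded transformation (an assumption),
    not necessarily the one used in the paper.
    """
    Z = robust_standardize(X)
    G = np.clip(Z, -clip, clip)
    return np.corrcoef(G, rowvar=False)

# Synthetic data: columns 0 and 1 have true correlation 0.8.
rng = np.random.default_rng(0)
n, d = 200, 5
X = rng.standard_normal((n, d))
X[:, 1] = 0.8 * X[:, 0] + 0.6 * X[:, 1]

# Contaminate 5% of the rows with outliers that oppose the true correlation.
X_out = X.copy()
X_out[:10, 0] = 50.0
X_out[:10, 1] = -50.0

R_pearson = np.corrcoef(X_out, rowvar=False)
R_robust = transformed_correlation(X_out)

print("Pearson r(0,1) with outliers:", R_pearson[0, 1])
print("Transformed r(0,1) with outliers:", R_robust[0, 1])
print("Smallest eigenvalue of robust matrix:", np.linalg.eigvalsh(R_robust).min())
```

In this toy setup the outliers flip the sign of the Pearson correlation, while the transformed version keeps the positive sign. A bounded transformation like this one shrinks the estimate toward zero, so it should be read as a demonstration of the principle rather than an unbiased estimator.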
