Propagation of outliers in multivariate data
Fatemah Alqallaf, Stefan Van Aelst, Victor J. Yohai, Ruben H. Zamar

TL;DR
This paper studies how outliers propagate through multivariate data when robust location estimators are applied. It shows that standard high-breakdown affine equivariant estimators propagate outliers and have poor breakdown behavior under componentwise contamination, especially in high dimensions.
Contribution
It introduces the concept of propagation of outliers in multivariate data, and it defines and derives the influence function of robust multivariate location estimators under flexible contamination models, including componentwise contamination.
Findings
Standard high-breakdown affine equivariant estimators propagate outliers.
Propagation of outliers is a source of statistical error generated by the data processing itself, after the data has been collected.
Because of this propagation, these estimators show poor breakdown behavior under componentwise contamination when the dimension is high.
Abstract
We investigate the performance of robust estimates of multivariate location under nonstandard data contamination models such as componentwise outliers (i.e., contamination in each variable is independent from the other variables). This model brings up a possible new source of statistical error that we call "propagation of outliers." This source of error is unusual in the sense that it is generated by the data processing itself and takes place after the data has been collected. We define and derive the influence function of robust multivariate location estimates under flexible contamination models and use it to investigate the effect of propagation of outliers. Furthermore, we show that standard high-breakdown affine equivariant estimators propagate outliers and therefore show poor breakdown behavior under componentwise contamination when the dimension is high.
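To illustrate why the dimension matters under componentwise contamination, here is a minimal sketch in Python. It assumes a simplified model that is not spelled out in the abstract: each cell of an n-by-p data matrix is contaminated independently with the same probability eps. Under that assumption, a row contains at least one contaminated cell with probability 1 - (1 - eps)^p, which grows quickly with p. The function names, the value eps = 0.05, and the chosen dimensions are hypothetical and serve only as an illustration.

```python
import random


def row_contamination_prob(eps, p):
    """Probability that a row of p cells has at least one contaminated cell,
    assuming each cell is contaminated independently with probability eps."""
    return 1.0 - (1.0 - eps) ** p


def simulate_row_fraction(eps, p, n_rows=20000, seed=0):
    """Monte Carlo estimate of the same probability (illustrative only)."""
    rng = random.Random(seed)
    hit = 0
    for _ in range(n_rows):
        # A row counts as contaminated if any of its p cells is contaminated.
        if any(rng.random() < eps for _ in range(p)):
            hit += 1
    return hit / n_rows


if __name__ == "__main__":
    eps = 0.05  # assumed per-cell contamination probability
    for p in (1, 5, 13, 14, 50):
        exact = row_contamination_prob(eps, p)
        approx = simulate_row_fraction(eps, p)
        print(f"p={p:3d}  exact={exact:.3f}  simulated={approx:.3f}")
```

With a per-cell rate of 5%, the expected fraction of rows containing at least one contaminated cell passes one half between p = 13 and p = 14. Affine equivariant estimators treat each row as a single unit and are generally understood to have a breakdown point of at most about one half, so a small per-cell contamination rate can already leave a majority of rows affected once the dimension is moderately large.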
