Doubts on the efficacy of outliers correction methods
Marjorie Fonnesu, Nicola Kuczewski

TL;DR
This paper demonstrates that common outlier correction methods can worsen statistical inference when no outliers are present, advocating for non-parametric tests as a safer alternative.
Contribution
The study critically evaluates popular outlier correction methods, revealing their potential to harm inference accuracy in the absence of outliers, and recommends non-parametric tests.
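This summary does not say which non-parametric test the authors recommend. One common choice is the Mann-Whitney U test, which compares two groups using ranks only, so a single extreme value cannot inflate a variance estimate. The sketch below is a minimal standard-library version under stated assumptions: it uses a normal approximation with tie-corrected variance and no continuity correction, and the names rank_average() and mann_whitney_u() are hypothetical helpers. For very small samples, an exact test or an established library routine would be preferable.

```python
import math


def rank_average(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """U statistic for sample x and a two-sided p-value.

    Assumption: normal approximation with tie-corrected variance and no
    continuity correction; inaccurate for very small samples.
    """
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    ranks = rank_average(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    # Tie correction: sum of (t^3 - t) over each group of t tied values.
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:  # every value identical: no evidence of a difference
        return u1, 1.0
    z = (u1 - n1 * n2 / 2) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u1, p
```

For example, mann_whitney_u([1, 2, 3], [4, 5, 6]) gives U = 0 with a p-value of about 0.05 under this approximation, while two identical samples give p = 1.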
Findings
Outlier correction methods inflate Type I error when no outliers are present.
Methods like 2 Sigma, 3 Sigma, MAD, IQR, Grubbs, and winsorizing increase false positives.
Non-parametric tests are safer for statistical comparisons without outlier correction.
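The correction rules listed above can be sketched as simple filters on a single sample. This is a minimal illustration, not the authors' code. The cut-offs are common conventions assumed here, not values taken from the paper: k = 2 or 3 standard deviations, 3 scaled MADs, Tukey's 1.5 x IQR fences, and 5th/95th-percentile winsorizing. Quartiles come from Python's statistics.quantiles with method="inclusive", and all function names are hypothetical. Grubbs' test is omitted because it needs Student-t quantiles, which the standard library does not provide.

```python
import statistics


def sigma_filter(x, k):
    """Drop values more than k sample SDs from the mean (k=2: '2 Sigma', k=3: '3 Sigma')."""
    m, s = statistics.fmean(x), statistics.stdev(x)
    return [v for v in x if abs(v - m) <= k * s]


def mad_filter(x, k=3.0):
    """Drop values more than k scaled MADs from the median (k=3 is an assumed cut-off)."""
    med = statistics.median(x)
    # 1.4826 makes the MAD a consistent estimator of the SD for normal data.
    mad = 1.4826 * statistics.median([abs(v - med) for v in x])
    return [v for v in x if abs(v - med) <= k * mad]


def iqr_filter(x, k=1.5):
    """Drop values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(x, n=4, method="inclusive")
    lo, hi = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [v for v in x if lo <= v <= hi]


def winsorize(x, lower_pct=5, upper_pct=95):
    """Clip values to the given percentiles instead of removing them (n is preserved)."""
    cuts = statistics.quantiles(x, n=100, method="inclusive")
    lo, hi = cuts[lower_pct - 1], cuts[upper_pct - 1]
    return [min(max(v, lo), hi) for v in x]


data = [10, 11, 9, 10, 12, 8, 10, 11, 9, 50]
print(len(sigma_filter(data, 2)))  # 9: the value 50 is dropped
print(len(sigma_filter(data, 3)))  # 10: 50 survives because it inflates the SD it is judged against
print(len(mad_filter(data)))       # 9
print(len(iqr_filter(data)))       # 9
print(round(max(winsorize(data)), 2))  # 32.9: 50 is pulled in to the 95th percentile
```

Applied to a sample that contains no genuine outliers, every one of these rules still trims or clips the most extreme legitimate values, which is the situation the paper examines.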
Abstract
While the utilisation of different outlier correction methods has been shown to counteract the inferential error produced by contaminating data that do not belong to the studied population, the effects of their utilisation when samples do not contain contaminating outliers are less clear. Here a simulation approach shows that, in this condition, the most popular outlier correction methods (2 Sigma, 3 Sigma, MAD, IQR, Grubbs and winsorizing) worsen the inferential evaluation of the studied population, in particular inflating the Type I error and increasing the error committed in estimating the population mean and STD. We show that the methods with the highest efficacy in counteracting the inflation of Type I and Type II errors in the presence of contaminating outliers also produce the strongest increase in false positive results in their absence,…
Taxonomy
Topics: Advanced Statistical Methods and Models · Advanced Statistical Process Monitoring · Statistical Methods and Inference
