When Tukey meets Chauvenet: a new boxplot criterion for outlier detection
Hongmei Lin, Riquan Zhang, Tiejun Tong

TL;DR
This paper introduces a new outlier detection method called the Chauvenet-type boxplot, combining Tukey's boxplot and Chauvenet's criterion, which is robust, easy to implement, and effective across different sample sizes.
Contribution
The paper proposes a novel outlier detection criterion that integrates Tukey's boxplot with Chauvenet's rule, providing a robust and practical alternative for outlier identification.
Findings
Performs well regardless of sample size
Maintains simplicity and robustness in outlier detection
Outperforms traditional methods in simulations and real data
Abstract
The box-and-whisker plot, introduced by Tukey (1977), is one of the most popular graphical methods in descriptive statistics. However, Tukey's boxplot is free of sample size, yielding the so-called "one-size-fits-all" fences for outlier detection. Although sample-size-adjusted boxplots do exist in the literature, most of them are either not easy to implement or lack justification. As another common rule for outlier detection, Chauvenet's criterion uses the sample mean and standard deviation to perform the test, but it is often sensitive to the included outliers and hence is not robust. In this paper, by combining Tukey's boxplot and Chauvenet's criterion, we introduce a new boxplot, namely the Chauvenet-type boxplot, with the fence coefficient determined by an exact control of the outside rate per observation. Our new outlier criterion not only…
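To make the two classical rules named in the abstract concrete, here is a minimal Python sketch using only the standard library. `tukey_fences` applies the usual fixed coefficient of 1.5 to the interquartile range, and `chauvenet_outliers` applies the textbook form of Chauvenet's criterion (flag a point when the expected number of observations at least as extreme is below 0.5). The last function, `normal_fence_coefficient`, is an assumption and is not taken from the paper: it shows one plausible way, at the population level and under normality, to translate Chauvenet's 1/(2n) tail threshold into a sample-size-dependent fence coefficient. The paper's exact coefficient may differ. All function names and the sample data are hypothetical.

```python
from statistics import NormalDist, mean, quantiles, stdev


def tukey_fences(data, k=1.5):
    """Return (lower, upper) boxplot fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def boxplot_outliers(data, k=1.5):
    """Return the points lying outside the fences for coefficient k."""
    lower, upper = tukey_fences(data, k)
    return [x for x in data if x < lower or x > upper]


def chauvenet_outliers(data):
    """Classical Chauvenet's criterion based on the sample mean and SD.

    A point is flagged when n * P(|Z| >= z) < 0.5, where z is its
    absolute standardized deviation from the sample mean.
    """
    n = len(data)
    m, s = mean(data), stdev(data)
    std_normal = NormalDist()
    flagged = []
    for x in data:
        z = abs(x - m) / s
        two_sided_tail = 2 * (1 - std_normal.cdf(z))
        if n * two_sided_tail < 0.5:
            flagged.append(x)
    return flagged


def normal_fence_coefficient(n):
    """ASSUMPTION (not from the paper): population-level normal sketch.

    Chooses k so that the upper fence Q3 + k*IQR of a normal population
    sits at the Chauvenet cutoff, i.e. the two-sided tail is 1/(2n).
    """
    std_normal = NormalDist()
    z_q3 = std_normal.inv_cdf(0.75)            # upper quartile of N(0, 1)
    z_cut = std_normal.inv_cdf(1 - 1 / (4 * n))  # Chauvenet cutoff
    return (z_cut - z_q3) / (2 * z_q3)         # IQR of N(0, 1) is 2 * z_q3


# Hypothetical sample with one suspicious value.
data = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3, 10.1, 14.0]
tukey_flagged = boxplot_outliers(data)            # fixed k = 1.5
chauvenet_flagged = chauvenet_outliers(data)      # mean/SD based
k_n = normal_fence_coefficient(len(data))         # grows with n
adaptive_flagged = boxplot_outliers(data, k=k_n)  # quartile based, n-aware
```

The sketch illustrates the contrast the abstract draws: the Tukey fences use quartiles and ignore the sample size, while Chauvenet's rule depends on the sample size but relies on the mean and standard deviation, which the suspicious value itself inflates. A quartile-based fence whose coefficient grows with n keeps the robustness of the first and the sample-size awareness of the second.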
Taxonomy
Topics: Advanced Statistical Methods and Models · Anomaly Detection Techniques and Applications · Data Analysis with R
