Multiple Outliers in Small Samples

Mark Chamness; Rachel Traylor

arXiv:1601.07521·math.ST·March 15, 2016

Open Access

TL;DR

This paper investigates the limitations of using z-scores for outlier detection in small samples with multiple outliers, revealing a masking effect that hampers accurate identification.

Contribution

It provides a closed-form expression for the maximum z-score in small samples with multiple outliers and analyzes the related t-statistic, highlighting detection challenges.

Findings

01

Maximum possible z-score decreases as the number of outliers increases

02

Masking effect impairs outlier detection in small samples

03

Closed-form formula for maximum z-score and t-statistic

Abstract

Z-scores are often employed in outlier detection in a dataset. For small samples, the presence of multiple outliers forces a finite supremum on the absolute value of possible z-scores that decreases with an increasing number of outliers, creating a "masking effect" that hinders identification of true outliers. We give an illustrative case study in which the accurate detection of the number of outliers is critical, and provide a closed form expression of the maximum possible z-score in terms of the sample size and number of outliers. In addition, a corresponding analysis on the $t$-statistic is performed.
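The masking effect described in the abstract can be illustrated numerically. The sketch below is not taken from the paper: it assumes one simple configuration (all $n-k$ inliers equal to 0, all $k$ outliers equal to a single large value) and the sample standard deviation with an $n-1$ denominator. Under those assumptions the outlier z-score works out to $\sqrt{(n-k)(n-1)/(nk)}$, whatever the outlier's magnitude. The function names `z_scores`, `outlier_z` and `bound_for_equal_outliers` are hypothetical, and the expression is derived for this configuration only, so it may differ in form from the paper's closed-form result.

```python
import math
import statistics


def z_scores(sample):
    """Z-scores using the sample mean and the (n-1)-denominator standard deviation."""
    mean = statistics.fmean(sample)
    sd = statistics.stdev(sample)
    return [(x - mean) / sd for x in sample]


def outlier_z(n, k, magnitude=1e6):
    """Largest z-score when k of n points sit at `magnitude` and the rest at 0.

    Assumed configuration for illustration only.
    """
    sample = [0.0] * (n - k) + [magnitude] * k
    return max(z_scores(sample))


def bound_for_equal_outliers(n, k):
    """Value derived by hand for the assumed configuration: sqrt((n-k)(n-1)/(nk))."""
    return math.sqrt((n - k) * (n - 1) / (n * k))


if __name__ == "__main__":
    n = 10
    for k in range(1, 5):
        print(k, round(outlier_z(n, k), 4), round(bound_for_equal_outliers(n, k), 4))
```

For $n = 10$, the outlier z-score in this configuration is about 2.85 with one outlier and about 1.90 with two, so a conventional threshold of $|z| > 3$ would flag nothing regardless of how extreme the outlying values are.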

Taxonomy

Topics: Advanced Statistical Methods and Models · Anomaly Detection Techniques and Applications · Fuzzy Systems and Optimization