Robust Chauvenet Rejection: Powerful, but Easy to Use Outlier Detection for Heavily Contaminated Data Sets

Nicholas Konz; Daniel E. Reichart

arXiv:2301.07838·stat.CO·January 20, 2023

TL;DR

This paper introduces a Python implementation of Robust Chauvenet Rejection (RCR), an effective outlier detection method for heavily contaminated data sets, demonstrating its accuracy, speed, and versatility in one- and multi-dimensional contexts.

Contribution

The paper presents a new Python package for RCR, enhancing its accessibility and usability while maintaining its high performance for outlier rejection in contaminated data.

Findings

01. RCR effectively cleans heavily contaminated data sets.
02. The Python implementation maintains the speed of the original C++ version.
03. RCR performs well in both one-dimensional and multi-dimensional data analysis.

Abstract

In Maples et al. (2018) we introduced Robust Chauvenet Outlier Rejection, or RCR, a novel outlier rejection technique that evolves Chauvenet's Criterion by sequentially applying different measures of central tendency and empirically determining the rejective sigma value. RCR is especially powerful for cleaning heavily contaminated samples, and unlike other methods such as sigma clipping, it manages to be both accurate and precise when characterizing the underlying uncontaminated distributions of data sets, by using decreasingly robust but increasingly precise statistics in sequence. For this work, we present RCR from a software standpoint, newly implemented as a Python package while maintaining the speed of the C++ original. RCR has been well tested, calibrated, and simulated, and it can be used for both one-dimensional outlier rejection and $n$-dimensional model-fitting, with or without…
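The idea of applying "decreasingly robust but increasingly precise statistics in sequence" can be illustrated with a minimal, standard-library sketch. It is not the authors' implementation and does not use the RCR package: the function names (`chauvenet_pass`, `percentile_deviation`, `std_deviation`, `robust_then_precise`) are hypothetical, the choice of median with a 68.3rd-percentile deviation followed by mean with standard deviation is an assumption made for illustration, and the empirically determined correction factors mentioned in the abstract are omitted. Each pass applies Chauvenet's criterion in its usual form: the most discrepant point is rejected when the expected number of points at least that far from the center, in a Gaussian sample of the same size, falls below 0.5.

```python
import math
import statistics


def chauvenet_pass(values, center_fn, width_fn):
    """Reject the most discrepant point, one at a time, while Chauvenet's
    criterion flags it. center_fn and width_fn are supplied by the caller."""
    kept = list(values)
    rejected = []
    while len(kept) > 2:
        center = center_fn(kept)
        width = width_fn(kept, center)
        if width == 0:
            break  # degenerate sample: no scale to measure deviations against
        worst = max(kept, key=lambda v: abs(v - center))
        z = abs(worst - center) / width
        # Expected count of points at least this far out (two-sided Gaussian tail).
        expected = len(kept) * math.erfc(z / math.sqrt(2))
        if expected >= 0.5:
            break  # the most discrepant point is plausible, so stop
        kept.remove(worst)
        rejected.append(worst)
    return kept, rejected


def percentile_deviation(values, center):
    """68.3rd-percentile absolute deviation: a robust stand-in for sigma."""
    devs = sorted(abs(v - center) for v in values)
    idx = min(len(devs) - 1, math.ceil(0.683 * len(devs)) - 1)
    return devs[idx]


def std_deviation(values, center):
    """Sample standard deviation about the given center (less robust, more precise)."""
    return math.sqrt(sum((v - center) ** 2 for v in values) / (len(values) - 1))


def robust_then_precise(values):
    """Two passes: robust statistics first, then more precise ones (illustrative only)."""
    kept, first = chauvenet_pass(values, statistics.median, percentile_deviation)
    kept, second = chauvenet_pass(kept, statistics.mean, std_deviation)
    return kept, first + second


if __name__ == "__main__":
    # Toy sample: ten measurements near 10 plus four contaminating values.
    clean = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3, 10.1, 9.9]
    kept, rejected = robust_then_precise(clean + [25.0, 30.0, 27.5, 40.0])
    print("kept:", kept)
    print("rejected:", rejected)
```

On this toy sample the robust first pass removes the four contaminating values, because the median and percentile deviation are barely affected by them; the second pass then works on an essentially clean sample, where the mean and standard deviation are safe to use.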

Code & Models

Repositories

nickk124/RCR (Official)

Taxonomy

Topics: Anomaly Detection Techniques and Applications · Adversarial Robustness in Machine Learning · Machine Learning and Data Classification