Outlyingness: why do outliers lie out?

Michiel Debruyne; Sebastiaan Höppner; Sven Serneels; Tim Verdonck

arXiv:1708.03761 · stat.ME · December 12, 2019 · 1 citation

Open Access

TL;DR

This paper introduces a fast method to identify which variables contribute most to an outlier's outlyingness, aiding scientific interpretation especially in high-dimensional data.

Contribution

The paper proposes a novel approach, based on sparse partial least squares regression, for identifying the variables that contribute most to an outlier's outlyingness, which makes flagged cases easier to interpret.

Findings

01. Method performs well on simulated data.

02. Effective in high-dimensional settings.

03. Helps understand why data points are flagged as outliers.

Abstract

Outlier detection is an inevitable step in most statistical data analyses. However, the mere detection of an outlying case does not always answer all scientific questions associated with that data point. Outlier detection techniques, classical and robust alike, will typically flag the entire case as outlying, or attribute a specific case weight to the entire case. In practice, particularly in high-dimensional data, the outlier will most likely not be outlying along all of its variables, but just along a subset of them. If so, the scientific question of why the case has been flagged as an outlier becomes of interest. In this article, a fast and efficient method is proposed to detect the variables that contribute most to an outlier's outlyingness. Thereby, it helps the analyst understand why an outlier lies out. The approach pursued in this work is to estimate the univariate direction of maximal…
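As a rough illustration of what a direction of maximal outlyingness can look like, here is a minimal Python sketch. It rests on several assumptions that are not taken from the paper: it uses the classical sample mean and covariance (a robust analysis would likely substitute robust estimates), it measures outlyingness along a direction a as |a'(x - mu)| / sqrt(a' Sigma a), and it ranks variables by the absolute entries of that direction as a crude stand-in for sparse estimation. The function names and the simulated data are hypothetical.

```python
import numpy as np

def max_outlyingness_direction(x, mu, sigma):
    """Unit vector a maximizing |a^T (x - mu)| / sqrt(a^T sigma a).

    The maximizer is proportional to sigma^{-1} (x - mu); this is a
    standard result, used here for illustration only.
    """
    a = np.linalg.solve(sigma, x - mu)
    return a / np.linalg.norm(a)

def outlyingness(x, mu, sigma, a):
    """Standardized distance of x from mu along direction a."""
    return abs(a @ (x - mu)) / np.sqrt(a @ sigma @ a)

# Simulated clean data (assumption: 200 cases, 10 independent variables).
rng = np.random.default_rng(0)
p = 10
X = rng.normal(size=(200, p))
mu = X.mean(axis=0)               # classical location estimate (assumption)
sigma = np.cov(X, rowvar=False)   # classical scatter estimate (assumption)

# A case that is outlying only along variables 2 and 7.
x_out = mu.copy()
x_out[[2, 7]] += 8.0

a = max_outlyingness_direction(x_out, mu, sigma)

# Crude variable ranking: largest absolute entries of the direction.
# This thresholding is NOT the sparse estimator proposed in the paper.
top = np.argsort(-np.abs(a))[:2]
print("variables ranked as most contributing:", sorted(top.tolist()))
print("outlyingness along a:", outlyingness(x_out, mu, sigma, a))
```

Along this direction the outlyingness equals the Mahalanobis distance of the case, which is why the dense direction alone flags the case without explaining it; a sparse estimate of the direction is what would single out the contributing variables.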

Taxonomy

Topics: Advanced Statistical Methods and Models · Advanced Statistical Process Monitoring · Anomaly Detection Techniques and Applications