C-AllOut: Catching & Calling Outliers by Type

Guilherme D. F. Silva; Leman Akoglu; Robson L. F. Cordeiro

arXiv:2110.08257 · cs.LG · October 19, 2021

TL;DR

C-AllOut is a novel, scalable, parameter-free method that detects and classifies outliers into three types using only pairwise similarities, filling a gap in outlier annotation.

Contribution

The paper introduces C-AllOut, to the authors' knowledge the first outlier detection method that annotates outliers by type using only pairwise similarities, while matching or exceeding state-of-the-art detectors at detection itself.

Findings

1. Achieves comparable or better detection performance than state-of-the-art methods.
2. Effectively annotates outliers by type, a task not addressed by existing methods.
3. Parameter-free and scalable to large datasets.

Abstract

Given an unlabeled dataset, wherein we have access only to pairwise similarities (or distances), how can we effectively (1) detect outliers, and (2) annotate/tag the outliers by type? Outlier detection has a large literature, yet we find a key gap in the field: to our knowledge, no existing work addresses the outlier annotation problem. Outliers are broadly classified into 3 types, representing distinct patterns that could be valuable to analysts: (a) global outliers are severe yet isolated cases that do not repeat, e.g., a data collection error; (b) local outliers diverge from their peers within a context, e.g., a particularly short basketball player; and (c) collective outliers are isolated micro-clusters that may indicate coalition or repetitions, e.g., frauds that exploit the same loophole. This paper presents C-AllOut: a novel and effective outlier detector that annotates outliers…
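The abstract names the three outlier types, but this excerpt does not describe C-AllOut's algorithm. The sketch below is therefore a hypothetical toy, not C-AllOut: it scores points with a standard k-nearest-neighbor distance computed from the pairwise distance matrix alone, and shows why collective outliers are easy to miss. The data, the function names (`pairwise_distances`, `knn_distance_scores`, `top_n`) and the choices of k are illustrative assumptions.

```python
import math


def pairwise_distances(points):
    """Full Euclidean distance matrix: the only input the pairwise setting assumes."""
    return [[math.dist(p, q) for q in points] for p in points]


def knn_distance_scores(dist, k):
    """Score each point by its distance to its k-th nearest neighbor (itself excluded)."""
    scores = []
    for i, row in enumerate(dist):
        others = sorted(d for j, d in enumerate(row) if j != i)
        scores.append(others[k - 1])
    return scores


def top_n(scores, n):
    """Indices of the n highest-scoring points (ties keep their original order)."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:n]


# Hypothetical toy data: a 6x6 grid of inliers, one micro-cluster, one far point.
inliers = [(0.5 * x, 0.5 * y) for x in range(6) for y in range(6)]  # indices 0..35
collective = [(10.0, 0.0), (10.1, 0.0), (10.0, 0.1)]                  # indices 36..38
global_outlier = [(20.0, 20.0)]                                        # index 39
points = inliers + collective + global_outlier
dist = pairwise_distances(points)

# k=1: each micro-cluster member has a very close neighbor inside its own group,
# so it scores lower (looks more "normal") than the grid inliers.
s1 = knn_distance_scores(dist, k=1)

# k=3: the 3rd neighbor of a micro-cluster member lies outside the group,
# so the whole group now scores far above the inliers.
s3 = knn_distance_scores(dist, k=3)

print("top-4 with k=1:", top_n(s1, 4))
print("top-4 with k=3:", sorted(top_n(s3, 4)))
```

With k=1 only the global outlier stands out; raising k above the micro-cluster's size exposes the collective outliers as well. A plain k-NN distance also does not separate local outliers in regions of differing density, which is the case density-relative scores such as LOF target, so the sketch leaves that type out.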

Taxonomy

Topics: Anomaly Detection Techniques and Applications · Adversarial Robustness in Machine Learning · Imbalanced Data Classification Techniques