SECODA: Segmentation- and Combination-Based Detection of Anomalies

Ralph Foorthuis

arXiv:2008.06869·cs.DB·August 18, 2020

TL;DR

SECODA is an unsupervised, non-parametric anomaly detection algorithm for datasets with continuous and categorical attributes. It detects many types of anomalies while keeping memory use low and runtime linear in the size of the dataset.

Contribution

The paper introduces a novel non-parametric method that combines repeated discretization of continuous attributes, constellations, and a pruning heuristic to detect anomalies in mixed continuous and categorical data at scale.

Findings

01

Detects many different types of anomalies, including complex multidimensional cases.

02

Runtime scales linearly with dataset size, and memory usage stays low on large datasets.

03

Demonstrates practical value in real-world data quality scenarios.

Abstract

This study introduces SECODA, a novel general-purpose unsupervised non-parametric anomaly detection algorithm for datasets containing continuous and categorical attributes. The method is guaranteed to identify cases with unique or sparse combinations of attribute values. Continuous attributes are discretized repeatedly in order to correctly determine the frequency of such value combinations. The concept of constellations, exponentially increasing weights and discretization cut points, as well as a pruning heuristic are used to detect anomalies with an optimal number of iterations. Moreover, the algorithm has a low memory imprint and its runtime performance scales linearly with the size of the dataset. An evaluation with simulated and real-life datasets shows that this algorithm is able to identify many different types of anomalies, including complex multidimensional instances. An…
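
The idea of repeated discretization with exponentially increasing weights can be illustrated with a minimal Python sketch. This is not the author's implementation (the pruning heuristic is omitted entirely), and the function names, the equal-width binning, the doubling of the bin count, and the weight of 2^i per iteration are all assumptions made for illustration. Each row is mapped to a tuple of bin indices and categorical values, the frequency of that tuple is counted, and the weighted average frequency serves as the score, so a lower score suggests a rarer combination of values.

```python
from collections import Counter


def discretize(value, lo, hi, bins):
    """Map a numeric value to an equal-width bin index in [0, bins - 1]."""
    if hi == lo:
        return 0
    idx = int((value - lo) / (hi - lo) * bins)
    return min(idx, bins - 1)  # the maximum value falls into the last bin


def constellation_scores(rows, start_bins=2, iterations=5):
    """Return one score per row; lower means a rarer value combination.

    Hypothetical helper, not part of any SECODA release. Columns whose
    values are all int/float are binned; all other columns are treated
    as categorical and used as-is.
    """
    n_cols = len(rows[0])
    numeric = [
        all(isinstance(r[c], (int, float)) for r in rows) for c in range(n_cols)
    ]
    ranges = [
        (min(r[c] for r in rows), max(r[c] for r in rows)) if numeric[c] else None
        for c in range(n_cols)
    ]

    totals = [0.0] * len(rows)
    weight_sum = 0.0
    for i in range(iterations):
        bins = start_bins * 2 ** i  # assumed: bin count doubles each iteration
        weight = 2 ** i             # assumed: finer iterations weigh more
        keys = [
            tuple(
                discretize(r[c], ranges[c][0], ranges[c][1], bins)
                if numeric[c]
                else r[c]
                for c in range(n_cols)
            )
            for r in rows
        ]
        counts = Counter(keys)  # frequency of each combination of values
        for j, key in enumerate(keys):
            totals[j] += weight * counts[key]
        weight_sum += weight
    return [t / weight_sum for t in totals]


# Twenty similar cases plus one isolated case in the numeric attribute.
rows = [(1.0 + 0.01 * k, "a") for k in range(20)] + [(9.0, "a")]
scores = constellation_scores(rows)
most_anomalous = min(range(len(rows)), key=scores.__getitem__)
```

In this toy dataset the isolated case is alone in its bin at every granularity, so `most_anomalous` is the index of the last row. A single pass over the data per iteration plus a frequency table is what keeps this style of approach cheap in both time and memory.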

Code & Models

Repositories

ralfoan/SECODA (Official)

Taxonomy

Methods: Pruning