TL;DR
SECODA is an unsupervised anomaly detection algorithm for datasets that mix continuous and categorical attributes. It identifies several types of anomalies while keeping runtime and memory use low.
Contribution
The paper introduces a non-parametric method that combines repeated discretization of continuous attributes, constellations, and a pruning heuristic to detect anomalies in mixed-type data, with runtime that scales linearly with dataset size.
Findings
SECODA detects diverse anomaly types, including complex multidimensional cases, on simulated and real-life datasets.
Its runtime scales linearly with dataset size, and its memory footprint is low.
It shows practical value in real-world data quality scenarios.
Abstract
This study introduces SECODA, a novel general-purpose unsupervised non-parametric anomaly detection algorithm for datasets containing continuous and categorical attributes. The method is guaranteed to identify cases with unique or sparse combinations of attribute values. Continuous attributes are discretized repeatedly in order to correctly determine the frequency of such value combinations. The concept of constellations, exponentially increasing weights and discretization cut points, as well as a pruning heuristic are used to detect anomalies with an optimal number of iterations. Moreover, the algorithm has a low memory imprint and its runtime performance scales linearly with the size of the dataset. An evaluation with simulated and real-life datasets shows that this algorithm is able to identify many different types of anomalies, including complex multidimensional instances. An…
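The core idea in the abstract, counting how often each combination of attribute values occurs while the continuous attributes are discretized repeatedly, can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes equal-width binning, a bin count that simply grows from 2 to a hypothetical `max_bins`, and an unweighted average of the counts; the exponentially increasing weights, the cut-point schedule, and the pruning heuristic mentioned in the abstract are left out. The function names `equal_width_bin` and `constellation_scores` are illustrative only.

```python
from collections import Counter


def equal_width_bin(value, lo, hi, n_bins):
    """Map a value in [lo, hi] to an equal-width bin index in 0..n_bins-1."""
    if hi == lo:
        return 0
    idx = int((value - lo) / (hi - lo) * n_bins)
    return min(idx, n_bins - 1)  # the maximum value falls into the last bin


def constellation_scores(rows, continuous_idx, max_bins=6):
    """Return the average constellation frequency per row (lower = rarer).

    rows: list of tuples mixing numeric and categorical values.
    continuous_idx: positions of the continuous attributes (assumed known).
    max_bins: hypothetical stopping point for this sketch.
    """
    # Value range of each continuous attribute, used for equal-width binning.
    ranges = {
        j: (min(r[j] for r in rows), max(r[j] for r in rows))
        for j in continuous_idx
    }
    totals = [0.0] * len(rows)
    n_iter = 0
    for n_bins in range(2, max_bins + 1):
        keys = []
        for row in rows:
            key = []
            for j, v in enumerate(row):
                if j in ranges:
                    lo, hi = ranges[j]
                    key.append(equal_width_bin(v, lo, hi, n_bins))
                else:
                    key.append(v)  # categorical values are used as they are
            keys.append(tuple(key))
        counts = Counter(keys)  # frequency of each value combination
        for i, key in enumerate(keys):
            totals[i] += counts[key]
        n_iter += 1
    return [t / n_iter for t in totals]


# Twenty similar cases, one rare category, and one extreme numeric value.
rows = [(1.0 + 0.01 * i, "a") for i in range(20)] + [(1.1, "b"), (10.0, "a")]
scores = constellation_scores(rows, continuous_idx=[0])
print(scores[0], scores[-2], scores[-1])  # 20.0 1.0 1.0
```

In this toy data the last two rows receive the lowest scores: one has an ordinary numeric value paired with a rare category, the other has an extreme numeric value. Both are unique combinations of attribute values, which is the kind of case the abstract says the method identifies.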
Taxonomy
Methods: Pruning
