Unsupervised Parameter-free Outlier Detection using HDBSCAN* Outlier Profiles
Kushankur Ghosh, Murilo Coelho Naldi, Jörg Sander, Euijin Choo

TL;DR
This paper introduces an unsupervised, parameter-free approach for outlier detection using GLOSH within HDBSCAN*, automatically selecting optimal parameters and thresholds to improve outlier identification without prior knowledge.
Contribution
The authors propose novel unsupervised strategies to automatically determine the best minpts parameter and outlier threshold for GLOSH, enhancing its robustness and usability.
Findings
The proposed strategies effectively identify optimal minpts values.
Automatic threshold estimation improves outlier detection accuracy.
The proposed methods outperform fixed-parameter approaches.
Abstract
In machine learning and data mining, outliers are data points that differ significantly from the rest of the dataset and often introduce irrelevant information that can bias its statistics and models. Unsupervised methods are therefore crucial for detecting outliers when there is limited or no information about them. Global-Local Outlier Scores based on Hierarchies (GLOSH) is an unsupervised outlier detection method within HDBSCAN*, a state-of-the-art hierarchical clustering method. GLOSH estimates an outlier score for each data point by comparing its density to the highest density of the region in which it resides in the HDBSCAN* hierarchy. GLOSH may be sensitive to HDBSCAN*'s minpts parameter, which influences density estimation. With limited knowledge about the data, choosing an appropriate minpts value beforehand is challenging as one or some minpts values may better represent the underlying cluster…
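The density comparison described in the abstract can be illustrated with a minimal sketch. This is not the authors' method and not the full GLOSH computation: it assumes the whole dataset forms a single cluster, so every point is compared against the single densest point rather than against the densest point of its own region in the HDBSCAN* hierarchy. The function names `core_distances` and `simplified_glosh` and the toy data are hypothetical, chosen only for illustration. Density is approximated by the core distance, taken here as the distance to the minpts-th nearest neighbour with the point itself counted first.

```python
import math

def core_distances(points, minpts):
    """Distance from each point to its minpts-th nearest neighbour.

    The point itself counts as the first neighbour (distance 0).
    """
    result = []
    for p in points:
        dists = sorted(math.dist(p, q) for q in points)
        result.append(dists[minpts - 1])
    return result

def simplified_glosh(points, minpts):
    """Simplified GLOSH-style score in [0, 1].

    ASSUMPTION: the whole dataset is treated as one cluster, so the
    reference density is that of the globally densest point. The real
    GLOSH takes the reference from the HDBSCAN* hierarchy instead.
    """
    core = core_distances(points, minpts)
    eps_max = min(core)  # smallest core distance = highest density
    # Score is 0 for the densest point and approaches 1 for sparse points.
    return [1.0 - eps_max / c if c > 0 else 0.0 for c in core]

# Toy data (hypothetical): four points in a tight square plus one far away.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]

# Scores for several minpts values, showing how the choice shifts the result.
scores_by_minpts = {m: simplified_glosh(points, m) for m in (2, 3, 4)}
for m, scores in scores_by_minpts.items():
    print(m, [round(s, 3) for s in scores])
```

In this toy example the isolated point receives a score close to 1 for every minpts tried, while the exact value changes with minpts, which is the sensitivity the abstract refers to.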
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Fault Detection and Control Systems · Machine Fault Diagnosis Techniques
