Cluster Purging: Efficient Outlier Detection based on Rate-Distortion Theory
Maximilian B. Toller, Bernhard C. Geiger, Roman Kern

TL;DR
This paper introduces Cluster Purging, an outlier detection method grounded in rate-distortion theory. It extends clustering-based outlier detection by assessing the representivity of a clustering and by identifying data points that are best represented by their own unique clusters.
Contribution
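The paper proposes Cluster Purging, an extension of clustering-based outlier detection, together with two efficient algorithms: one is parameter-free, and the other has a parameter that controls representivity estimation. Both aim to improve on the outliers detected from raw clusterings.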
Findings
Cluster Purging outperforms raw clustering-based outlier detection.
Cluster Purging competes strongly against state-of-the-art outlier detection methods.
The algorithm with a representivity parameter can be tuned in supervised setups.
Abstract
Rate-distortion theory-based outlier detection builds upon the rationale that a good data compression will encode outliers with unique symbols. Based on this rationale, we propose Cluster Purging, which is an extension of clustering-based outlier detection. This extension allows one to assess the representivity of clusterings, and to find data that are best represented by individual unique clusters. We propose two efficient algorithms for performing Cluster Purging, one being parameter-free, while the other algorithm has a parameter that controls representivity estimations, allowing it to be tuned in supervised setups. In an experimental evaluation, we show that Cluster Purging improves upon outliers detected from raw clusterings, and that Cluster Purging competes strongly against state-of-the-art alternatives.
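The rationale in the abstract (points that a clustering represents poorly deserve their own unique symbol) can be illustrated with a minimal sketch. This is not the authors' algorithm: the function name `purge_outliers`, the `factor` parameter, and the median-based threshold are all assumptions made for illustration, and the abstract does not specify how representivity is actually estimated. The sketch takes an existing clustering, measures each point's distortion as its Euclidean distance to its cluster centroid, and flags points whose distortion is far above what is typical for their cluster.

```python
import numpy as np


def purge_outliers(X, labels, factor=3.0):
    """Flag points that their cluster represents poorly.

    Hypothetical illustration only: a point is flagged when its distance
    to its cluster centroid exceeds `factor` times the median distance
    within that cluster. Both the rule and `factor` are assumptions,
    not the procedure from the paper.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    outlier = np.zeros(len(X), dtype=bool)
    for j in np.unique(labels):
        mask = labels == j
        centroid = X[mask].mean(axis=0)
        # Distortion: Euclidean distance of each member to its centroid.
        distortion = np.linalg.norm(X[mask] - centroid, axis=1)
        threshold = factor * np.median(distortion)
        outlier[mask] = distortion > threshold
    return outlier


# Two tight groups of eight points each, plus one stray point that a
# raw clustering has assigned to the first group.
offsets = np.array([
    [0.1, 0.1], [0.1, -0.1], [-0.1, 0.1], [-0.1, -0.1],
    [0.1, 0.0], [-0.1, 0.0], [0.0, 0.1], [0.0, -0.1],
])
group_a = offsets
group_b = offsets + 5.0
stray = np.array([[0.0, 3.0]])
X = np.vstack([group_a, group_b, stray])
labels = np.array([0] * 8 + [1] * 8 + [0])

flags = purge_outliers(X, labels)
print(np.flatnonzero(flags))  # indices of the flagged points
```

In this toy data only the stray point is flagged, which corresponds to "purging" it from its cluster and representing it by a symbol of its own. A threshold chosen per cluster, rather than globally, keeps the rule insensitive to clusters of different spread; how the paper's algorithms make that choice is not stated in the abstract.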
