TL;DR
This paper introduces a CDF transform-and-shift method to homogenize cluster densities in datasets, improving the performance of clustering and anomaly detection algorithms that assume uniform densities.
Contribution
The paper proposes a novel multi-dimensional CDF transform-and-shift technique to address inhomogeneous cluster densities without modifying existing algorithms.
Findings
Outperforms existing remedies for density inhomogeneity
Preserves cluster structure while homogenizing densities
Applicable as a preprocessing step for various algorithms
Abstract
The problem of inhomogeneous cluster densities has been a long-standing issue for distance-based and density-based algorithms in clustering and anomaly detection. These algorithms implicitly assume that all clusters have approximately the same density. As a result, they often exhibit a bias towards dense clusters in the presence of sparse clusters. Many remedies have been suggested; yet, we show that they are partial solutions which do not address the issue satisfactorily. To match the implicit assumption, we propose to transform a given dataset such that the transformed clusters have approximately the same density while all regions of locally low density become globally low density -- homogenising cluster density while preserving the cluster structure of the dataset. We show that this can be achieved by using a new multi-dimensional Cumulative Distribution Function in a…
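As a rough illustration of the density-equalising idea only, the sketch below applies a plain one-dimensional empirical CDF to toy data. It is not the paper's multi-dimensional transform-and-shift method, whose details are not given here. The helper names (`ecdf_transform`, `mean_gap`) and the toy clusters are assumptions made for the example.

```python
import random
from bisect import bisect_right


def ecdf_transform(values):
    """Map each value to its empirical CDF value: (number of samples <= v) / n."""
    ordered = sorted(values)
    n = len(ordered)
    return [bisect_right(ordered, v) / n for v in values]


def mean_gap(values):
    """Average spacing between consecutive sorted values (a crude inverse density)."""
    ordered = sorted(values)
    return (ordered[-1] - ordered[0]) / (len(ordered) - 1)


# Hypothetical toy data: a dense cluster on [0, 1] and a sparse one on [10, 30].
rng = random.Random(0)
dense = [rng.uniform(0.0, 1.0) for _ in range(100)]
sparse = [rng.uniform(10.0, 30.0) for _ in range(100)]

# Transform the pooled sample, then split it back into the two clusters.
transformed = ecdf_transform(dense + sparse)
dense_t, sparse_t = transformed[:100], transformed[100:]

print("mean gap before:", mean_gap(dense), mean_gap(sparse))
print("mean gap after: ", mean_gap(dense_t), mean_gap(sparse_t))
```

Before the transform, the sparse cluster's points are spaced far wider apart than the dense cluster's. Afterwards both clusters have the same spacing. A global one-dimensional CDF like this one also closes the gap between the two clusters, so it equalises density without keeping the clusters separate. Preserving cluster structure is the part the abstract attributes to the proposed method.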
