TL;DR
Lumbermark is a robust divisive clustering algorithm that detects clusters of varying sizes, densities, and shapes by cutting a dataset's mutual reachability minimum spanning tree. Working with mutual reachability distances reduces the influence of noise points and outliers.
Contribution
It introduces a new clustering method that can be viewed as an alternative to HDBSCAN: it produces partitions with user-specified sizes and comes with a fast, easy-to-use open-source implementation, the 'lumbermark' package for Python and R.
Findings
Performs well on benchmark data
Handles clusters of varying sizes, densities, and shapes
Reduces influence of noise and outliers
Abstract
We introduce Lumbermark, a robust divisive clustering algorithm capable of detecting clusters of varying sizes, densities, and shapes. Lumbermark iteratively chops off large limbs connected by protruding segments of a dataset's mutual reachability minimum spanning tree. The use of mutual reachability distances smoothens the data distribution and decreases the influence of low-density objects, such as noise points between clusters or outliers at their peripheries. The algorithm can be viewed as an alternative to HDBSCAN that produces partitions with user-specified sizes. A fast, easy-to-use implementation of the new method is available in the open-source 'lumbermark' package for Python and R. We show that Lumbermark performs well on benchmark data and hope it will prove useful to data scientists and practitioners across different fields.
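The abstract's two main ingredients, mutual reachability distances and cutting a minimum spanning tree, can be illustrated with a small sketch. This is not the Lumbermark algorithm and does not use the 'lumbermark' package; it is a simplified stand-in built on NumPy and SciPy. The function names, the parameters `M` and `min_size`, and the cutting rule (remove the heaviest tree edge only if every resulting component keeps at least `min_size` points) are assumptions made for illustration, and the core distance follows the usual HDBSCAN-style definition.

```python
import numpy as np
from scipy.sparse import coo_matrix
from scipy.sparse.csgraph import connected_components, minimum_spanning_tree
from scipy.spatial.distance import cdist


def mutual_reachability(X, M=5):
    """Dense matrix of mutual reachability distances (HDBSCAN-style definition)."""
    D = cdist(X, X)  # pairwise Euclidean distances
    # Core distance: distance to the M-th nearest neighbour
    # (column 0 of each sorted row is the point itself).
    core = np.sort(D, axis=1)[:, M]
    R = np.maximum(D, np.maximum(core[:, None], core[None, :]))
    np.fill_diagonal(R, 0.0)
    return R


def component_labels(n, edges):
    """Connected-component label of each of n points given (i, j, w) edges."""
    rows = [i for i, _, _ in edges]
    cols = [j for _, j, _ in edges]
    graph = coo_matrix((np.ones(len(edges)), (rows, cols)), shape=(n, n))
    _, labels = connected_components(graph, directed=False)
    return labels


def chop_mst(X, n_clusters=2, min_size=10, M=5):
    """Simplified divisive cut of the mutual reachability MST (illustrative only)."""
    n = len(X)
    T = minimum_spanning_tree(mutual_reachability(X, M)).tocoo()
    edges = list(zip(T.row.tolist(), T.col.tolist(), T.data.tolist()))
    order = sorted(range(len(edges)), key=lambda k: -edges[k][2])  # heaviest first
    kept = set(range(len(edges)))
    cuts = 0
    for k in order:
        if cuts == n_clusters - 1:
            break
        trial = [edges[t] for t in kept if t != k]
        sizes = np.bincount(component_labels(n, trial))
        # Skip cuts that would only shave off a tiny fragment such as an outlier.
        if sizes.min() >= min_size:
            kept.discard(k)
            cuts += 1
    return component_labels(n, [edges[t] for t in kept])


# Toy data: two well-separated blobs plus a single far-away outlier.
rng = np.random.default_rng(0)
blob_a = rng.normal(loc=(0.0, 0.0), scale=0.3, size=(50, 2))
blob_b = rng.normal(loc=(10.0, 10.0), scale=0.3, size=(50, 2))
outlier = np.array([[5.0, -20.0]])
X = np.vstack([blob_a, blob_b, outlier])

labels = chop_mst(X, n_clusters=2, min_size=10, M=5)
```

In this toy run the heaviest tree edge is the one attaching the outlier, but cutting it would leave a one-point component, so the sketch skips it and cuts the bridge between the two blobs instead. The outlier stays attached to one of the blobs rather than forming its own cluster.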
