Selective Clustering Annotated using Modes of Projections
Evan Greene, Greg Finak, Raphael Gottardo

TL;DR
SCAMP is a novel non-parametric clustering algorithm that identifies clusters based on shape constraints along projections without pre-specifying the number of clusters, and includes annotation and uncertainty assessment features.
Contribution
SCAMP introduces a shape-constrained search approach for clustering that does not require pre-setting the number of clusters and provides annotation and uncertainty measures.
Findings
SCAMP identifies clusters without requiring their number to be specified in advance.
It annotates each cluster with a description of the cluster's characteristics.
An implementation is available in C++ with an R interface.
Abstract
Selective clustering annotated using modes of projections (SCAMP) is a new clustering algorithm for data in $\mathbb{R}^p$. SCAMP is motivated from the point of view of non-parametric mixture modeling. Rather than maximizing a classification likelihood to determine cluster assignments, SCAMP casts clustering as a search and selection problem. One consequence of this problem formulation is that the number of clusters is not a SCAMP tuning parameter. The search phase of SCAMP consists of finding sub-collections of the data matrix, called candidate clusters, that obey shape constraints along each coordinate projection. An extension of the dip test of Hartigan and Hartigan (1985) is developed to assist the search. Selection occurs by scoring each candidate cluster with a preference function that quantifies prior belief about the mixture composition. Clustering proceeds by…
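The search-and-select framing in the abstract can be illustrated with a toy sketch. This is not the SCAMP implementation: `gap_cut` is a crude gap-based heuristic standing in for the dip test extension, `preference` is a hypothetical scoring rule (candidate size), and every name, threshold, and data point below is an illustrative assumption.

```python
def gap_cut(values, ratio=3.0):
    """Crude multimodality check (a stand-in, NOT the dip test).

    Returns a cut point midway across the largest gap between sorted
    neighbours when that gap is at least `ratio` times the mean gap;
    returns None when the projection looks unimodal by this heuristic.
    """
    xs = sorted(values)
    gaps = [b - a for a, b in zip(xs, xs[1:])]
    if not gaps:
        return None
    mean_gap = sum(gaps) / len(gaps)
    widest = max(range(len(gaps)), key=gaps.__getitem__)
    if mean_gap == 0 or gaps[widest] < ratio * mean_gap:
        return None
    return (xs[widest] + xs[widest + 1]) / 2


def search(rows, idx, min_size=3):
    """Collect candidate clusters: index tuples whose projection onto
    every coordinate passes the unimodality heuristic."""
    found, was_split = set(), False
    for j in range(len(rows[0])):
        cut = gap_cut([rows[i][j] for i in idx])
        if cut is None:
            continue
        was_split = True
        left = tuple(i for i in idx if rows[i][j] <= cut)
        right = tuple(i for i in idx if rows[i][j] > cut)
        for part in (left, right):
            if len(part) >= min_size:
                found |= search(rows, part, min_size)
    if not was_split:
        found.add(tuple(idx))
    return found


def preference(candidate):
    """Hypothetical preference function: favour larger candidates."""
    return len(candidate)


def select(candidates):
    """Greedily keep the highest-scoring candidates that do not overlap."""
    chosen, used = [], set()
    for cand in sorted(candidates, key=lambda c: (-preference(c), c)):
        if used.isdisjoint(cand):
            chosen.append(cand)
            used.update(cand)
    return chosen


# Made-up data: three well-separated groups of five points in the plane.
rows = [
    (0.0, 0.0), (0.5, 0.2), (1.0, 0.4), (0.2, 1.0), (0.8, 0.6),
    (10.0, 0.1), (10.5, 0.5), (11.0, 0.3), (10.2, 0.9), (10.8, 0.7),
    (10.1, 10.0), (10.6, 10.4), (11.0, 10.2), (10.3, 10.9), (10.7, 10.6),
]
clusters = select(search(rows, tuple(range(len(rows)))))
print(len(clusters), "clusters found")
```

Note that the number of clusters is never passed in: it falls out of which sub-collections survive the shape check and the selection step. The greedy non-overlap rule in `select` is one simple choice made for this sketch, not a description of how SCAMP resolves competing candidates.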
Taxonomy
Topics: Bayesian Methods and Mixture Models · Advanced Clustering Algorithms Research · Data Management and Algorithms
