Re-embedding data to strengthen recovery guarantees of clustering
Tao Jiang, Samuel Tan, Stephen Vavasis

TL;DR
This paper introduces a clustering pipeline that chains leapfrog distances, multidimensional scaling (MDS), spectral methods, and sum-of-norms (SON) clustering, yielding stronger recovery guarantees than any of these components provides on its own.
Contribution
The paper presents a pipeline that re-embeds the data points before applying sum-of-norms (SON) clustering in order to strengthen its recovery guarantees. It does this by chaining four known techniques in a new way.
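For reference, sum-of-norms clustering (also called convex clustering) is commonly stated as the convex problem below, for input points $a_1,\dots,a_n$ and a tuning parameter $\lambda > 0$. Points whose optimal $x_i$ coincide are assigned to the same cluster. This is the standard unweighted formulation from the literature, shown as a hedged reminder; the paper may use a weighted or otherwise modified variant. In this pipeline, the $a_i$ would be the re-embedded points rather than the raw data.

```latex
\min_{x_1,\dots,x_n} \;
\frac{1}{2}\sum_{i=1}^{n} \lVert x_i - a_i \rVert_2^2
\;+\; \lambda \sum_{1 \le i < j \le n} \lVert x_i - x_j \rVert_2
```

The first term keeps each $x_i$ near its data point, and the second term, a sum of unsquared norms, encourages many differences $x_i - x_j$ to be exactly zero, which is what fuses points into clusters.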
Findings
Re-embedding improves the recovery guarantees of SON clustering.
The re-embedding step typically produces points in a space of much lower dimension than the original one.
Provable recovery guarantees are established for the full pipeline.
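The dimension reduction comes from turning a pairwise distance matrix into new Euclidean coordinates. As a hedged sketch of the multidimensional-scaling part of that step, here is textbook classical (Torgerson) MDS in NumPy. The function name `classical_mds` and its `dim` parameter are illustrative assumptions. How the paper combines MDS with spectral methods, and how it chooses the target dimension, is not reproduced here.

```python
import numpy as np


def classical_mds(D, dim):
    """Classical (Torgerson) MDS: embed n points in R^dim so that their
    Euclidean distances approximate the n x n distance matrix D.
    Illustrative sketch; not the paper's exact re-embedding procedure."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    # Centering matrix J = I - (1/n) * ones(n, n).
    J = np.eye(n) - np.ones((n, n)) / n
    # Double-centered Gram matrix B = -1/2 * J (D entrywise squared) J.
    B = -0.5 * J @ (D ** 2) @ J
    # B is symmetric, so eigh returns real eigenvalues (ascending order).
    eigvals, eigvecs = np.linalg.eigh(B)
    # Keep the indices of the `dim` largest eigenvalues.
    order = np.argsort(eigvals)[::-1][:dim]
    # Clip small negative eigenvalues (non-Euclidean D or round-off) to zero.
    top = np.clip(eigvals[order], 0.0, None)
    # Coordinates are eigenvectors scaled by the square roots of eigenvalues.
    return eigvecs[:, order] * np.sqrt(top)
```

When `D` comes from points that lie in a low-dimensional subspace of a high-dimensional space, a small `dim` already reproduces all pairwise distances, which is the sense in which re-embedding can shrink the dimension.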
Abstract
We propose a clustering method that involves chaining four known techniques into a pipeline yielding an algorithm with stronger recovery guarantees than any of the four components separately. Given $n$ points in $\mathbb{R}^d$, the first component of our pipeline, which we call leapfrog distances, is reminiscent of density-based clustering, yielding an $n \times n$ distance matrix. The leapfrog distances are then translated to new embeddings using multidimensional scaling and spectral methods, two other known techniques, yielding new embeddings of the $n$ points in $\mathbb{R}^{d'}$, where $d'$ satisfies $d' \ll d$ in general. Finally, sum-of-norms (SON) clustering is applied to the re-embedded points. Although the fourth step (SON clustering) can in principle be replaced by any other clustering method, our focus is on provable guarantees of recovery of underlying structure. Therefore, we…
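The summary does not define leapfrog distances. To illustrate the density-based flavor it describes, the sketch below computes a related classic construction, the minimax (bottleneck) path distance: two points count as close if they can be joined by a chain of short hops through the data, even when they are far apart in a straight line. This is an assumed stand-in, not the paper's definition, and the function name `minimax_path_distances` is hypothetical.

```python
from math import dist


def minimax_path_distances(points):
    """Hypothetical stand-in for leapfrog distances: for every pair (i, j),
    the smallest possible value of the longest single hop over all paths
    from point i to point j in the complete Euclidean graph."""
    n = len(points)
    # Start from the direct Euclidean distance between every pair of points.
    d = [[dist(points[i], points[j]) for j in range(n)] for i in range(n)]
    # Floyd-Warshall in the (min, max) algebra: a path through k costs the
    # larger of its two legs, and we keep whichever bottleneck is smaller.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via_k = max(d[i][k], d[k][j])
                if via_k < d[i][j]:
                    d[i][j] = via_k
    return d


# Two groups on a line: {0, 1, 2} and {10, 11, 12}.
pts = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0), (11.0, 0.0), (12.0, 0.0)]
D = minimax_path_distances(pts)
print(D[0][2], D[0][5])  # 1.0 within a group, 8.0 across the gap
```

Within each group every pair is at distance 1.0, while pairs in different groups are separated by the 8.0 gap, so the group structure becomes sharper than in the raw Euclidean distances. The triple loop costs $O(n^3)$; the same values can be read off a minimum spanning tree (the largest edge on the tree path between two points), which is faster for large $n$.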
Taxonomy
Topics: Sparse and Compressive Sensing Techniques · Topological and Geometric Data Analysis · Advanced Clustering Algorithms Research
