Distributed Clustering based on Distributional Kernel
Hang Zhang, Yang Xu, Lei Gong, Ye Zhu, Kai Ming Ting

TL;DR
This paper presents a novel distributed clustering framework using distributional kernels that guarantees equivalence to centralized results, reduces runtime, and handles arbitrary cluster shapes, outperforming existing methods.
Contribution
Introduces the first distributed clustering framework based on distributional kernels, ensuring equivalence to centralized clustering and improved efficiency.
Findings
KDC guarantees a clustering outcome equivalent to that of its centralized counterpart.
KDC reduces maximum runtime cost at distributed sites.
Kernel Bounded Cluster Cores outperforms existing algorithms.
Abstract
This paper introduces a new framework for clustering in a distributed network called Distributed Clustering based on Distributional Kernel (K), or KDC, which produces the final clusters based on the similarity with respect to the distributions of initial clusters, as measured by K. It is the only framework that satisfies all three of the following properties. First, KDC guarantees that the combined clustering outcome from all sites is equivalent to the clustering outcome of its centralized counterpart on the combined dataset from all sites. Second, the maximum runtime cost of any site in distributed mode is smaller than the runtime cost in centralized mode. Third, it is designed to discover clusters of arbitrary shapes, sizes and densities. To the best of our knowledge, this is the first distributed clustering framework that employs a distributional kernel. The distribution-based…
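The idea of forming final clusters by similarity to the distributions of initial clusters can be illustrated with a minimal sketch. This is not the paper's algorithm. It assumes a Gaussian kernel as a stand-in for K, treats each initial cluster as an empirical distribution, and scores a point by its mean kernel similarity to that cluster's members. The names gaussian_kernel, similarity_to_cluster, and assign, the gamma parameter, and the toy data are all hypothetical.

```python
import math

def gaussian_kernel(x, y, gamma=1.0):
    # Assumed stand-in for the kernel K; the paper's actual kernel may differ.
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

def similarity_to_cluster(x, cluster, gamma=1.0):
    # Mean kernel similarity between point x and the members of a cluster,
    # i.e. the empirical kernel mean embedding of the cluster evaluated at x.
    return sum(gaussian_kernel(x, y, gamma) for y in cluster) / len(cluster)

def assign(points, initial_clusters, gamma=1.0):
    # Give each point the index of the initial cluster whose distribution
    # it is most similar to.
    labels = []
    for x in points:
        sims = [similarity_to_cluster(x, c, gamma) for c in initial_clusters]
        labels.append(max(range(len(sims)), key=sims.__getitem__))
    return labels

# Toy data (hypothetical): two well-separated initial clusters.
initial_clusters = [
    [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3)],
    [(5.0, 5.0), (5.2, 4.9), (4.8, 5.1)],
]
points = [(0.1, 0.1), (4.9, 5.0), (0.3, 0.2), (5.1, 5.2)]

# Centralized assignment over all points.
labels = assign(points, initial_clusters)

# Assignment done separately at two hypothetical sites, then concatenated.
site_labels = assign(points[:2], initial_clusters) + assign(points[2:], initial_clusters)
```

In this sketch each point is scored independently against the same initial clusters, so labelling the points site by site and concatenating the results matches labelling them all at once. That holds only under the assumption that every site works from the same initial clusters; how KDC obtains and shares them is not covered here.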
Taxonomy
Topics: Advanced Clustering Algorithms Research
