A New Parallel Adaptive Clustering and its Application to Streaming Data
Benjamin McLaughlin, Sung Ha Kang

TL;DR
This paper introduces a parallel adaptive clustering algorithm that automatically determines the number of clusters and efficiently handles large and streaming datasets through parallel processing and adaptive refinement.
Contribution
The paper proposes a novel parallel adaptive clustering algorithm that automatically selects the number of clusters and improves efficiency for large and streaming data.
Findings
The parallel adaptive clustering (PAC) algorithm effectively determines the number of clusters.
It demonstrates high computational efficiency in experiments.
The method adapts to changing data over time.
Abstract
This paper presents a parallel adaptive clustering (PAC) algorithm to automatically classify data while simultaneously choosing a suitable number of classes. Clustering is an important tool for data analysis and understanding in a broad set of areas including data reduction, pattern analysis, and classification. However, the requirement to specify the number of clusters in advance and the computational burden associated with clustering large sets of data persist as challenges in clustering. We propose a new parallel adaptive clustering (PAC) algorithm that addresses these challenges by adaptively computing the number of clusters and leveraging the power of parallel computing. The algorithm clusters disjoint subsets of the data on parallel computation threads. We develop regularized set k-means to efficiently cluster the results from the parallel threads. A refinement step further…
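The two-stage structure described in the abstract (cluster disjoint subsets in parallel, then cluster the per-thread results) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration, not the authors' method: the helper names (`kmeans`, `merge_centroids`, `parallel_cluster`), the fixed per-subset `k_local`, and the `radius` parameter are hypothetical, and the greedy distance-based merge is a simple stand-in for the paper's regularized set k-means, whose details are not given in this summary. The refinement step is omitted.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def dist2(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means. Returns (centroid, member_count) pairs."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    groups = []
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist2(p, centroids[i]))
            groups[nearest].append(p)
        # Move each centroid to the mean of its group; keep it if the group is empty.
        centroids = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
    return [(c, len(g)) for c, g in zip(centroids, groups) if g]


def merge_centroids(weighted, radius):
    """Greedily merge weighted centroids that lie within `radius` of each other.

    ASSUMPTION: a stand-in for the paper's regularized set k-means, used only
    to show how the final number of clusters can emerge from the data.
    """
    merged = []  # each entry is [centroid, weight]
    for c, w in sorted(weighted, key=lambda cw: -cw[1]):
        for m in merged:
            if dist2(c, m[0]) <= radius ** 2:
                total = m[1] + w
                m[0] = tuple((a * m[1] + b * w) / total for a, b in zip(m[0], c))
                m[1] = total
                break
        else:
            merged.append([c, w])
    return [(c, w) for c, w in merged]


def parallel_cluster(points, n_workers=4, k_local=4, radius=1.0):
    """Cluster disjoint subsets on worker threads, then merge the results."""
    chunks = [points[i::n_workers] for i in range(n_workers)]  # disjoint subsets
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        local = list(pool.map(lambda chunk: kmeans(chunk, k_local), chunks))
    weighted = [cw for result in local for cw in result]
    return merge_centroids(weighted, radius)


# Two well-separated synthetic blobs; each subset is deliberately over-segmented
# (k_local=4) and the merge step decides how many clusters remain.
rng = random.Random(42)
points = [(rng.gauss(0, 0.3), rng.gauss(0, 0.3)) for _ in range(200)]
points += [(rng.gauss(10, 0.3), rng.gauss(10, 0.3)) for _ in range(200)]
for centroid, weight in parallel_cluster(points, radius=3.0):
    print(centroid, weight)
```

Threads are used here only to keep the sketch self-contained. In CPython, pure-Python work of this kind does not speed up under threads, so a practical version would likely use processes or a vectorized numerical library for the per-subset step.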
Taxonomy
Topics: Advanced Clustering Algorithms Research · Face and Expression Recognition · Bayesian Methods and Mixture Models
