Parallel D2-Clustering: Large-Scale Clustering of Discrete Distributions
Yu Zhang, James Z. Wang, Jia Li

TL;DR
This paper introduces a parallel version of the D2-clustering algorithm that significantly improves scalability and speed for large-scale clustering tasks involving discrete distributions, with minimal loss in accuracy.
Contribution
The paper proposes a hierarchical parallel D2-clustering algorithm that enhances scalability and efficiency for large datasets, applicable to various data types.
Findings
Significant speed-up achieved even with a single CPU.
Effective clustering on large-scale image, video, and protein data.
Minor accuracy loss compared to the original sequential algorithm.
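A back-of-the-envelope cost count suggests why a hierarchical split can help even on a single CPU. This is an illustration under assumed quantities, not a result taken from the paper. Suppose sequential clustering of N objects costs roughly c N^a with a > 1, the data are split into m chunks, and each chunk is summarized by k centroids that are then clustered once more. The symbols c, a, m, and k are hypothetical and introduced only for this sketch.

```latex
\begin{align*}
T_{\text{seq}}(N)  &= c\,N^{a}, \qquad a > 1,\\
T_{\text{hier}}(N) &= m \cdot c\left(\frac{N}{m}\right)^{a} + c\,(mk)^{a}
                    = \frac{c\,N^{a}}{m^{a-1}} + c\,(mk)^{a}.
\end{align*}
```

Under these assumptions, when mk is much smaller than N the second term is small, so total work drops by roughly a factor of m^{a-1} before any parallelism is applied. Running the m chunks on separate nodes would reduce wall-clock time further.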
Abstract
The discrete distribution clustering algorithm, namely D2-clustering, has demonstrated its usefulness in image classification and annotation where each object is represented by a bag of weighted vectors. The high computational complexity of the algorithm, however, limits its applications to large-scale problems. We present a parallel D2-clustering algorithm with substantially improved scalability. A hierarchical structure for parallel computing is devised to achieve a balance between the individual-node computation and the integration process of the algorithm. Additionally, it is shown that even with a single CPU, the hierarchical structure results in significant speed-up. Experiments on real-world large-scale image data, YouTube video data, and protein sequence data demonstrate the efficiency and wide applicability of the parallel D2-clustering algorithm. The loss in clustering accuracy…
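The hierarchical structure described in the abstract can be sketched as a two-level divide-and-conquer: cluster each chunk of data independently (the individual-node computation), then cluster the resulting local centroids (the integration step). The sketch below is a minimal illustration, not the authors' implementation. It swaps in plain k-means on fixed-length vectors for D2-clustering, which operates on bags of weighted vectors with a distribution-to-distribution distance. The function names and the parameters `chunk_size` and `local_k` are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Coordinate-wise mean of a non-empty list of tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means. ASSUMPTION: a stand-in for the per-node
    clustering step; the paper uses D2-clustering here, not k-means."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # raises ValueError if k > len(points)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centroids[i]))
            groups[nearest].append(p)
        # Keep the previous centroid when a group ends up empty.
        centroids = [mean(g) if g else centroids[i] for i, g in enumerate(groups)]
    return centroids


def hierarchical_cluster(points, k, chunk_size, local_k):
    """Two-level clustering: local clustering per chunk, then integration."""
    # Level 1: the chunks are independent, so each could run on its own node.
    chunks = [points[i:i + chunk_size] for i in range(0, len(points), chunk_size)]
    local_centroids = []
    for chunk in chunks:
        local_centroids.extend(kmeans(chunk, min(local_k, len(chunk))))
    if len(local_centroids) < k:
        raise ValueError("not enough local centroids to form k clusters")
    # Level 2: integrate by clustering the (much smaller) set of local centroids.
    return kmeans(local_centroids, k)


if __name__ == "__main__":
    rng = random.Random(42)
    data = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(200)]
    data += [(10 + rng.uniform(-1, 1), 10 + rng.uniform(-1, 1)) for _ in range(200)]
    rng.shuffle(data)
    for c in hierarchical_cluster(data, k=2, chunk_size=50, local_k=4):
        print(c)
```

In this sketch the level-1 loop is written sequentially for clarity. Because the chunks share no state, the loop could be handed to a process pool or to separate machines. Choosing `chunk_size` trades off per-node work against the size of the integration step.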
Taxonomy
Topics: Advanced Clustering Algorithms Research · Image Retrieval and Classification Techniques · Advanced Data Compression Techniques
