Hashing-Based Distributed Clustering for Massive High-Dimensional Data
Yifeng Xiao, Jiang Xue, Deyu Meng

TL;DR
This paper introduces Hashing-Based Distributed Clustering (HBDC), a distributed method that clusters massive high-dimensional data on compact hash codes rather than raw features, reducing storage, transmission, and computation costs.
Contribution
The paper proposes a new distributed clustering algorithm that applies learning-to-hash to high-dimensional data, addressing the tendency of existing distributed methods to focus on data size while ignoring data dimension.
Findings
HBDC is reported to outperform existing distributed clustering algorithms on synthetic and real datasets.
The method significantly reduces transmission costs in distributed clustering.
HBDC accelerates convergence through a novel sample-selection method.
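The transmission-cost finding can be made concrete with simple arithmetic. The sketch below compares the bytes a sub-site would send for raw float32 features against the bytes for packed binary codes. The sample count, dimension, and code length are hypothetical, and the random sign projection is only a stand-in for the paper's learned hashing network, which this summary does not specify.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sub-site data: 1000 samples, 512 float32 features each.
n_samples, dim, n_bits = 1000, 512, 64
X = rng.standard_normal((n_samples, dim)).astype(np.float32)

# Stand-in hash function (assumption): sign of a random linear projection.
# HBDC learns its hashing network instead of drawing it at random.
W = rng.standard_normal((dim, n_bits)).astype(np.float32)
bits = X @ W > 0                      # boolean array, shape (1000, 64)

# Pack 8 bits per byte before "transmission".
packed = np.packbits(bits, axis=1)    # uint8 array, shape (1000, 8)

raw_bytes = X.nbytes                  # 1000 * 512 * 4 = 2,048,000
code_bytes = packed.nbytes            # 1000 * 8       = 8,000
ratio = raw_bytes / code_bytes        # 256.0

print(f"raw: {raw_bytes} B, codes: {code_bytes} B, reduction: {ratio:.0f}x")
```

The ratio depends only on the feature dimension, the feature precision, and the code length, so longer codes trade transmission savings for finer similarity resolution.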
Abstract
Clustering analysis is of substantial significance for data mining. The properties of big data raise the demand for more efficient and economical distributed clustering methods. However, existing distributed clustering methods mainly focus on the size of the data and ignore problems caused by its dimension. To address this, we propose a new distributed algorithm, referred to as Hashing-Based Distributed Clustering (HBDC). Motivated by the outstanding performance of hashing methods in nearest-neighbor search, the algorithm applies the learning-to-hash technique to the clustering problem, which brings significant advantages for data storage, transmission, and computation. Following a global-sub-site paradigm, HBDC consists of distributed training of a hashing network and spectral clustering of the hash codes at the global site. The sub-sites use the learnable network…
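The global-sub-site flow described in the abstract can be illustrated with a minimal sketch, under several loudly stated assumptions: a shared random sign projection replaces the learned hashing network, the data are two synthetic blobs, and the global step is a two-way spectral partition (sign of the second eigenvector of the normalized affinity) rather than the full spectral clustering procedure of the paper. All names here (`hash_codes`, `spectral_bipartition`) are hypothetical.

```python
import numpy as np

def hash_codes(X, W):
    """Binary codes from the sign of a linear projection.
    Assumption: stands in for the learned hashing network."""
    return (X @ W > 0).astype(np.uint8)

def spectral_bipartition(codes):
    """Split codes into two groups using a Hamming-based affinity."""
    n_bits = codes.shape[1]
    c = codes.astype(np.int32)
    # Pairwise Hamming distances between all codes.
    hamming = (c[:, None, :] != c[None, :, :]).sum(axis=2)
    # Affinity = fraction of matching bits, in [0, 1].
    affinity = 1.0 - hamming / n_bits
    d_inv_sqrt = 1.0 / np.sqrt(affinity.sum(axis=1))
    # Symmetrically normalized affinity D^{-1/2} A D^{-1/2}.
    norm_aff = affinity * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order; column -2 is the
    # eigenvector of the second-largest eigenvalue.
    _, eigvecs = np.linalg.eigh(norm_aff)
    second = eigvecs[:, -2] * d_inv_sqrt
    return (second > 0).astype(int)

rng = np.random.default_rng(0)
dim, n_bits, n_per_cluster = 64, 32, 100

# Two well-separated synthetic blobs (hypothetical data).
center = 4.0 * rng.standard_normal(dim)
X = np.vstack([
    center + rng.standard_normal((n_per_cluster, dim)),
    -center + rng.standard_normal((n_per_cluster, dim)),
])
truth = np.repeat([0, 1], n_per_cluster)

# Shuffle, then split the samples across two sub-sites.
perm = rng.permutation(len(X))
X, truth = X[perm], truth[perm]
site_data = np.array_split(X, 2)

# Every sub-site applies the same hash function and sends only codes.
W = rng.standard_normal((dim, n_bits))
site_codes = [hash_codes(Xs, W) for Xs in site_data]

# The global site clusters the collected codes, never the raw features.
labels = spectral_bipartition(np.vstack(site_codes))

# Cluster labels are arbitrary, so score up to a label swap.
agreement = max(np.mean(labels == truth), np.mean(labels != truth))
print(f"agreement with ground truth: {agreement:.2f}")
```

The point of the sketch is the division of labor: raw features stay at the sub-sites, and the global site works only with short binary codes whose Hamming similarity drives the clustering.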
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Caching and Content Delivery · Video Surveillance and Tracking Methods
