Distributed Spatial Data Clustering as a New Approach for Big Data Analysis
Malika Bendechache, Nhien-An Le-Khac, M-Tahar Kechadi

TL;DR
This paper introduces a distributed clustering approach for big spatial datasets that generates global clusters efficiently without prior knowledge of the number of clusters, relying on local clustering and minimal data exchange between nodes.
Contribution
It presents a novel two-phase distributed clustering method that scales well and works efficiently on spatial big data using models like MapReduce.
Findings
Achieves superlinear speedup in experiments
Scales effectively with increasing data size
Operates efficiently with minimal inter-node communication
Abstract
In this paper we propose a new approach for Big Data mining and analysis. This approach works well on distributed datasets and addresses the data clustering task of the analysis. It consists of two main phases. The first phase executes a clustering algorithm on local data, assuming that the datasets were already distributed among the system's processing nodes. The second phase aggregates the local clusters to generate global clusters. This approach not only generates local clusters on each processing node in parallel, but also facilitates the formation of global clusters without prior knowledge of the number of clusters, which many partitioning clustering algorithms require. In this study, the approach was applied to spatial datasets. The proposed aggregation phase is very efficient and does not involve the exchange of large amounts of data between the processing…
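The two phases described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the local algorithm (a simple distance-threshold grouping), the cluster summary exchanged between nodes (an axis-aligned bounding box), the overlap-based merge rule, and all names (`local_clusters`, `bounding_box`, `boxes_overlap`, `aggregate`, `eps`) are assumptions chosen only to show the shape of the idea, namely that each node clusters its own data and only a small summary per local cluster is sent for aggregation.

```python
from itertools import combinations


def local_clusters(points, eps):
    """Phase 1 (assumed stand-in): group the points held by one node.

    Two points share a cluster when a chain of neighbours, each step
    no longer than eps, links them. The number of clusters is not
    given in advance.
    """
    unvisited = set(range(len(points)))
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        frontier, members = [seed], [seed]
        while frontier:
            i = frontier.pop()
            near = [
                j for j in unvisited
                if (points[i][0] - points[j][0]) ** 2
                + (points[i][1] - points[j][1]) ** 2 <= eps * eps
            ]
            for j in near:
                unvisited.remove(j)
            frontier.extend(near)
            members.extend(near)
        clusters.append([points[k] for k in members])
    return clusters


def bounding_box(cluster):
    """Compact summary of a local cluster (assumption): (xmin, ymin, xmax, ymax)."""
    xs = [p[0] for p in cluster]
    ys = [p[1] for p in cluster]
    return (min(xs), min(ys), max(xs), max(ys))


def boxes_overlap(a, b):
    """True when two axis-aligned boxes intersect or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def aggregate(boxes):
    """Phase 2 (assumed merge rule): union local clusters whose boxes overlap.

    Returns groups of indices into `boxes`; each group is one global cluster.
    """
    parent = list(range(len(boxes)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(boxes)), 2):
        if boxes_overlap(boxes[i], boxes[j]):
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(boxes)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical data already distributed over two processing nodes.
node_a = [(0, 0), (1, 0), (2, 0), (10, 10), (11, 10)]
node_b = [(1.5, 0), (3, 0), (30, 30)]

# Each node clusters locally and sends only one box per local cluster.
boxes = [
    bounding_box(c)
    for node in (node_a, node_b)
    for c in local_clusters(node, eps=2.0)
]
global_clusters = aggregate(boxes)
print(len(boxes), "local clusters ->", len(global_clusters), "global clusters")
```

In this toy run the cluster along the x-axis is split across the two nodes, and the aggregation step rejoins it from the box summaries alone, without moving the raw points. A bounding box is a coarse summary that can merge clusters that merely sit near each other, so a real system would likely exchange a tighter representation.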
Taxonomy
Topics: Data Mining Algorithms and Applications · Advanced Clustering Algorithms Research · Data Management and Algorithms
