Distributed Spatial Data Clustering as a New Approach for Big Data Analysis

Malika Bendechache; Nhien-An Le-Khac; M-Tahar Kechadi

arXiv:1710.09593·cs.DC·March 5, 2018

Open Access

TL;DR

This paper introduces a distributed clustering approach for big spatial datasets that efficiently generates global clusters without prior knowledge of cluster count, leveraging local clustering and minimal data exchange.

Contribution

It presents a novel two-phase distributed clustering method that scales well and works efficiently on spatial big data using models like MapReduce.

Findings

01

Achieves superlinear speedup in experiments

02

Scales effectively with increasing data size

03

Operates efficiently with minimal inter-node communication

Abstract

In this paper we propose a new approach for Big Data mining and analysis. The approach works well on distributed datasets and addresses the data clustering task of the analysis. It consists of two main phases. The first phase executes a clustering algorithm on local data, assuming that the datasets were already distributed among the system's processing nodes. The second phase aggregates the local clusters to generate global clusters. This approach not only generates local clusters on each processing node in parallel, but also facilitates the formation of global clusters without prior knowledge of the number of clusters, which many partitioning clustering algorithms require. In this study, the approach was applied to spatial datasets. The proposed aggregation phase is very efficient and does not involve the exchange of large amounts of data between the processing…
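The two phases described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract names neither the local clustering algorithm nor the form in which local clusters are exchanged. As stated assumptions, the sketch uses a simple distance-threshold (single-linkage) grouping as the local step and an axis-aligned bounding box as the compact summary each node sends. The function names and the `eps` parameter are hypothetical.

```python
from collections import defaultdict
from math import dist


def local_cluster(points, eps):
    """Phase 1 (per node): group points linked by chains of gaps <= eps.

    Assumption: single-linkage grouping stands in for whatever local
    algorithm a node actually runs. No cluster count is supplied.
    """
    parent = list(range(len(points)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if dist(points[i], points[j]) <= eps:
                parent[find(i)] = find(j)

    groups = defaultdict(list)
    for i, p in enumerate(points):
        groups[find(i)].append(p)
    return list(groups.values())


def summarize(cluster):
    """Compact representative of a local cluster: its bounding box.

    Assumption: a bounding box is used only to keep the exchanged data small.
    """
    xs = [p[0] for p in cluster]
    ys = [p[1] for p in cluster]
    return (min(xs), min(ys), max(xs), max(ys))


def boxes_close(a, b, eps):
    """True if the gap between two boxes is at most eps (0 if they overlap)."""
    dx = max(a[0] - b[2], b[0] - a[2], 0.0)
    dy = max(a[1] - b[3], b[1] - a[3], 0.0)
    return (dx * dx + dy * dy) ** 0.5 <= eps


def merge_box(a, b):
    """Smallest box covering both inputs."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def aggregate(summaries, eps):
    """Phase 2: merge local summaries into global clusters.

    Repeats until no two boxes are within eps, so the number of global
    clusters emerges from the data rather than being fixed in advance.
    """
    boxes = list(summaries)
    changed = True
    while changed:
        changed = False
        for i in range(len(boxes)):
            for j in range(i + 1, len(boxes)):
                if boxes_close(boxes[i], boxes[j], eps):
                    boxes[i] = merge_box(boxes[i], boxes[j])
                    del boxes[j]
                    changed = True
                    break
            if changed:
                break
    return boxes


if __name__ == "__main__":
    # Two hypothetical nodes, each holding its own partition of the data.
    node_a = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0), (10.0, 10.0), (10.5, 10.0)]
    node_b = [(1.5, 0.0), (2.0, 0.0), (20.0, 20.0)]
    summaries = [summarize(c)
                 for node in (node_a, node_b)
                 for c in local_cluster(node, eps=0.6)]
    print(aggregate(summaries, eps=0.6))
```

In this sketch only the four-number summaries cross node boundaries, never the raw points, which mirrors the abstract's claim that aggregation avoids exchanging large amounts of data. A bounding box is a coarse stand-in; a tighter shape descriptor would reduce false merges for non-convex clusters.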

Taxonomy

Topics: Data Mining Algorithms and Applications · Advanced Clustering Algorithms Research · Data Management and Algorithms