Distributed Clustering Algorithm for Spatial Data Mining

Malika Bendechache; M-Tahar Kechadi

arXiv:1802.00304 · cs.DB · February 2, 2018


TL;DR

This paper introduces a novel distributed clustering method for large, heterogeneous spatial datasets that dynamically determines the number of clusters and improves efficiency over existing algorithms.

Contribution

It proposes a new distributed clustering approach based on K-means with dynamic cluster number determination and an efficient aggregation phase.

Findings

01. Produces high-quality clustering results

02. Scales efficiently with large datasets

03. Outperforms two popular clustering algorithms in efficiency

Abstract

Distributed data mining techniques, and distributed clustering in particular, have been widely used over the last decade because they can handle very large and heterogeneous datasets that cannot be gathered centrally. Current distributed clustering approaches typically generate global models by aggregating local results obtained at each site. While this approach mines the datasets where they are located, the aggregation phase is complex and may produce incorrect or ambiguous global clusters, and therefore incorrect knowledge. In this paper we propose a new clustering approach for very large spatial datasets that are heterogeneous and distributed. The approach is based on the K-means algorithm, but it generates the number of global clusters dynamically. Moreover, it uses an elaborate aggregation phase. The aggregation phase is designed in such a way that the overall process is…
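To make the local-then-global pattern in the abstract concrete, here is a minimal Python sketch. It is not the authors' algorithm: the abstract does not spell out how the aggregation phase works, so the sketch substitutes a simple stand-in (merging local K-means centroids that lie within a distance threshold). The names `kmeans`, `aggregate`, and `merge_radius`, as well as the toy data, are assumptions made for illustration only. What it shows is how the number of global clusters can emerge from the merge step rather than being fixed in advance.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def mean(points: Sequence[Point]) -> Point:
    """Arithmetic mean of a non-empty list of 2-D points."""
    return (sum(x for x, _ in points) / len(points),
            sum(y for _, y in points) / len(points))


def kmeans(points: Sequence[Point], k: int, iters: int = 20) -> List[Point]:
    """Plain Lloyd's K-means, run locally on one site (hypothetical helper).

    Uses deterministic farthest-point initialisation so the example is
    reproducible without a random seed.
    """
    centroids = [points[0]]
    while len(centroids) < k:
        # Pick the point farthest from its nearest existing centroid.
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    for _ in range(iters):
        groups: List[List[Point]] = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            groups[nearest].append(p)
        # Keep the old centroid if a group ends up empty.
        centroids = [mean(g) if g else c for g, c in zip(groups, centroids)]
    return centroids


def aggregate(local_centroids: Sequence[Point], merge_radius: float) -> List[Point]:
    """Illustrative aggregation (an assumption, not the paper's method).

    Local centroids within merge_radius of each other are linked with
    union-find; each linked group becomes one global cluster centre.
    """
    parent = list(range(len(local_centroids)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(len(local_centroids)):
        for j in range(i + 1, len(local_centroids)):
            if math.dist(local_centroids[i], local_centroids[j]) <= merge_radius:
                parent[find(i)] = find(j)

    groups = {}
    for i, c in enumerate(local_centroids):
        groups.setdefault(find(i), []).append(c)
    return [mean(g) for g in groups.values()]


# Toy data: two sites that share one cluster near the origin.
site_a = [(0.0, 0.0), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9)]
site_b = [(0.1, 0.2), (0.0, 0.1), (10.0, 0.0), (10.2, 0.1)]

local = kmeans(site_a, 2) + kmeans(site_b, 2)   # 4 local centroids
global_centroids = aggregate(local, merge_radius=1.0)
print(len(global_centroids), "global clusters")
```

In this toy run each site clusters its own points with k = 2, and only the four local centroids are exchanged. The two centroids near the origin fall within `merge_radius` and are merged, so the global cluster count is decided by the data rather than supplied up front. A threshold on centroid distance is only one possible merge rule and was chosen here for brevity.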
