A Distributed and Approximated Nearest Neighbors Algorithm for an Efficient Large Scale Mean Shift Clustering
Gaël Beck, Tarn Duong, Mustapha Lebbah, Hanane Azzag, Christophe Cérin

TL;DR
This paper introduces a scalable, distributed approximation of the Mean Shift clustering algorithm using Locality Sensitive Hashing, enabling efficient large-scale modal clustering with improved computational performance.
Contribution
The paper presents a novel distributed and approximate Mean Shift clustering method with linear time complexity, leveraging LSH for scalable density gradient estimation and cluster labeling.
Findings
Achieves linear time complexity for large datasets
Demonstrates improved clustering accuracy with approximations
Provides a distributed implementation in Spark/Scala
Abstract
In this paper we target the class of modal clustering methods, where clusters are defined in terms of the local modes of the probability density function which generates the data. The most well-known modal clustering method is k-means clustering. Mean Shift clustering is a generalization of k-means clustering which computes arbitrarily shaped clusters, defined as the basins of attraction to the local modes created by the density gradient ascent paths. Despite its potential, the Mean Shift approach is a computationally expensive method for unsupervised learning. Thus, we introduce two contributions aiming to provide clustering algorithms with linear time complexity, as opposed to the quadratic time complexity of exact Mean Shift clustering. First, we propose a scalable procedure to approximate the density gradient ascent. Second, our proposed scalable cluster labeling…
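The idea of approximating the gradient ascent with hashed nearest neighbors can be illustrated with a minimal single-machine sketch. It is not the authors' Spark/Scala implementation, and several details are assumptions made for illustration: the hash is a single scalar random projection cut into equal-width buckets, neighbors are searched only inside a point's own bucket, each point moves to the plain mean of its k nearest neighbors, and the names `lsh_buckets` and `knn_mean_shift_step` are hypothetical.

```python
import random
from collections import defaultdict


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def lsh_buckets(points, n_buckets, seed=0):
    """Group point indices by a scalar random-projection hash (assumed scheme)."""
    rng = random.Random(seed)
    dim = len(points[0])
    direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    proj = [sum(d * x for d, x in zip(direction, p)) for p in points]
    lo, hi = min(proj), max(proj)
    # Fall back to width 1.0 when all projections coincide.
    width = (hi - lo) / n_buckets or 1.0
    buckets = defaultdict(list)
    for i, v in enumerate(proj):
        # Clamp so the largest projection lands in the last bucket.
        b = min(int((v - lo) / width), n_buckets - 1)
        buckets[b].append(i)
    return buckets


def knn_mean_shift_step(points, k, n_buckets, seed=0):
    """One approximate ascent step: move each point to the mean of its
    k nearest neighbors, searched only within the point's own bucket."""
    buckets = lsh_buckets(points, n_buckets, seed)
    shifted = list(points)
    for members in buckets.values():
        for i in members:
            p = points[i]
            nearest = sorted(members, key=lambda j: sq_dist(p, points[j]))[:k]
            shifted[i] = tuple(
                sum(points[j][d] for j in nearest) / len(nearest)
                for d in range(len(p))
            )
    return shifted


if __name__ == "__main__":
    rng = random.Random(1)
    # Two well-separated Gaussian blobs in 2D (synthetic illustration data).
    data = [(rng.gauss(0, 0.3), rng.gauss(0, 0.3)) for _ in range(50)]
    data += [(rng.gauss(5, 0.3), rng.gauss(5, 0.3)) for _ in range(50)]
    for _ in range(5):
        data = knn_mean_shift_step(data, k=10, n_buckets=4)
    print(data[0], data[-1])
```

Because each point is compared only with the members of its own bucket, the per-step cost depends on bucket sizes rather than on all pairs of points. With roughly bounded bucket sizes the work grows about linearly in the number of points, which is the kind of saving over the quadratic exact procedure that the abstract describes. In a distributed setting, buckets are also a natural unit to process independently.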
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Video Surveillance and Tracking Methods · Face and Expression Recognition
