Distributed Lance-William Clustering Algorithm

Gavriel Yarmish; Philip Listowsky; Simon Dexter

arXiv:1709.06816·cs.DC·September 21, 2017

TL;DR

This paper introduces a parallel, scalable clustering algorithm based on the Lance-William method that efficiently groups objects using a distributed n by n distance matrix, suitable for large datasets.

Contribution

The paper presents a novel parallel and distributed implementation of the Lance-William clustering algorithm for large-scale data clustering tasks.

Findings

01. Algorithm is scalable in processing speed

02. Algorithm efficiently handles large n by n matrices

03. Distributed approach improves storage and computation efficiency

Abstract

One important tool is the optimal clustering of data into useful categories. Dividing similar objects into a smaller number of clusters is important in many applications, including search engines, monitoring of academic performance, biology, and wireless networks. We first discuss a number of clustering methods. We then present a parallel algorithm for the efficient clustering of objects into groups based on their similarity to each other. The input consists of an n by n distance matrix that holds a distance ranking for each pair of objects. The smaller the number, the more similar the two objects are to each other. We utilize parallel processors to calculate a hierarchical clustering of these n items based on this matrix. Another advantage of our method is the distribution of the large n by n matrix across processors. We have implemented our algorithm and have found it to be scalable both in terms…
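To make the abstract's setup concrete, here is a minimal serial sketch of hierarchical clustering driven by the standard Lance-Williams distance update on an n by n distance matrix. It is not the authors' parallel implementation, which this listing does not describe. The function name `lance_williams_cluster`, its parameters, and the sample matrix are hypothetical, and the default coefficients are the textbook single-linkage values.

```python
import math


def lance_williams_cluster(dist, alpha_i=0.5, alpha_j=0.5, beta=0.0, gamma=-0.5):
    """Agglomerative clustering using the Lance-Williams update.

    dist is a symmetric n by n list of lists; smaller means more similar.
    The default coefficients give single linkage (an assumption made for
    this sketch, not a choice confirmed by the paper).
    Returns the merges in order as (members_a, members_b, distance).
    """
    n = len(dist)
    d = [row[:] for row in dist]          # work on a copy of the matrix
    active = list(range(n))               # indices of clusters still alive
    members = {i: [i] for i in range(n)}  # original items in each cluster
    merges = []

    while len(active) > 1:
        # Find the closest pair of active clusters.
        best, bi, bj = math.inf, -1, -1
        for pos, i in enumerate(active):
            for j in active[pos + 1:]:
                if d[i][j] < best:
                    best, bi, bj = d[i][j], i, j

        merges.append((sorted(members[bi]), sorted(members[bj]), best))

        # Lance-Williams update: distance from every other cluster k
        # to the merged cluster, which is stored in slot bi.
        for k in active:
            if k != bi and k != bj:
                new = (alpha_i * d[k][bi] + alpha_j * d[k][bj]
                       + beta * best
                       + gamma * abs(d[k][bi] - d[k][bj]))
                d[k][bi] = d[bi][k] = new

        members[bi] = members[bi] + members.pop(bj)
        active.remove(bj)

    return merges


# Hypothetical 4-item distance matrix for illustration.
sample = [
    [0, 1, 4, 5],
    [1, 0, 3, 6],
    [4, 3, 0, 2],
    [5, 6, 2, 0],
]
for a, b, dist_ab in lance_williams_cluster(sample):
    print(a, b, dist_ab)
```

Each pass scans the remaining pairs for the minimum distance and then rewrites one row and column of the matrix. In a distributed setting, both steps could plausibly be split across processors that each hold part of the matrix, but how the paper partitions the work is not stated in this abstract.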

Taxonomy

Topics: Advanced Clustering Algorithms Research · Data Management and Algorithms · Algorithms and Data Compression