# Fast communication-efficient spectral clustering over distributed data

**Authors:** Donghui Yan, Yingjie Wang, Jin Wang, Guodong Wu, Honggang Wang

arXiv: 1905.01596 · 2019-05-07

## TL;DR

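This paper introduces a communication-efficient spectral clustering framework for data held at multiple distributed sites. It reports almost no loss in accuracy relative to the non-distributed setting, about 2x speedup with two sites (largest when data are evenly distributed across sites), and a privacy benefit because the transmitted data need not be in their original form.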

## Contribution

It presents a novel framework for spectral clustering over data located at multiple sites that keeps communication low, incurs negligible accuracy loss, and eases privacy concerns. Existing distributed algorithms, by contrast, typically assume all the data are already in one place.

## Key findings

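- Achieves about 2x speedup in experiments with two distributed sites; the speedup is largest when data are evenly distributed across sites.
- Shows almost no loss in accuracy compared to the non-distributed setting on synthetic and large UC Irvine datasets.
- Addresses privacy concerns because the transmitted data need not be in their original form.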
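
The link between even data distribution and speedup can be illustrated with a toy cost model. This is only a sketch and is not taken from the paper: it assumes local work runs fully in parallel, that wall-clock time is set by the slowest site, that local cost grows linearly with the number of local points, and that communication and the central step are free. The function name `parallel_speedup` and the `cost` parameter are hypothetical.

```python
def parallel_speedup(site_sizes, cost=lambda n: float(n)):
    """Toy model: speedup of parallel local processing over a single machine.

    Assumptions (not from the paper): the slowest site sets the wall-clock
    time, and communication plus the central step cost nothing.
    """
    serial = cost(sum(site_sizes))               # one machine handles every point
    parallel = max(cost(n) for n in site_sizes)  # sites work at the same time
    return serial / parallel


even = parallel_speedup([500, 500])    # two sites, evenly split
skewed = parallel_speedup([900, 100])  # two sites, uneven split
print(even, skewed)
```

Under this model an even split across two sites gives the largest possible ratio, and any imbalance lowers it because the larger site becomes the bottleneck. Passing a different `cost` function (for example a superlinear one) changes the numbers but not the ordering between the even and skewed splits.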

## Abstract

The last decades have seen a surge of interest in distributed computing thanks to advances in clustered computing and big data technology. Existing distributed algorithms typically assume _all the data are already in one place_, and divide the data and conquer on multiple machines. However, it is increasingly common that the data are located at a number of distributed sites, and one wishes to compute over all the data with low communication overhead. For spectral clustering, we propose a novel framework that enables its computation over such distributed data, with "minimal" communication and a major speedup in computation. The loss in accuracy is negligible compared to the non-distributed setting. Our approach allows local parallel computing where the data are located, thus turning the distributed nature of the data into a blessing; the speedup is most substantial when the data are evenly distributed across sites. Experiments on synthetic and large UC Irvine datasets show almost no loss in accuracy with our approach and about 2x speedup under various settings with two distributed sites. As the transmitted data need not be in their original form, our framework readily addresses the privacy concern for data sharing in distributed computing.
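
The workflow the abstract describes (compute locally, transmit something other than the raw data, cluster centrally) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the summary does not say how sites transform their data, so the sketch assumes each site compresses its points into k-means centroids. It also restricts itself to two clusters, split by the sign of the Fiedler vector of a Gaussian-affinity graph. All function names and parameters are hypothetical, and only NumPy is used.

```python
import numpy as np


def local_codebook(X, k, iters=25):
    """Plain Lloyd's k-means at one site (assumed local compression step)."""
    # Deterministic init: k points spread evenly through the local array.
    centers = X[np.linspace(0, len(X) - 1, k).astype(int)].copy()
    for _ in range(iters):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = dists.argmin(axis=1)
        for j in range(k):
            members = X[assign == j]
            if len(members):  # leave an empty cluster's center where it is
                centers[j] = members.mean(axis=0)
    # Final assignment against the final centers.
    dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return centers, dists.argmin(axis=1)


def two_way_spectral(points, sigma=1.0):
    """Two-cluster spectral cut: sign of the Fiedler vector (a simplification)."""
    sq = ((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=2)
    W = np.exp(-sq / (2.0 * sigma ** 2))  # Gaussian affinity
    np.fill_diagonal(W, 0.0)
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L_sym = np.eye(len(points)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L_sym)       # eigenvalues in ascending order
    fiedler = d_inv_sqrt * vecs[:, 1]     # second-smallest eigenvector, rescaled
    return (fiedler > 0).astype(int)


def distributed_spectral(sites, k_per_site=8, sigma=1.0):
    """Only centroids leave each site; only centroid labels come back."""
    codebooks, assigns = zip(*(local_codebook(X, k_per_site) for X in sites))
    center_labels = two_way_spectral(np.vstack(codebooks), sigma)
    labels, start = [], 0
    for assign in assigns:
        site_center_labels = center_labels[start:start + k_per_site]
        labels.append(site_center_labels[assign])  # point inherits centroid label
        start += k_per_site
    return labels


# Synthetic usage: two sites, each holding 150 points from each of two blobs.
rng = np.random.default_rng(0)


def make_site(n):
    return np.vstack([rng.normal(0.0, 0.3, (n, 2)), rng.normal(4.0, 0.3, (n, 2))])


sites = [make_site(150), make_site(150)]
labels = distributed_spectral(sites)
```

In this sketch each site transmits `k_per_site` centroids instead of its raw points, which is one way the transmitted data can differ from the original form. The central eigendecomposition then runs on the pooled centroids rather than on every data point.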

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1905.01596/full.md

## Figures

9 figures with captions in the complete paper: https://tomesphere.com/paper/1905.01596/full.md

## References

63 references — full list in the complete paper: https://tomesphere.com/paper/1905.01596/full.md

---
Source: https://tomesphere.com/paper/1905.01596