Communication-efficient k-Means for Edge-based Machine Learning
Hanlin Lu, Ting He, Shiqiang Wang, Changchang Liu, Mehrdad Mahdavi, Vijaykrishnan Narayanan, Kevin S. Chan, Stephen Pasteris

TL;DR
This paper introduces a communication-efficient method for computing k-means centers in edge-based machine learning by using data summaries that combine dimensionality reduction, cardinality reduction, and quantization, achieving near-optimal accuracy with low complexity and communication.
Contribution
It proposes a novel combination of dimensionality reduction (DR), cardinality reduction (CR), and quantization (QT) for approximate k-means computation that reduces the communication and computation costs borne by data sources in edge environments.
Findings
Near-linear complexity for k-means approximation
Constant or logarithmic communication cost
Effective combination of DR/CR/QT while retaining near-optimal accuracy
Abstract
We consider the problem of computing the k-means centers for a large high-dimensional dataset in the context of edge-based machine learning, where data sources offload machine learning computation to nearby edge servers. k-Means computation is fundamental to many data analytics, and the capability of computing provably accurate k-means centers by leveraging the computation power of the edge servers, at a low communication and computation cost to the data sources, will greatly improve the performance of these analytics. We propose to let the data sources send small summaries, generated by joint dimensionality reduction (DR), cardinality reduction (CR), and quantization (QT), to support approximate k-means computation at reduced complexity and communication cost. By analyzing the complexity, the communication cost, and the approximation error of k-means algorithms based on carefully…
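The summary pipeline described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' construction: DR is a Gaussian random projection, CR is plain uniform sampling with weights n/m (a stand-in for a proper coreset), QT is uniform scalar rounding, and the names project, sample, quantize, dequantize, and weighted_kmeans are hypothetical. The centers it returns live in the reduced space; mapping them back to the original space is left out.

```python
import numpy as np

def project(X, d_out, rng):
    # DR (assumed): Gaussian random projection, scaled so squared norms
    # are preserved in expectation.
    P = rng.normal(size=(X.shape[1], d_out)) / np.sqrt(d_out)
    return X @ P

def sample(Y, m, rng):
    # CR (assumed): uniform sampling without replacement; each kept point
    # stands in for n/m original points.
    n = Y.shape[0]
    idx = rng.choice(n, size=m, replace=False)
    return Y[idx], np.full(m, n / m)

def quantize(S, n_bits):
    # QT (assumed): uniform scalar quantizer with 2**n_bits levels (n_bits <= 16).
    lo, hi = float(S.min()), float(S.max())
    levels = 2 ** n_bits - 1
    step = (hi - lo) / levels if hi > lo else 1.0
    codes = np.clip(np.round((S - lo) / step), 0, levels).astype(np.uint16)
    return codes, lo, step

def dequantize(codes, lo, step):
    # Edge-server side: recover approximate coordinates from integer codes.
    return lo + codes.astype(np.float64) * step

def weighted_kmeans(S, w, k, iters, rng):
    # Weighted Lloyd iterations on the received summary.
    centers = S[rng.choice(len(S), size=k, replace=False)].copy()
    for _ in range(iters):
        d2 = ((S[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            mask = labels == j
            if mask.any():
                centers[j] = np.average(S[mask], axis=0, weights=w[mask])
    return centers

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(5000, 200))           # data held at the source
    S, w = sample(project(X, 20, rng), 500, rng)
    codes, lo, step = quantize(S, 8)           # what is actually transmitted
    centers = weighted_kmeans(dequantize(codes, lo, step), w, 3, 10, rng)
    raw_bits = X.size * 64
    sent_bits = codes.size * 8
    print(f"raw: {raw_bits} bits, summary: {sent_bits} bits")
```

In this sketch the data source sends only the integer codes plus the scalars lo and step and the weights, so the payload scales with m times d_out times n_bits rather than with the size of the raw dataset. The order DR, then CR, then QT is one possible choice and is assumed here.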
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Privacy-Preserving Technologies in Data · Error Correcting Code Techniques
