Distributed Algorithms for Euclidean Clustering
Vincent Cohen-Addad, Liudeng Wang, David P. Woodruff, Samson Zhou

TL;DR
This paper develops improved distributed algorithms for constructing small, accurate coresets for Euclidean clustering problems, reducing communication costs and achieving near-optimal bounds in both the coordinator and blackboard models.
Contribution
It introduces new protocols for distributed Euclidean clustering that significantly lower communication complexity while providing $(1+\varepsilon)$-approximate coresets, improving upon prior methods.
Findings
Achieved $(1+\varepsilon)$-coresets with reduced communication in the coordinator model.
Extended techniques to the blackboard model with further communication reduction.
Matched lower bounds up to polylogarithmic factors for distributed coreset construction.
Abstract
We study the problem of constructing $(1+\varepsilon)$-coresets for Euclidean $(k,z)$-clustering in the distributed setting, where data points are partitioned across sites. We focus on two prominent communication models: the coordinator model and the blackboard model. In the coordinator model, we design a protocol that achieves a $(1+\varepsilon)$-strong coreset with total communication complexity bits, improving upon prior work (Chen et al., NeurIPS 2016) by eliminating the need to communicate explicit point coordinates in-the-clear across all servers. In the blackboard model, we further reduce the communication complexity to bits, achieving better bounds than previous…
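To make the guarantee in the abstract concrete, the following is a minimal sketch of the standard $(k,z)$-clustering cost and of the $(1\pm\varepsilon)$ check that a weighted coreset must pass. It is not the paper's protocol: the function names `clustering_cost` and `is_coreset_for` and the toy data are assumptions chosen for illustration. A strong coreset must pass this check for every set of $k$ centers, whereas the helper tests only one candidate set at a time.

```python
import numpy as np


def clustering_cost(points, centers, z=2, weights=None):
    """(k, z)-clustering cost: sum of weight * (distance to nearest center)^z."""
    points = np.asarray(points, dtype=float)
    centers = np.asarray(centers, dtype=float)
    # Pairwise Euclidean distances, shape (n_points, n_centers).
    dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    nearest = dists.min(axis=1) ** z
    if weights is None:
        return float(nearest.sum())
    return float(np.dot(np.asarray(weights, dtype=float), nearest))


def is_coreset_for(points, sample, sample_weights, centers, eps, z=2):
    """Check the (1 +/- eps) cost guarantee for ONE candidate center set."""
    full = clustering_cost(points, centers, z)
    approx = clustering_cost(sample, centers, z, sample_weights)
    return (1 - eps) * full <= approx <= (1 + eps) * full


# Toy data (hypothetical): four points on a line, summarized by two weighted points.
points = [[0, 0], [2, 0], [10, 0], [12, 0]]
sample = [[0, 0], [10, 0]]
sample_weights = [2, 2]

good_centers = [[1, 0], [11, 0]]
bad_centers = [[0, 0], [10, 0]]

print(is_coreset_for(points, sample, sample_weights, good_centers, eps=0.1))
print(is_coreset_for(points, sample, sample_weights, bad_centers, eps=0.1))
```

The toy summary passes for the first center set and fails for the second, which illustrates why the strong-coreset requirement quantifies over all center sets rather than a single solution.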
Peer Reviews
Decision: ICLR 2026 Poster
1) The problem is an interesting one and will be of interest to the community. 2) As far as I could check, the claims in the paper appear sound. The paper is well written overall. 3) The paper improves on existing communication complexity bounds for both the coordinator and blackboard models. 4) The way the usual tools and tricks of the coreset literature, like the JL transform, bicriteria approximation, sensitivity sampling, etc., have been modified to fit the requirements of the distributed setting w
The two minor weaknesses that I see in the paper are: 1) The structure of the paper makes it a little hard to parse. Also, many well-known ideas like the JL transform, bicriteria approximation, and sensitivity sampling are used in a clever way to get the results. However, the technical novelty and challenges are not sufficiently evident from the main body of the paper. The authors should try to highlight them. It may be a good idea to bring the discussion on why the usual techniques are not d
1. The paper improves the communication complexity for distributed (k,z)-clustering coresets in both the coordinator and blackboard models, as illustrated in Figure 1, eliminating the need to send raw coordinates. 2. The paper presents a solid theoretical contribution to the study of distributed coreset construction for Euclidean (k,z)-clustering. 3. The paper is well-written and well-structured, making the technical content accessible. 4. The combination of new coreset constructions, compact enc
1. While the paper proposes distributed algorithms for both the coordinator and blackboard models, the experimental evaluation only includes results for the blackboard model. No empirical comparison or validation is provided for the coordinator-based algorithm. 2. There is no evaluation of how the algorithm scales with the number of distributed machines, which is critical for understanding its practical applicability in real-world distributed systems. 3. The paper provides only communication com
1. The results are interesting and elegant — unlike prior approaches that required a communication term of $sdk \cdot \log(n\Delta)$ (due to each site transmitting point coordinates), the proposed approach eliminates this per-site dependence. 2. The techniques appear to involve clever ideas and nontrivial technical depth.
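As a rough illustration of the $sdk \cdot \log(n\Delta)$ term mentioned in this review, the sketch below evaluates it for a few site counts. The function name `coordinate_term_bits` is hypothetical, the base-2 logarithm and unit constant are assumptions (asymptotic bounds hide both), and $\Delta$ is left as a plain numeric parameter because the excerpt does not define it.

```python
import math


def coordinate_term_bits(s, d, k, n, delta):
    """Evaluate s * d * k * log2(n * delta), ignoring hidden constants (assumed base 2)."""
    return s * d * k * math.log2(n * delta)


# Hypothetical parameters: the term grows linearly with the number of sites s.
for s in (10, 100, 1000):
    bits = coordinate_term_bits(s=s, d=50, k=10, n=10**6, delta=2**10)
    print(f"s={s:5d}  approx. bits={bits:,.0f}")
```

Multiplying the number of sites by ten multiplies this term by ten, which is the per-site dependence the review says the proposed approach avoids.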
1. The main concern is presentation quality. The paper does not clearly explain the technical innovations. Moreover, the absence of proofs prevents the reader from understanding or validating the key arguments. This issue is aggravated by theorem statements (e.g., Lemmas 2.1 and 2.2) referring to algorithms that are not actually described in the main text. While the results seem promising, the lack of detail makes it impossible to assess the technical contributions on their merit. 2. Related to
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Complexity and Algorithms in Graphs · Advanced Clustering Algorithms Research
