Differentially Private Federated Clustering with Random Rebalancing

Xiyuan Yang; Shengyuan Hu; Soyeon Kim; Tian Li

arXiv:2508.06183 · cs.LG · August 11, 2025

TL;DR

This paper introduces RR-Cluster, a simple rebalancing technique for federated clustering that reduces privacy noise and improves utility while maintaining privacy guarantees, validated through theoretical analysis and experiments.

Contribution

The paper proposes RR-Cluster, a lightweight rebalancing method that enhances privacy-utility tradeoffs in federated clustering by controlling cluster assignment sizes.

Findings

1. RR-Cluster reduces privacy noise variance.
2. Improves clustering utility under differential privacy.
3. Demonstrates effectiveness on synthetic and real datasets.

Abstract

Federated clustering aims to group similar clients into clusters and produce one model for each cluster. Such a personalization approach typically improves model performance compared with training a single model to serve all clients, but can be more vulnerable to privacy leakage. Directly applying client-level differentially private (DP) mechanisms to federated clustering can degrade utility significantly. We identify that this deficiency is mainly due to the difficulty of averaging out privacy noise within each cluster (following standard privacy mechanisms), as the number of clients assigned to each cluster is uncontrolled. To this end, we propose a simple and effective technique, named RR-Cluster, that can be viewed as a lightweight add-on to many federated clustering algorithms. RR-Cluster achieves reduced privacy noise via randomly rebalancing cluster assignments,…
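The noise-averaging issue described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a standard Gaussian mechanism (clip each client update, sum, add noise, divide by the cluster size), and the function names `dp_cluster_average`, `effective_noise_std`, and `rebalance`, as well as the `min_size` rule, are hypothetical choices made for illustration. Under these assumptions, the noise on a cluster's averaged update has standard deviation proportional to 1 / (cluster size), so a cluster with very few clients receives far noisier updates than a large one.

```python
import numpy as np

def dp_cluster_average(updates, clip_norm, noise_multiplier, rng):
    """Clip each client update to clip_norm, sum, add Gaussian noise, then average.

    Assumes the standard Gaussian mechanism; `updates` has shape (n_clients, dim).
    """
    n = len(updates)
    norms = np.linalg.norm(updates, axis=1, keepdims=True)
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = updates * scale
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=updates.shape[1])
    return (clipped.sum(axis=0) + noise) / n

def effective_noise_std(cluster_size, clip_norm, noise_multiplier):
    """Per-coordinate noise std of the averaged update for one cluster."""
    return noise_multiplier * clip_norm / cluster_size

def rebalance(assignments, num_clusters, min_size, rng):
    """Hypothetical rebalancing rule (an assumption, not the paper's exact rule):
    move randomly chosen clients out of the currently largest cluster until
    every cluster holds at least `min_size` clients.
    """
    assignments = np.array(assignments, copy=True)
    if num_clusters * min_size > len(assignments):
        raise ValueError("not enough clients to give every cluster min_size")
    for k in range(num_clusters):
        while np.sum(assignments == k) < min_size:
            counts = np.bincount(assignments, minlength=num_clusters)
            donor = int(np.argmax(counts))          # largest cluster donates
            candidates = np.flatnonzero(assignments == donor)
            assignments[rng.choice(candidates)] = k  # random client is moved
    return assignments

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # 20 clients, heavily imbalanced across 3 clusters.
    before = np.array([0] * 17 + [1] * 2 + [2] * 1)
    after = rebalance(before, num_clusters=3, min_size=5, rng=rng)
    for name, a in [("before", before), ("after", after)]:
        sizes = np.bincount(a, minlength=3)
        stds = [effective_noise_std(s, 1.0, 1.0) for s in sizes]
        print(name, sizes, np.round(stds, 3))
```

The sketch only shows the variance side of the trade-off: moving a client into a cluster it does not naturally belong to can bias that cluster's model, and the sketch makes no claim about how the reassignment step itself is accounted for in the privacy analysis.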

Peer Reviews

Decision · Submitted to ICLR 2026

Reviewer 01 · Rating 2 · Confidence 5

Strengths

1. The proposed work addresses improving the performance of existing clustered FL algorithms when they are enhanced with DP guarantees, which is an important problem.
2. An extensive set of experimental results is reported (however, they need to be improved; see below).

Weaknesses

While I have understood the point of the proposed idea completely, I strongly feel that the current experimental results do not evaluate it properly to validate the correctness of the claims in the paper. I list the existing weaknesses followed by my detailed questions in the next section for clarification.
1. The privacy setting considers a trusted server, which may not always be available in FL settings.
2. The experimental results are reported in an optimistic way that does not fully evalua

Reviewer 02 · Rating 6 · Confidence 3

Strengths

- The random rebalancing approach can be integrated with various clustered FL algorithms.
- The derivations seem correct and consistent with standard DP theory.
- Consistently outperforms baselines across datasets and privacy budgets.

Weaknesses

- The method assumes that rebalancing doesn't significantly hurt utility, but under concept shift (adversarial or not), incorrect assignments could accumulate bias.
- Experiments focus on classification tasks with synthetic and benchmark datasets (despite the abstract's claim of using real-world datasets).
- Only average results are reported.

Reviewer 03 · Rating 6 · Confidence 3

Strengths

1. The paper is clearly written and easy to follow.
2. The proposed RR-Cluster method is promising, as increasing the number of clients contributing to small clusters effectively mitigates the noise intensity introduced by differential privacy.
3. The authors conduct both theoretical (privacy and convergence) and empirical evaluations of RR-Cluster. The derived theoretical bounds also capture the bias–variance trade-off introduced by the proposed mechanism.

Weaknesses

1. The paper lacks a detailed description of the defense and attack models, which is essential for readers to fully understand the assumptions and setup of the considered DP-FL system.
2. The proposed RR-Cluster method appears to rely on the assumption that the server is fully honest. However, its effectiveness may significantly degrade, or even vanish, under an honest-but-curious server model, limiting its practical applicability.
3. The final convergence bound presented in Corollary 1 seems ov

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Privacy, Security, and Data Protection · Cryptography and Data Security