CLoVE: Personalized Federated Learning through Clustering of Loss Vector Embeddings
Randeep Bhatia, Nikos Papadis, Murali Kodialam, TV Lakshman, Sayak Chakrabarty

TL;DR
CLoVE introduces a simple, robust clustering algorithm for federated learning that effectively identifies client groups based on loss embeddings, enabling fast convergence and high accuracy in diverse settings.
Contribution
The paper presents CLoVE, a novel clustering method for federated learning that does not require near-optimal initialization and works in both supervised and unsupervised scenarios.
Findings
Achieves accurate cluster recovery in few rounds
Converges exponentially fast to optimal models
Outperforms existing CFL and PFL algorithms in accuracy
Abstract
We propose CLoVE (Clustering of Loss Vector Embeddings), a novel algorithm for Clustered Federated Learning (CFL). In CFL, clients are naturally grouped into clusters based on their data distribution. However, identifying these clusters is challenging, as client assignments are unknown. CLoVE utilizes client embeddings derived from model losses on client data, and leverages the insight that clients in the same cluster share similar loss values, while those in different clusters exhibit distinct loss patterns. Based on these embeddings, CLoVE is able to iteratively identify and separate clients from different clusters and optimize cluster-specific models through federated aggregation. Key advantages of CLoVE over existing CFL algorithms are (1) its simplicity, (2) its applicability to both supervised and unsupervised settings, and (3) the fact that it eliminates the need for near-optimal…
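As a rough illustration of the loss-vector idea described in the abstract, the sketch below builds one embedding per client from its losses under every current model, clusters the embeddings, and refits one model per cluster. It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract is truncated and does not say how the embeddings are clustered, so plain k-means is swapped in here, and the synthetic mixed linear regression data, the squared-error loss, the pooled least-squares refit (a stand-in for federated aggregation), and all function names are hypothetical choices made for this example.

```python
import numpy as np

def loss_vectors(models, client_data):
    """Row i is client i's embedding: its mean squared error under each model."""
    return np.array([[np.mean((X @ w - y) ** 2) for w in models]
                     for X, y in client_data])

def cluster_embeddings(points, k, iters=20):
    """Plain k-means with farthest-point seeding (an assumed clustering step)."""
    centers = [points[0]]
    for _ in range(1, k):
        dist = np.min([np.sum((points - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(dist)])
    centers = np.array(centers, dtype=float)
    labels = np.zeros(len(points), dtype=int)
    for _ in range(iters):
        sq = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(sq, axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return labels

def refit(client_data, labels, k):
    """Pooled least squares per cluster, standing in for federated aggregation.
    Assumes every cluster has at least one member."""
    fitted = []
    for j in range(k):
        members = [client_data[i] for i in np.flatnonzero(labels == j)]
        X_all = np.vstack([X for X, _ in members])
        y_all = np.concatenate([y for _, y in members])
        fitted.append(np.linalg.lstsq(X_all, y_all, rcond=None)[0])
    return fitted

# Synthetic setup (hypothetical): 8 clients, 2 ground-truth clusters.
rng = np.random.default_rng(0)
d, n = 5, 500
true_w = [4.0 * np.eye(d)[0], -4.0 * np.eye(d)[0]]
true_cluster = np.array([0, 0, 0, 0, 1, 1, 1, 1])
client_data = []
for c in true_cluster:
    X = rng.normal(size=(n, d))
    client_data.append((X, X @ true_w[c] + 0.1 * rng.normal(size=n)))

# Arbitrary starting models, deliberately far from the true weights.
models = [np.ones(d), -np.ones(d)]
embeddings = loss_vectors(models, client_data)   # shape (clients, models)
labels = cluster_embeddings(embeddings, k=2)     # estimated cluster ids
models = refit(client_data, labels, k=2)         # one model per cluster
```

In this toy setup, clients drawn from the same ground-truth cluster see nearly identical losses under each starting model, so their embeddings sit close together even though neither starting model is near a true solution. In a real federated deployment the refit step would be replaced by local training and server-side averaging within each estimated cluster.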
Peer Reviews
Decision·Submitted to ICLR 2026
Using loss-vector embeddings sidesteps careful warm-starts and delivers quick, stable clustering in practice.
1. Guarantees are shown only for linear models; applicability to nonconvex deep networks remains unproven. 2. Clients must evaluate multiple models each round to form loss vectors, which increases local compute and communication and may leak information about client data through loss profiles. 3. Dynamic clustering under sparse participation, label noise, or malicious clients may oscillate or be exploitable, and robustness is not theoretically characterized.
1. Proposes an approach that uses loss vector embeddings for client clustering, eliminating the need for careful initialization. 2. Provides a rigorous convergence analysis for mixed linear regression, with theoretical guarantees for cluster recovery. 3. Works in both supervised and unsupervised settings, offering simplicity and wide applicability.
1. The analysis is restricted to a convex setting (linear regression) and lacks guarantees for the non-convex settings commonly used in practice. Comparisons with recent (2024-2025) state-of-the-art methods addressing similar challenges are missing. 2. For the key challenge of sparse client participation in FL, the paper lacks corresponding theoretical analysis and systematic experimental validation. For example, would different settings of the initial number of clusters K lead to different results? 3. Insuffici
* Although similar to the prior Iterative Federated Clustering Algorithm (IFCA), which uses the loss to build cluster identities, CLoVE simultaneously estimates the underlying clusters and constructs a model per cluster. It does not require a prescribed number of clusters and is not sensitive to the model initialization, which in turn determines the cluster initialization. * The manuscript is clearly written, methodologically solid, and well positioned within the current literature.
* The experiments use relatively simple datasets to validate effectiveness. For example, on MNIST, CIFAR-10, and FMNIST, several baseline models already achieve 100\% test accuracy. It is unclear how the methods perform on more challenging datasets such as Tiny-ImageNet. * My concerns are the communication and storage costs. Before the clustering stabilizes, in each communication round the server needs to broadcast $K^{(t)}$ models and the clients need to store them. Compa
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Data Quality and Management · Machine Learning in Healthcare
