CLoVE: Personalized Federated Learning through Clustering of Loss Vector Embeddings

Randeep Bhatia; Nikos Papadis; Murali Kodialam; TV Lakshman; Sayak Chakrabarty

arXiv:2506.22427·cs.LG·June 30, 2025

Open Access·3 Reviews

TL;DR

CLoVE introduces a simple, robust clustering algorithm for federated learning that effectively identifies client groups based on loss embeddings, enabling fast convergence and high accuracy in diverse settings.

Contribution

The paper presents CLoVE, a novel clustering method for federated learning that does not require near-optimal initialization and works in both supervised and unsupervised scenarios.

Findings

01 Achieves accurate cluster recovery in a few rounds

02 Converges exponentially fast to optimal models

03 Outperforms existing CFL and PFL algorithms in accuracy

Abstract

We propose CLoVE (Clustering of Loss Vector Embeddings), a novel algorithm for Clustered Federated Learning (CFL). In CFL, clients are naturally grouped into clusters based on their data distribution. However, identifying these clusters is challenging, as client assignments are unknown. CLoVE utilizes client embeddings derived from model losses on client data, and leverages the insight that clients in the same cluster share similar loss values, while those in different clusters exhibit distinct loss patterns. Based on these embeddings, CLoVE is able to iteratively identify and separate clients from different clusters and optimize cluster-specific models through federated aggregation. Key advantages of CLoVE over existing CFL algorithms are (1) its simplicity, (2) its applicability to both supervised and unsupervised settings, and (3) the fact that it eliminates the need for near-optimal…
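The core idea in the abstract (embed each client by its losses under the current models, then cluster the embeddings and aggregate per cluster) can be illustrated with a small sketch. This is not the authors' implementation. It assumes linear models with squared loss, uses plain k-means as a stand-in for the clustering step, and averages local least-squares fits as a stand-in for federated aggregation. All function names, the synthetic data, and the initial models are hypothetical.

```python
import numpy as np

def loss_embeddings(models, clients):
    """Row i = client i's mean squared loss under each current model."""
    return np.array([[np.mean((X @ w - y) ** 2) for w in models]
                     for X, y in clients])

def kmeans(E, k, iters=20):
    """Plain Lloyd k-means with farthest-point init (stand-in clustering step)."""
    centers = [E[0]]
    for _ in range(k - 1):
        dist = np.min([((E - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(E[np.argmax(dist)])
    centers = np.array(centers, dtype=float)
    for _ in range(iters):
        d = ((E[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = E[labels == j].mean(axis=0)
    return labels

def aggregate(clients, labels, k):
    """Average local least-squares fits within each cluster.

    Assumed stand-in for federated aggregation; expects non-empty clusters.
    """
    out = []
    for j in range(k):
        fits = [np.linalg.lstsq(X, y, rcond=None)[0]
                for (X, y), lab in zip(clients, labels) if lab == j]
        out.append(np.mean(fits, axis=0))
    return out

# Synthetic setup (assumption): 10 clients, two ground-truth clusters of 5.
rng = np.random.default_rng(0)
true_w = [np.array([3.0, 0.0]), np.array([0.0, 3.0])]
clients = []
for i in range(10):
    X = rng.normal(size=(1000, 2))
    y = X @ true_w[i // 5] + 0.1 * rng.normal(size=1000)
    clients.append((X, y))

# Arbitrary initial models, deliberately far from both ground-truth models.
models = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]

E = loss_embeddings(models, clients)       # shape (10, 2): one loss vector per client
labels = kmeans(E, k=2)                    # cluster clients by loss vector
cluster_models = aggregate(clients, labels, k=2)
```

In this toy setup, clients from the same cluster produce nearly identical loss vectors even though neither initial model is close to a true one, which is the property the abstract relies on. A full algorithm would repeat the embed, cluster, and aggregate steps over several rounds.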

Peer Reviews

Decision·Submitted to ICLR 2026

Reviewer 01·Rating 4·Confidence 3

Strengths

Using loss-vector embeddings sidesteps careful warm-starts and delivers quick, stable clustering in practice.

Weaknesses

1. Guarantees are shown only for linear models; applicability to nonconvex deep networks remains unproven.
2. Clients must evaluate multiple models each round to form loss vectors, which increases local compute/communication and may leak information about client data through loss profiles.
3. Dynamic clustering with sparse participation, label noise, or malicious clients may oscillate or be exploitable, and robustness is not theoretically characterized.

Reviewer 02·Rating 4·Confidence 3

Strengths

1. Proposes an approach that uses loss vector embeddings for client clustering, eliminating the need for careful initialization.
2. Provides a rigorous convergence analysis for mixed linear regression, with theoretical guarantees for cluster recovery.
3. Works in both supervised and unsupervised settings; simple and widely applicable.

Weaknesses

1. Analysis is restricted to the convex setting (linear regression) and lacks guarantees for the non-convex settings commonly used in practice. Comparisons with recent (2024-2025) state-of-the-art methods addressing similar challenges are missing.
2. For the key challenge of sparse client participation in FL, the paper lacks corresponding theoretical analysis and systematic experimental validation. For example, will different settings of the initial cluster number K lead to different results?
3. Insuffici

Reviewer 03·Rating 6·Confidence 3

Strengths

* Although similar to the prior method Iterative Federated Clustering Algorithm (IFCA), which uses the loss to build cluster identities, CLoVE simultaneously estimates the underlying clusters and constructs a model per cluster. It does not require a prescribed number of clusters and is not sensitive to the model initialization, which in turn determines the cluster initialization.
* The manuscript is clearly written, methodologically solid, and well positioned within the current literature.

Weaknesses

* The experiments use relatively simple datasets to validate effectiveness. For example, on MNIST, CIFAR-10, and FMNIST, several baseline models already achieve 100% test accuracy. It is unclear how the methods perform on relatively challenging datasets such as Tiny-ImageNet.
* My concerns are about the communication and storage costs. Before the clustering stabilizes, in each communication round the server needs to broadcast $K^{(t)}$ models and the clients need to store them. Compa

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Data Quality and Management · Machine Learning in Healthcare