FedDAG: Clustered Federated Learning via Global Data and Gradient Integration for Heterogeneous Environments
Anik Pramanik, Murat Kantarcioglu, Vincent Oria, Shantanu Sharma

TL;DR
FedDAG introduces a novel clustered federated learning framework that combines data and gradient similarities with a dual-encoder architecture, enhancing model accuracy in heterogeneous environments.
Contribution
FedDAG proposes a holistic similarity metric and a dual-encoder design to improve clustering and knowledge sharing across clusters in federated learning.
Findings
Outperforms state-of-the-art clustered FL methods in accuracy.
Effectively handles data heterogeneity across clients.
Enables cross-cluster feature transfer without losing cluster-specificity.
Abstract
Federated Learning (FL) enables a group of clients to collaboratively train a model without sharing individual data, but its performance drops when client data are heterogeneous. Clustered FL tackles this by grouping similar clients. However, existing clustered FL approaches rely solely on either data similarity or gradient similarity, which results in an incomplete assessment of client similarities. Prior clustered FL approaches also restrict knowledge and representation sharing to clients within the same cluster. This prevents cluster models from benefiting from the diverse client population across clusters. To address these limitations, FedDAG, a clustered FL framework, employs a weighted, class-wise similarity metric that integrates both data and gradient information, providing a more holistic measure of similarity during clustering. In addition,…
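To make the idea of a weighted, class-wise similarity that mixes data and gradient information concrete, here is a minimal sketch. It is not the paper's actual formula: the per-class signature vectors, the cosine measure, the mixing weight `alpha`, and the function names `cosine`, `combined_similarity`, and `similarity_matrix` are all assumptions made for illustration. The pairwise loop also shows why one reviewer describes the server-side cost as at least O(N^2) in the number of clients.

```python
import numpy as np


def cosine(u, v, eps=1e-12):
    """Cosine similarity of two 1-D vectors; returns 0.0 if either is near zero."""
    nu, nv = np.linalg.norm(u), np.linalg.norm(v)
    if nu < eps or nv < eps:
        return 0.0
    return float(np.dot(u, v) / (nu * nv))


def combined_similarity(data_a, data_b, grad_a, grad_b, class_weights, alpha=0.5):
    """Hypothetical class-wise similarity between two clients.

    data_*: array of shape (C, d_data), one data signature per class (assumed).
    grad_*: array of shape (C, d_grad), one gradient signature per class (assumed).
    class_weights: length-C non-negative weights, normalized here to sum to 1.
    alpha: assumed mixing weight between data (alpha) and gradient (1 - alpha).
    """
    w = np.asarray(class_weights, dtype=float)
    w = w / w.sum()
    total = 0.0
    for c in range(len(w)):
        s_data = cosine(data_a[c], data_b[c])
        s_grad = cosine(grad_a[c], grad_b[c])
        total += w[c] * (alpha * s_data + (1.0 - alpha) * s_grad)
    return total


def similarity_matrix(data_sigs, grad_sigs, class_weights, alpha=0.5):
    """Symmetric N x N matrix of pairwise similarities (O(N^2) pairs).

    The diagonal is set to 1.0, which matches the self-similarity of any
    client whose signatures are all non-zero.
    """
    n = len(data_sigs)
    sim = np.eye(n)
    for i in range(n):
        for j in range(i + 1, n):
            s = combined_similarity(
                data_sigs[i], data_sigs[j],
                grad_sigs[i], grad_sigs[j],
                class_weights, alpha,
            )
            sim[i, j] = sim[j, i] = s
    return sim
```

A matrix like this could then be passed to any standard clustering routine. How FedDAG actually builds its signatures, sets the weights, and chooses the number of clusters is described in the paper itself and is not reproduced here.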
Peer Reviews
Decision·ICLR 2026 Poster
1. FedDAG can address label skew, feature skew, concept shift, and quantity shift simultaneously. This comprehensive approach is rare in Federated Learning. 2. FedDAG can adaptively determine the optimal number of clusters. This mechanism effectively solves a long-standing practical issue: most methods require the number of clusters to be set in advance. It is practical for handling dynamic scenarios, such as the arrival of new clients (Appendix A.7) and data distribution shifts over time (App
1. FedDAG has large computational overhead due to its complex design, involving an intricate multi-step process (e.g., SVD, clustering, CC-Graph, dual-encoder training, MLP optimization). Specifically, for computational cost, server-side operations are at least O(N^2) for computing the similarity matrix and clustering, making scalability beyond the 100 clients tested questionable. For communication overhead, the dual-encoder training phase appears to double the communication cost compared to Fe
1. The paper clearly defines different types of non-IID heterogeneity and systematically designs corresponding modules to treat each. 2. The framework integrates clustering, similarity measurement, and inter-cluster knowledge transfer into a cohesive system. 3. FedDAG performs well across datasets and heterogeneity settings, demonstrating robustness in practice. 4. Presentation is clear and easy to follow.
1. The novelty of this work is limited. Most components extend existing approaches with minor tweaks. The framework lacks a central new idea or theoretical insight. 2. There is no convergence, optimality, or complexity analysis; the method’s robustness under different heterogeneity settings is justified only empirically. 3. The ablation studies are incomplete. - The sensitivity of the gradient similarity module to the number of local optimization steps and the sparsification ratio is not exam
- Comprehensive Problem Formulation: The paper does an excellent job of identifying and articulating the key limitations of existing clustered FL methods, such as reliance on a single similarity modality (data or gradients), restricted intra-cluster knowledge sharing, and inability to handle all forms of data skew. The motivation is clear and well-justified. - Novelty and Technical Sophistication: The proposed method is technically sound and introduces several novel ideas. The fusion of data an
- Computational and Architectural Overhead: The dual-encoder architecture inherently doubles the parameter count for the feature extractor compared to a single-model approach. While the ablation study (FedDAG†) convincingly shows that the gains are due to sharing and not just more parameters, this overhead is non-trivial for resource-constrained edge devices. The paper mentions the possibility of alternating training phases to mitigate this, but a more detailed discussion on the trade-offs (e.g.
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Machine Learning in Healthcare · Domain Adaptation and Few-Shot Learning
