CADIS: Handling Cluster-skewed Non-IID Data in Federated Learning with Clustered Aggregation and Knowledge DIStilled Regularization
Nang Hung Nguyen, Duc Long Nguyen, Trong Bang Nguyen, Thanh-Hung Nguyen, Huy Hieu Pham, Truong Thao Nguyen, Phi Le Nguyen

TL;DR
This paper introduces CADIS, a federated learning method that handles cluster-skewed non-IID data through clustered aggregation and knowledge-distilled regularization, improving accuracy by up to 16% over FedAvg.
Contribution
The paper proposes a novel aggregation scheme and local regularization technique tailored for cluster-skewed non-IID data in federated learning, with theoretical and empirical validation.
Findings
Improves accuracy by up to 16% over FedAvg.
Effectively handles cluster-skewed non-IID data.
Theoretically proven superiority of the aggregation scheme.
Abstract
Federated learning enables edge devices to train a global model collaboratively without exposing their data. Despite achieving outstanding advantages in computing efficiency and privacy protection, federated learning faces a significant challenge when dealing with non-IID data, i.e., data generated by clients that are typically not independent and identically distributed. In this paper, we tackle a new type of non-IID data, called cluster-skewed non-IID, discovered in actual data sets. Cluster-skewed non-IID is a phenomenon in which clients can be grouped into clusters with similar data distributions. By performing an in-depth analysis of the behavior of a classification model's penultimate layer, we introduce a metric that quantifies the similarity between two clients' data distributions without violating their privacy. We then propose an aggregation scheme that guarantees equality…
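To make the cluster-skew problem concrete, the sketch below contrasts standard FedAvg (averaging client parameters weighted by local sample count) with a hypothetical cluster-balanced average. It is an illustration only, not the CADIS aggregation rule: the abstract above is truncated before the scheme is specified, so `cluster_balanced_average`, the `cluster_ids` input, and the toy one-parameter "models" are all assumptions made here. In particular, cluster membership is taken as given rather than derived from the paper's penultimate-layer similarity metric.

```python
from collections import defaultdict
from typing import List, Sequence


def fedavg(weights: Sequence[Sequence[float]], sizes: Sequence[int]) -> List[float]:
    """Standard FedAvg: average client parameters weighted by local sample count."""
    total = sum(sizes)
    dim = len(weights[0])
    return [sum(w[i] * n for w, n in zip(weights, sizes)) / total for i in range(dim)]


def cluster_balanced_average(
    weights: Sequence[Sequence[float]],
    sizes: Sequence[int],
    cluster_ids: Sequence[int],
) -> List[float]:
    """Hypothetical variant (an assumption, not the paper's rule):
    run FedAvg inside each cluster, then give every cluster equal weight."""
    groups = defaultdict(list)
    for idx, cluster in enumerate(cluster_ids):
        groups[cluster].append(idx)
    cluster_models = [
        fedavg([weights[i] for i in members], [sizes[i] for i in members])
        for members in groups.values()
    ]
    dim = len(weights[0])
    return [sum(m[i] for m in cluster_models) / len(cluster_models) for i in range(dim)]


# Toy setup: three clients share cluster 0, one client forms cluster 1 alone.
client_weights = [[1.0], [1.0], [1.0], [5.0]]
client_sizes = [100, 100, 100, 100]
cluster_ids = [0, 0, 0, 1]

fedavg_result = fedavg(client_weights, client_sizes)
balanced_result = cluster_balanced_average(client_weights, client_sizes, cluster_ids)
print(fedavg_result, balanced_result)
```

In this toy setup, FedAvg is pulled toward the larger cluster because its three near-identical clients each count separately, while the cluster-balanced variant treats the lone client's cluster as an equal participant.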
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Traffic Prediction and Management Techniques · Human Mobility and Location-Based Analysis
