Dataset Distillation-based Hybrid Federated Learning on Non-IID Data

Xiufang Shi; Wei Zhang; Yuheng Li; Mincheng Wu; Zhenyu Wen; Shibo He; Tejal Shah; Rajiv Ranjan

arXiv:2409.17517·cs.LG·March 25, 2026

Open Access

TL;DR

This paper introduces HFLDD, a hybrid federated learning framework using dataset distillation to mitigate non-IID data issues, especially label imbalance, resulting in improved accuracy and reduced communication costs.

Contribution

The paper proposes a novel hybrid federated learning framework that employs dataset distillation to handle non-IID data, particularly label skew, by clustering clients and training on distilled data.

Findings

01. HFLDD improves test accuracy on non-IID datasets.

02. HFLDD reduces communication overhead compared to baseline methods.

03. HFLDD effectively handles label imbalance in federated learning.

Abstract

In federated learning, the heterogeneity of client data strongly affects the performance of model training. Many heterogeneity issues arise from data that are not independent and identically distributed (non-IID). To address label distribution skew, we propose a hybrid federated learning framework called HFLDD, which integrates dataset distillation to generate approximately independent and identically distributed (IID) data, thereby improving the performance of model training. In particular, we partition the clients into heterogeneous clusters, where the data labels among different clients within a cluster are unbalanced while the data labels among different clusters are balanced. The cluster heads collect distilled data from the corresponding cluster members, and conduct model training in collaboration with the server. This training process is like traditional…
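
The clustering idea in the abstract (clients with skewed labels grouped so that each cluster's combined labels are roughly balanced) can be illustrated with a small sketch. This is not the paper's clustering procedure, which the abstract does not specify; it is a simple greedy heuristic written for illustration. The function names `imbalance` and `greedy_heterogeneous_clusters`, the use of total variation distance as the balance measure, and the toy label histograms are all assumptions.

```python
from typing import List


def imbalance(hist: List[int]) -> float:
    """Total variation distance between a label histogram and the uniform distribution."""
    total = sum(hist)
    if total == 0:
        return 0.0
    uniform = 1.0 / len(hist)
    return 0.5 * sum(abs(count / total - uniform) for count in hist)


def greedy_heterogeneous_clusters(
    client_hists: List[List[int]], num_clusters: int
) -> List[List[int]]:
    """Assign each client to the cluster whose pooled labels become most balanced.

    Hypothetical heuristic (an assumption, not the authors' method):
    clients are visited from largest to smallest, and ties are broken
    in favour of the cluster that currently has fewer members.
    """
    num_classes = len(client_hists[0])
    cluster_hists = [[0] * num_classes for _ in range(num_clusters)]
    clusters: List[List[int]] = [[] for _ in range(num_clusters)]

    # Visit clients with more samples first (sorted is stable, so ties keep index order).
    order = sorted(range(len(client_hists)), key=lambda i: -sum(client_hists[i]))
    for i in order:
        best_k = 0
        best_score = (float("inf"), 0)
        for k in range(num_clusters):
            merged = [a + b for a, b in zip(cluster_hists[k], client_hists[i])]
            score = (imbalance(merged), len(clusters[k]))
            if score < best_score:
                best_k, best_score = k, score
        clusters[best_k].append(i)
        cluster_hists[best_k] = [
            a + b for a, b in zip(cluster_hists[best_k], client_hists[i])
        ]
    return clusters


# Toy example: four clients, two classes, each client holds a single label.
client_hists = [[10, 0], [10, 0], [0, 10], [0, 10]]
print(greedy_heterogeneous_clusters(client_hists, num_clusters=2))
# prints [[0, 2], [1, 3]]
```

In the toy example every client is fully label-skewed, yet each resulting cluster pools one client of each class, so the per-cluster label distribution is uniform. In the framework described above, a cluster head would then receive distilled data from such members rather than raw label counts.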

Taxonomy

Topics: Privacy-Preserving Technologies in Data