FedEMA-Distill: Exponential Moving Average Guided Knowledge Distillation for Robust Federated Learning

Hamza Reguieg; Mohamed El Kamili; Essaid Sabir

arXiv:2603.04422·cs.LG·March 6, 2026

TL;DR

FedEMA-Distill is a server-side method that combines an exponential moving average (EMA) of the global model with ensemble knowledge distillation. It aims to improve robustness, communication efficiency, and support for heterogeneous client models in federated learning, particularly under non-IID data and adversarial clients.

Contribution

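The paper presents a federated learning approach that applies a server-side EMA to the global model and distills knowledge from client-uploaded logits. Because clients exchange logits rather than weights, they may use different model architectures, and the method improves robustness and communication efficiency.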

Findings

1. Improves accuracy by up to 6% on benchmark datasets.

2. Reduces communication rounds by 30-35%.

3. Enhances robustness against Byzantine clients.

Abstract

Federated learning (FL) often degrades when clients hold heterogeneous non-Independent and Identically Distributed (non-IID) data and when some clients behave adversarially, leading to client drift, slow convergence, and high communication overhead. This paper proposes FedEMA-Distill, a server-side procedure that combines an exponential moving average (EMA) of the global model with ensemble knowledge distillation from client-uploaded prediction logits evaluated on a small public proxy dataset. Clients run standard local training, upload only compressed logits, and may use different model architectures, so no changes are required to client-side software while still supporting model heterogeneity across devices. Experiments on CIFAR-10, CIFAR-100, FEMNIST, and AG News under Dirichlet-0.1 label skew show that FedEMA-Distill improves top-1 accuracy by several percentage points (up to +5% on…
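The two server-side ingredients named in the abstract, an EMA of the global model and distillation from aggregated client logits, can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names, the smoothing factor `beta`, the distillation `temperature`, and the choice between a coordinate-wise mean and median for combining client logits are all assumptions made for illustration, since the listing does not specify them.

```python
import math
from statistics import median


def ema_update(ema_params, new_params, beta=0.9):
    """Blend previous EMA weights with newly distilled weights (beta is assumed)."""
    return [beta * e + (1.0 - beta) * n for e, n in zip(ema_params, new_params)]


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with a temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_logits(client_logits, robust=False):
    """Combine per-class logits from several clients for one proxy sample.

    robust=False uses the coordinate-wise mean; robust=True uses the
    coordinate-wise median, a hypothetical choice that limits the
    influence of a single outlying (e.g. Byzantine) client.
    """
    if robust:
        return [median(column) for column in zip(*client_logits)]
    return [sum(column) / len(column) for column in zip(*client_logits)]


def kd_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence from the softened teacher to the softened student."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / max(qi, 1e-12)) for pi, qi in zip(p, q) if pi > 0)


# Three clients report logits for one proxy sample; the third is an outlier.
client_logits = [[2.0, 0.5, -1.0], [1.8, 0.7, -0.9], [-50.0, 60.0, 0.0]]
teacher = aggregate_logits(client_logits, robust=True)
loss = kd_loss([1.0, 1.0, 1.0], teacher)
smoothed = ema_update([1.0, 2.0], [3.0, 4.0], beta=0.5)
```

In this sketch the server would minimize `kd_loss` over the proxy dataset to obtain new weights, then fold them into the running global model with `ema_update`. The median variant shows one way an outlying client can be kept from dominating the teacher signal; whether the paper uses this rule is not stated here.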

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Data Quality and Management · Domain Adaptation and Few-Shot Learning