Feature Distillation is the Better Choice for Model-Heterogeneous Federated Learning
Yichen Li, Xiuying Wang, Wenchao Xu, Haozhao Wang, Yining Qi, Jiahua Dong, Ruixuan Li

TL;DR
This paper introduces FedFD, a feature distillation approach for model-heterogeneous federated learning, which aligns features across diverse models using orthogonal projections to improve knowledge aggregation and model performance.
Contribution
The paper proposes a novel feature distillation method with orthogonal projections to better handle heterogeneity in federated learning models, enhancing stability and accuracy.
Findings
FedFD outperforms existing methods in experiments.
Orthogonal projection mitigates knowledge bias from heterogeneous models.
Feature-based distillation improves global model performance.
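As a rough illustration of the idea behind these findings, the sketch below maps global-model features into a client's feature space through a matrix with orthonormal columns and measures the mismatch with a mean-squared error. It is a minimal sketch under stated assumptions, not the paper's formulation: the function names `orthonormal_projection` and `feature_distillation_loss`, the QR-based construction, the feature dimensions, and the MSE objective are all hypothetical choices made for illustration.

```python
import numpy as np


def orthonormal_projection(d_in, d_out, seed=0):
    """Return a (d_in, d_out) matrix whose columns are orthonormal.

    Hypothetical helper: the matrix comes from the QR decomposition of a
    random Gaussian matrix, which requires d_in >= d_out.
    """
    if d_in < d_out:
        raise ValueError("d_in must be at least d_out")
    rng = np.random.default_rng(seed)
    q, _ = np.linalg.qr(rng.standard_normal((d_in, d_out)))
    return q


def feature_distillation_loss(student_feats, teacher_feats, proj):
    """Mean-squared error between projected student features and teacher features.

    student_feats: (n, d_in) features from the global (student) model.
    teacher_feats: (n, d_out) features from one client (teacher) model.
    proj:          (d_in, d_out) projection with orthonormal columns.
    """
    projected = student_feats @ proj
    return float(np.mean((projected - teacher_feats) ** 2))


# Assumed sizes for illustration: 8-dim global features, 4-dim client features.
P = orthonormal_projection(8, 4)
rng = np.random.default_rng(1)
global_feats = rng.standard_normal((16, 8))
client_feats = rng.standard_normal((16, 4))
loss = feature_distillation_loss(global_feats, client_feats, P)
```

Because the columns of `P` are orthonormal, `P.T @ P` is the identity, so the projection does not rescale or mix the retained directions. In a real training loop the projection would typically be learned and kept orthogonal rather than fixed at random.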
Abstract
Model-Heterogeneous Federated Learning (Hetero-FL) has attracted growing attention for its ability to aggregate knowledge from heterogeneous models while keeping private data locally. To better aggregate knowledge from clients, ensemble distillation, as a widely used and effective technique, is often employed after global aggregation to enhance the performance of the global model. However, simply combining Hetero-FL and ensemble distillation does not always yield promising results and can make the training process unstable. The reason is that existing methods primarily focus on logit distillation, which, while being model-agnostic with softmax predictions, fails to compensate for the knowledge bias arising from heterogeneous models. To tackle this challenge, we propose a stable and efficient Feature Distillation for model-heterogeneous Federated learning, dubbed FedFD, that can…
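For context, the logit-based ensemble distillation that the abstract contrasts against can be sketched as follows: client softmax predictions on a shared batch are averaged into a teacher distribution, and the global model is penalized by its KL divergence from that average. This is a generic sketch of the baseline technique, not code from the paper; the function names, the temperature parameter, and the choice to average probabilities rather than raw logits are assumptions.

```python
import numpy as np


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over the last axis."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def ensemble_logit_distillation_loss(client_logits, global_logits, temperature=1.0):
    """Mean KL(teacher || student) over a batch.

    client_logits: list of (n, num_classes) arrays, one per client model.
    global_logits: (n, num_classes) array from the global model.
    The teacher is the average of the client softmax distributions.
    """
    teacher = np.mean([softmax(l, temperature) for l in client_logits], axis=0)
    student = softmax(global_logits, temperature)
    eps = 1e-12  # guards against log(0)
    kl = np.sum(teacher * (np.log(teacher + eps) - np.log(student + eps)), axis=-1)
    return float(kl.mean())


# Assumed toy setup: three clients, a batch of 4 samples, 10 classes.
rng = np.random.default_rng(0)
clients = [rng.standard_normal((4, 10)) for _ in range(3)]
global_out = rng.standard_normal((4, 10))
loss = ensemble_logit_distillation_loss(clients, global_out)
```

The loss depends only on output distributions, which is why it works across different architectures. It also carries no information about intermediate representations, which is the limitation the abstract attributes to logit distillation.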
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Advanced Graph Neural Networks
Methods: Knowledge Distillation · ALIGN · Softmax · Focus
