Feature Distillation is the Better Choice for Model-Heterogeneous Federated Learning

Yichen Li; Xiuying Wang; Wenchao Xu; Haozhao Wang; Yining Qi; Jiahua Dong; Ruixuan Li

arXiv:2507.10348 · cs.LG · October 15, 2025

TL;DR

This paper introduces FedFD, a feature distillation approach for model-heterogeneous federated learning, which aligns features across diverse models using orthogonal projections to improve knowledge aggregation and model performance.

Contribution

The paper proposes a novel feature distillation method that uses orthogonal projections to better handle model heterogeneity in federated learning, improving both training stability and accuracy.
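The idea of distilling features through orthogonal projections can be illustrated with a small NumPy sketch. It is not FedFD's published algorithm. It assumes that the server has features from the global (student) model and from each client (teacher) model on the same inputs, that each client architecture gets its own orthogonal map from the student's feature width to that client's width, and that the distillation loss is a plain mean squared error. The names `orthogonal_map` and `feature_distillation_loss` and all dimensions are hypothetical. For brevity the maps are drawn at random and kept fixed; a trained projection would need its orthogonality constraint preserved during updates.

```python
import numpy as np


def orthogonal_map(d_in: int, d_out: int, rng: np.random.Generator) -> np.ndarray:
    """Return a (d_in, d_out) matrix with orthonormal rows or columns.

    If d_in <= d_out the rows are orthonormal (P @ P.T = I), so x @ P preserves
    the norm of every feature vector x. If d_in > d_out the columns are
    orthonormal (P.T @ P = I), so x @ P cannot increase a feature's norm.
    """
    big, small = max(d_in, d_out), min(d_in, d_out)
    # Reduced QR of a Gaussian matrix: q has shape (big, small), orthonormal columns.
    q, _ = np.linalg.qr(rng.standard_normal((big, small)))
    return q if d_in >= d_out else q.T


def feature_distillation_loss(student_feats, teacher_feats, projections):
    """Average over teachers of the MSE between projected student features
    and that teacher's features.

    student_feats: (n, d_s) features of the global model on shared inputs.
    teacher_feats: list of (n, d_k) arrays, one per client model.
    projections:   list of (d_s, d_k) matrices built by orthogonal_map.
    """
    losses = []
    for t, p in zip(teacher_feats, projections):
        projected = student_feats @ p  # map student features into teacher k's space
        losses.append(np.mean((projected - t) ** 2))
    return float(np.mean(losses))


# Hypothetical setup: 8 shared samples, 16-dim global features,
# and three client models with different feature widths.
rng = np.random.default_rng(0)
student = rng.standard_normal((8, 16))
dims = [32, 16, 8]
projs = [orthogonal_map(16, d, rng) for d in dims]
teachers = [rng.standard_normal((8, d)) for d in dims]
print(feature_distillation_loss(student, teachers, projs))
```

Because each projection is either norm-preserving or, when it narrows the features, non-expanding, a badly scaled projection head cannot inflate the loss for one client architecture. This is one plausible way orthogonality could aid stability; it is our reading, not a claim taken from the paper.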

Findings

01

FedFD outperforms existing methods in experiments.

02

Orthogonal projection mitigates knowledge bias from heterogeneous models.

03

Feature-based distillation improves global model performance.

Abstract

Model-Heterogeneous Federated Learning (Hetero-FL) has attracted growing attention for its ability to aggregate knowledge from heterogeneous models while keeping private data locally. To better aggregate knowledge from clients, ensemble distillation, as a widely used and effective technique, is often employed after global aggregation to enhance the performance of the global model. However, simply combining Hetero-FL and ensemble distillation does not always yield promising results and can make the training process unstable. The reason is that existing methods primarily focus on logit distillation, which, while being model-agnostic with softmax predictions, fails to compensate for the knowledge bias arising from heterogeneous models. To tackle this challenge, we propose a stable and efficient Feature Distillation for model-heterogeneous Federated learning, dubbed FedFD, that can…
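The abstract contrasts FedFD with logit-based ensemble distillation. A minimal NumPy sketch of a generic temperature-scaled ensemble logit distillation loss (Hinton-style knowledge distillation, with the client logits averaged into a single teacher) shows why that approach is model-agnostic: it only needs every model to emit logits over the same classes. This is an illustrative baseline, not the specific methods the paper compares against; the function names, the temperature of 3.0, and the shapes are assumptions.

```python
import numpy as np


def log_softmax(logits: np.ndarray, temperature: float = 1.0) -> np.ndarray:
    """Numerically stable log-softmax over the last axis."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))


def ensemble_logit_kd_loss(student_logits, client_logits, temperature=3.0):
    """KL(teacher || student), where the teacher is the mean of client logits.

    student_logits: (n, c) logits of the aggregated global model.
    client_logits:  list of (n, c) logits, one per client model; only the
                    class dimension c must match, not the architecture.
    """
    teacher_logits = np.mean(np.stack(client_logits), axis=0)
    log_p_t = log_softmax(teacher_logits, temperature)
    log_p_s = log_softmax(student_logits, temperature)
    kl = np.sum(np.exp(log_p_t) * (log_p_t - log_p_s), axis=-1)
    # Scale by T^2 so gradient magnitudes stay comparable across temperatures.
    return float(np.mean(kl) * temperature ** 2)


# Hypothetical setup: 3 client models, 4 shared samples, 10 classes.
rng = np.random.default_rng(0)
clients = [rng.standard_normal((4, 10)) for _ in range(3)]
student = rng.standard_normal((4, 10))
print(ensemble_logit_kd_loss(student, clients))
```

In this sketch every client is weighted equally and averaging happens in logit space, so any systematic bias one architecture carries passes directly into the teacher distribution. That is one way to picture the knowledge bias that, according to the abstract, logit distillation fails to compensate for.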

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Advanced Graph Neural Networks

Methods: Knowledge Distillation · ALIGN · Softmax · Focus