On the Byzantine-Resilience of Distillation-Based Federated Learning
Christophe Roux, Max Zimmer, Sebastian Pokutta

TL;DR
This paper investigates the robustness of distillation-based federated learning against malicious clients, introduces new attack strategies, and proposes a defense mechanism to improve resilience in adversarial settings.
Contribution
It provides the first comprehensive analysis of Byzantine resilience in KD-based FL, introduces novel attacks, and develops a new defense method to enhance robustness.
Findings
KD-based FL algorithms are surprisingly resilient to Byzantine attacks
New attack methods can effectively break existing defenses
A novel defense mechanism significantly improves Byzantine resilience
Abstract
Federated Learning (FL) algorithms using Knowledge Distillation (KD) have received increasing attention due to their favorable properties with respect to privacy, non-i.i.d. data and communication cost. These methods depart from transmitting model parameters and instead communicate information about a learning task by sharing predictions on a public dataset. In this work, we study the performance of such approaches in the byzantine setting, where a subset of the clients act in an adversarial manner aiming to disrupt the learning process. We show that KD-based FL algorithms are remarkably resilient and analyze how byzantine clients can influence the learning process. Based on these insights, we introduce two new byzantine attacks and demonstrate their ability to break existing byzantine-resilient methods. Additionally, we propose a novel defence method which enhances the byzantine…
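The communication pattern described in the abstract, where clients share predictions on a public dataset instead of model parameters, can be sketched as follows. This is a minimal illustration and not the paper's algorithm: plain mean aggregation, the helper names `mean_aggregate` and `argmax`, and all toy numbers are assumptions made for the example.

```python
from typing import List

Probs = List[float]  # one probability vector over classes (hypothetical alias)


def mean_aggregate(client_preds: List[List[Probs]]) -> List[Probs]:
    """Average client predictions per public sample.

    client_preds[c][i] is client c's probability vector for public sample i.
    Plain averaging is an assumption here, not the paper's aggregation rule.
    """
    n_clients = len(client_preds)
    n_samples = len(client_preds[0])
    n_classes = len(client_preds[0][0])
    return [
        [
            sum(client_preds[c][i][k] for c in range(n_clients)) / n_clients
            for k in range(n_classes)
        ]
        for i in range(n_samples)
    ]


def argmax(p: Probs) -> int:
    """Index of the largest entry of a probability vector."""
    return max(range(len(p)), key=lambda k: p[k])


# Toy setup: three honest clients, one Byzantine client,
# two public samples, three classes.
honest = [
    [[0.8, 0.1, 0.1], [0.1, 0.7, 0.2]],
    [[0.7, 0.2, 0.1], [0.2, 0.6, 0.2]],
    [[0.9, 0.05, 0.05], [0.1, 0.8, 0.1]],
]
# The Byzantine client puts all probability mass on class 2 for every sample.
byzantine = [[[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]]

# The server only ever sees predictions, never model parameters.
soft_labels = mean_aggregate(honest + byzantine)
hard_labels = [argmax(p) for p in soft_labels]
```

In this toy case the Byzantine client shifts the aggregated soft labels toward class 2 but does not change the aggregated hard labels. Whether that holds in general depends on the fraction of Byzantine clients, the attack, and the aggregation rule.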
Peer Reviews
Decision: ICLR 2025 Poster
The paper is well written and clearly structured. Extensive experiments have been conducted to verify the proposed ideas and methods. The proposed method is also somewhat new in the knowledge distillation domain.
1. The comparison of FedAvg and KD-FL is a little weak, and the experimental part is limited to KD-based methods, making the whole idea less convincing.
2. The proposed ExpGuard+F method is similar to the FLTrust methods, which is acceptable; however, the strength of the proposed method is not clearly reflected in the experiments. For example, is ExpGuard+F better than ExpGuard+GM, and if so, why?
3. The study of HIPS conveys the message that some attacks are hard to detect and combat, but we can us
The paper addresses an important and previously unaddressed question: the robustness of distributed learning in the distillation setting.
The formal guarantees (left to Appendix D) deserve a more prominent place in the paper, given that the core contribution of such a paper is to formally derive security guarantees and their limits, since (in this case) the space of all possible attacks cannot be covered by empirical validation.
* The paper is, in general, well written.
* The proposed algorithms and the attack strategies are intuitive and easy to understand.
* The most obvious weakness of the paper is that it addresses IID data distributions. In contrast, the current state-of-the-art (SOTA) Byzantine-resilient federated learning methods focus on non-IID data setups.
* In Section 4, the motivating example comparing the vanilla FedAvg algorithm and the FedDistill algorithm is rather unfair, because the adversaries for vanilla FedAvg employ model poisoning strategies, while the ones in FedDistill use data poisoning.
* In Section 4, th
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Privacy-Preserving Technologies in Data · Machine Learning and Algorithms
Methods: Knowledge Distillation
