On the Byzantine-Resilience of Distillation-Based Federated Learning

Christophe Roux; Max Zimmer; Sebastian Pokutta

arXiv:2402.12265·cs.LG·March 18, 2025·1 cites

TL;DR

This paper investigates the robustness of distillation-based federated learning against malicious clients, introduces new attack strategies, and proposes a defense mechanism to improve resilience in adversarial settings.

Contribution

The paper provides the first comprehensive analysis of Byzantine resilience in KD-based FL, introduces two new attacks, and develops a new defense method to enhance robustness.

Findings

01. KD-based FL algorithms are surprisingly resilient to Byzantine attacks

02. New attack methods can effectively break existing defenses

03. A novel defense mechanism significantly improves Byzantine resilience

Abstract

Federated Learning (FL) algorithms using Knowledge Distillation (KD) have received increasing attention due to their favorable properties with respect to privacy, non-i.i.d. data and communication cost. These methods depart from transmitting model parameters and instead communicate information about a learning task by sharing predictions on a public dataset. In this work, we study the performance of such approaches in the byzantine setting, where a subset of the clients act in an adversarial manner aiming to disrupt the learning process. We show that KD-based FL algorithms are remarkably resilient and analyze how byzantine clients can influence the learning process. Based on these insights, we introduce two new byzantine attacks and demonstrate their ability to break existing byzantine-resilient methods. Additionally, we propose a novel defence method which enhances the byzantine…
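
The communication pattern described in the abstract, where clients share predictions on a public dataset rather than model parameters, can be illustrated with a small sketch. This is not the paper's algorithm or its defense: it assumes the server simply averages the clients' class-probability vectors, and every name here (`aggregate_mean`, the synthetic logits, the label-shifting attacker) is hypothetical and chosen only for illustration.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the last axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def aggregate_mean(client_probs):
    # client_probs has shape (n_clients, n_samples, n_classes).
    # ASSUMPTION: plain averaging; the paper's aggregation may differ.
    return client_probs.mean(axis=0)

rng = np.random.default_rng(0)
n_honest, n_byz, n_samples, n_classes = 8, 2, 5, 3

# Hypothetical ground-truth labels for the public dataset.
true_labels = rng.integers(0, n_classes, size=n_samples)
idx = np.arange(n_samples)

# Honest clients: noisy logits that favor the true class.
honest_logits = rng.normal(0.0, 0.3, size=(n_honest, n_samples, n_classes))
honest_logits[:, idx, true_labels] += 3.0
honest = softmax(honest_logits)

# Byzantine clients (illustrative attack): all mass on a wrong class.
wrong_labels = (true_labels + 1) % n_classes
byz = np.zeros((n_byz, n_samples, n_classes))
byz[:, idx, wrong_labels] = 1.0

# The server only sees predictions, never model parameters.
all_probs = np.concatenate([honest, byz], axis=0)
soft_targets = aggregate_mean(all_probs)

print(soft_targets.round(3))
print("argmax matches true labels:",
      bool((soft_targets.argmax(axis=1) == true_labels).all()))
```

In this sketch every client's message is a probability vector, so under mean aggregation a fraction of Byzantine clients can move any entry of the aggregated target by at most that fraction (here 2 of 10 clients, so at most 0.2). That bound is a property of this simplified setup only and is not a statement of the paper's formal guarantees.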

Peer Reviews

Decision·ICLR 2025 Poster

Reviewer 01 · Rating 6 · Confidence 5

Strengths

The paper is well written and clearly structured. Extensive experiments have been conducted to verify the proposed ideas and methods. The proposed method is also somewhat new in the knowledge distillation domain.

Weaknesses

1. The comparison of FedAvg and KD-FL is a little weak, and the experimental part is limited to KD-based methods, making the whole idea less convincing.
2. The proposed ExpGuard+F method is similar to the FLTrust methods, which is okay; however, the strength of the proposed method is not clearly reflected in the experiments. For example, is ExpGuard+F better than ExpGuard+GM, and if yes, why?
3. The study of HIPS conveys the message that some attacks are hard to detect and combat, but we can us

Reviewer 02 · Rating 6 · Confidence 4

Strengths

The paper addresses an important and previously unaddressed question: the robustness of distributed learning in the distillation setting.

Weaknesses

The formal guarantees (left to Appendix D) deserve a more prominent place in the paper. The core contribution of such a paper is to formally derive security guarantees and their limits, since (in this case) the space of all possible attacks cannot be covered by empirical validation.

Reviewer 03 · Rating 6 · Confidence 4

Strengths

* The paper is, in general, well written.
* The proposed algorithms and the attack strategies are intuitive and easy to understand.

Weaknesses

* The most obvious weakness of the paper is that it addresses IID data distributions. In contrast, the current state-of-the-art (SOTA) Byzantine-resilient federated learning methods focus on non-IID data setups.
* In Section 4, the motivating example comparing the vanilla FedAvg algorithm and the FedDistill algorithm is rather unfair. This is because the adversaries for vanilla FedAvg employ model poisoning strategies, while the ones in FedDistill use data poisoning.
* In Section 4, th

Code & Models

Repositories

zib-iol/feddistill
PyTorch · Official

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Privacy-Preserving Technologies in Data · Machine Learning and Algorithms

Methods: Knowledge Distillation