Federated Knowledge Distillation
Hyowoon Seo, Jihong Park, Seungeun Oh, Mehdi Bennis, Seong-Lyun Kim

TL;DR
This paper provides a comprehensive analysis of federated distillation (FD), highlighting its communication efficiency and versatility across tasks, through theoretical insights and practical implementations in distributed learning.
Contribution
It offers a novel asymptotic analysis of FD algorithms using neural tangent kernel theory and demonstrates FD's effectiveness in various distributed learning scenarios.
Findings
FD reduces communication costs compared to traditional federated learning.
FD achieves comparable accuracy with significantly less data exchange.
FD applies to classification tasks, learning over wireless channels, and reinforcement learning.
Abstract
Distributed learning frameworks often rely on exchanging model parameters across workers, instead of revealing their raw data. A prime example is federated learning, which exchanges the gradients or weights of each neural network model. Under limited communication resources, however, such a method becomes extremely costly, particularly for modern deep neural networks with a huge number of model parameters. In this regard, federated distillation (FD) is a compelling distributed learning solution that exchanges only the model outputs, whose dimensions are commonly much smaller than the model sizes (e.g., 10 labels in the MNIST dataset). The goal of this chapter is to provide a deep understanding of FD while demonstrating its communication efficiency and applicability to a variety of tasks. To this end, towards demystifying the operational principle of FD, the first part of this chapter…
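The abstract's core idea, exchanging model outputs rather than model parameters, can be illustrated with a minimal sketch. It assumes each worker uploads one averaged output vector per label and that a server averages these across workers. That per-label averaging scheme, the function names, and the 1,000,000-parameter model size are illustrative assumptions, not details taken from the text above.

```python
import numpy as np

def local_average_outputs(outputs, labels, num_labels):
    """Per-label mean of one worker's model outputs.

    outputs: (num_samples, output_dim) array of model outputs (e.g. logits).
    labels:  (num_samples,) integer ground-truth labels.
    Returns a (num_labels, output_dim) array; rows for unseen labels stay zero.
    """
    avg = np.zeros((num_labels, outputs.shape[1]))
    for c in range(num_labels):
        mask = labels == c
        if mask.any():
            avg[c] = outputs[mask].mean(axis=0)
    return avg

def aggregate(worker_averages):
    """Server step (assumed): plain mean over the workers' uploads.

    This simple version assumes every worker has seen every label.
    """
    return np.mean(np.stack(worker_averages), axis=0)

# Hypothetical setup: 3 workers, 10 labels (as in MNIST), 200 samples each.
rng = np.random.default_rng(0)
num_labels = 10
uploads = []
for _ in range(3):
    labels = rng.permutation(np.arange(200) % num_labels)  # every label present
    outputs = rng.normal(size=(200, num_labels))           # stand-in for model outputs
    uploads.append(local_average_outputs(outputs, labels, num_labels))

global_avg = aggregate(uploads)  # broadcast back as a distillation target

# Per-round upload size, counted in floats.
fd_payload = num_labels * num_labels  # one output vector per label
fl_payload = 1_000_000                # hypothetical model with 1M parameters
print(global_avg.shape, fd_payload, fl_payload)
```

In this sketch the FD upload depends only on the number of labels and the output dimension, not on the model size, which is the source of the communication savings the abstract describes.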
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Stochastic Gradient Optimization Techniques · Machine Learning and ELM
Methods: Knowledge Distillation
