Federated Knowledge Distillation

Hyowoon Seo; Jihong Park; Seungeun Oh; Mehdi Bennis; Seong-Lyun Kim

arXiv:2011.02367 · cs.LG · November 5, 2020 · 34 cites

TL;DR

This paper provides a comprehensive analysis of federated distillation (FD), highlighting its communication efficiency and versatility across tasks, through theoretical insights and practical implementations in distributed learning.

Contribution

It offers a novel asymptotic analysis of FD algorithms using neural tangent kernel theory and demonstrates FD's effectiveness in various distributed learning scenarios.

Findings

01. FD reduces communication costs compared to traditional federated learning.

02. FD achieves comparable accuracy with significantly less data exchange.

03. FD is applicable to classification, wireless channels, and reinforcement learning environments.

Abstract

Distributed learning frameworks often rely on exchanging model parameters across workers, instead of revealing their raw data. A prime example is federated learning that exchanges the gradients or weights of each neural network model. Under limited communication resources, however, such a method becomes extremely costly particularly for modern deep neural networks having a huge number of model parameters. In this regard, federated distillation (FD) is a compelling distributed learning solution that only exchanges the model outputs whose dimensions are commonly much smaller than the model sizes (e.g., 10 labels in the MNIST dataset). The goal of this chapter is to provide a deep understanding of FD while demonstrating its communication efficiency and applicability to a variety of tasks. To this end, towards demystifying the operational principle of FD, the first part of this chapter…
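
To make the communication argument concrete, here is a minimal Python sketch of one possible exchange round. It assumes each worker uploads a single output vector and a server averages them element-wise. The function names, the three workers, and the one-million-parameter model size are hypothetical illustrations, not details taken from the chapter.

```python
# Illustrative sketch only: the worker count, output values, and parameter
# count below are assumptions, not values from the chapter.

def average_outputs(worker_outputs):
    """Element-wise mean of the output vectors uploaded by the workers."""
    num_workers = len(worker_outputs)
    dim = len(worker_outputs[0])
    return [sum(w[i] for w in worker_outputs) / num_workers for i in range(dim)]


def payload_ratio(num_params, output_dim):
    """How many times more values a parameter upload carries than an output upload."""
    return num_params / output_dim


# Three hypothetical workers, each uploading a 10-dimensional output vector
# (10 matches the number of MNIST labels mentioned in the abstract).
worker_outputs = [[1.0 * (k + 1)] * 10 for k in range(3)]

# The server aggregates the uploaded outputs instead of model parameters.
global_output = average_outputs(worker_outputs)

# Assumed model size of one million parameters, for comparison only.
ratio = payload_ratio(num_params=1_000_000, output_dim=10)

print(global_output)
print(ratio)
```

Under these assumed numbers, each upload carries 10 values rather than one million, which is the kind of gap the abstract refers to when it contrasts output dimensions with model sizes.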

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Stochastic Gradient Optimization Techniques · Machine Learning and ELM

Methods: Knowledge Distillation