Decentralized Learning with Multi-Headed Distillation
Andrey Zhmoginov, Mark Sandler, Nolan Miller, Gus Kristiansen, and Max Vladymyrov

TL;DR
This paper introduces a communication-efficient decentralized learning method based on multi-headed distillation, which lets agents with private, non-iid data collaboratively improve their models without sharing raw data, weights, or weight updates.
Contribution
The paper presents a novel distillation-based approach in which each client carries multiple auxiliary heads, improving training efficiency and performance when data is heterogeneous across decentralized agents.
Findings
Agents trained with the method significantly outperform agents trained in isolation.
The method is robust to data and model heterogeneity.
Communication efficiency is improved through distillation.
Abstract
Decentralized learning with private data is a central problem in machine learning. We propose a novel distillation-based decentralized learning technique that allows multiple agents with private non-iid data to learn from each other, without having to share their data, weights or weight updates. Our approach is communication efficient, utilizes an unlabeled public dataset and uses multiple auxiliary heads for each client, greatly improving training efficiency in the case of heterogeneous data. This approach allows individual models to preserve and enhance performance on their private tasks while also dramatically improving their performance on the global aggregated data distribution. We study the effects of data and model architecture heterogeneity and the impact of the underlying communication graph topology on learning efficiency and show that our agents can significantly improve…
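To make the idea in the abstract concrete, here is a minimal sketch of distillation on a public unlabeled example, where a client only sees a peer's predictions, never its data or weights. Everything in it is an illustrative assumption rather than the authors' implementation: the function names (`log_softmax`, `distillation_loss`, `chained_distillation_loss`) are hypothetical, cross-entropy against soft labels is one common choice of distillation loss, and the "chained" wiring (the student's k-th auxiliary head learns from the peer's preceding head) is assumed here because the summary does not spell out how the heads are connected.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def distillation_loss(student_logits, teacher_probs):
    """Cross-entropy of the student's prediction against a peer's soft labels.

    Both are computed on the same public, unlabeled example, so only
    predictions (not private data or weights) cross the network.
    """
    log_p = log_softmax(student_logits)
    return -sum(t * lp for t, lp in zip(teacher_probs, log_p))


def chained_distillation_loss(student_aux_logits, peer_head_probs):
    """Sum of per-head distillation losses (assumed wiring, see lead-in).

    student_aux_logits[k]: logits of the student's k-th auxiliary head.
    peer_head_probs[k]:    probabilities from the peer head it learns from
                           (index 0 = peer's main head, then its auxiliary heads).
    """
    if len(student_aux_logits) != len(peer_head_probs):
        raise ValueError("need exactly one peer target per auxiliary head")
    return sum(
        distillation_loss(logits, probs)
        for logits, probs in zip(student_aux_logits, peer_head_probs)
    )


if __name__ == "__main__":
    # Two auxiliary heads, two classes, one public example.
    student = [[0.0, 0.0], [2.0, -1.0]]
    peer = [[0.9, 0.1], [0.6, 0.4]]
    print(chained_distillation_loss(student, peer))
```

In a full training loop this term would be added to the ordinary supervised loss that each client computes on its own private labeled data; the weighting between the two is a design choice left open here.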
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Mobile Crowdsensing and Crowdsourcing · Distributed Sensor Networks and Detection Algorithms
