One-Shot Collaborative Data Distillation
William Holland, Chandra Thapa, Sarah Ali Siddiqui, Wei Shao, and Seyit Camtepe

TL;DR
This paper introduces CollabDM, a collaborative data distillation method that efficiently captures global data distribution in distributed environments with minimal communication, improving synthetic data quality for machine learning tasks.
Contribution
The paper presents the first collaborative data distillation technique that outperforms existing methods by effectively handling data heterogeneity with only one communication round.
Findings
Outperforms state-of-the-art one-shot learning methods on skewed data
Requires only a single communication round
Shows practical benefits in 5G attack detection
Abstract
Large machine-learning training datasets can be distilled into small collections of informative synthetic data samples. These synthetic sets support efficient model learning and reduce the communication cost of data sharing. Thus, high-fidelity distilled data can support the efficient deployment of machine learning applications in distributed network environments. A naive way to construct a synthetic set in a distributed environment is to allow each client to perform local data distillation and to merge local distillations at a central server. However, the quality of the resulting set is impaired by heterogeneity in the distributions of the local data held by clients. To overcome this challenge, we introduce the first collaborative data distillation technique, called CollabDM, which captures the global distribution of the data and requires only a single round of communication between…
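The contrast the abstract draws between merging local distillations and capturing the global distribution in one round can be illustrated with a toy sketch. This is not the CollabDM algorithm: a per-class mean stands in for a distilled sample, the two-client data is synthetic, and every function name (`local_class_means`, `local_statistics`, `aggregate_once`) is hypothetical.

```python
import numpy as np

def local_class_means(X, y, num_classes):
    """Toy stand-in for local distillation: one mean point per class held locally."""
    return {c: X[y == c].mean(axis=0) for c in range(num_classes) if np.any(y == c)}

def local_statistics(X, y, num_classes):
    """What a client could send in a single round: per-class feature sums and counts."""
    sums = np.zeros((num_classes, X.shape[1]))
    counts = np.zeros(num_classes)
    for c in range(num_classes):
        mask = y == c
        sums[c] = X[mask].sum(axis=0)
        counts[c] = mask.sum()
    return sums, counts

def aggregate_once(stats):
    """Server side: combine client statistics into global per-class means."""
    total_sums = np.sum([s for s, _ in stats], axis=0)
    total_counts = np.sum([n for _, n in stats], axis=0)
    return total_sums / np.maximum(total_counts, 1.0)[:, None]

rng = np.random.default_rng(0)
# Two clients hold the same class, but in different regions and in unequal amounts.
X_a, y_a = rng.normal(0.0, 1.0, size=(90, 2)), np.zeros(90, dtype=int)
X_b, y_b = rng.normal(10.0, 1.0, size=(10, 2)), np.zeros(10, dtype=int)

# Naive merge: weight each client's local summary equally.
naive = np.mean(
    [local_class_means(X_a, y_a, 1)[0], local_class_means(X_b, y_b, 1)[0]], axis=0
)
# One-round aggregation: recovers the mean of the pooled data.
global_mean = aggregate_once(
    [local_statistics(X_a, y_a, 1), local_statistics(X_b, y_b, 1)]
)[0]
pooled_mean = np.vstack([X_a, X_b]).mean(axis=0)

print("naive merge :", naive)
print("one round   :", global_mean)
print("pooled data :", pooled_mean)
```

In this toy setting the equally weighted merge is pulled toward the small client, while the aggregated statistics match the pooled data. The sketch only shows why heterogeneity hurts a naive merge; it says nothing about how CollabDM itself builds its synthetic set.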
Taxonomy
Topics: Big Data and Business Intelligence · Scientific Computing and Data Management
Methods: Sparse Evolutionary Training
