One-Shot Collaborative Data Distillation

William Holland, Chandra Thapa, Sarah Ali Siddiqui, Wei Shao, and Seyit Camtepe

arXiv:2408.02266 · cs.LG · August 13, 2024



TL;DR

This paper introduces CollabDM, a collaborative data distillation method that efficiently captures global data distribution in distributed environments with minimal communication, improving synthetic data quality for machine learning tasks.

Contribution

The paper presents CollabDM, which the authors describe as the first collaborative data distillation technique. It handles heterogeneity in the clients' local data with only one communication round, and it outperforms state-of-the-art one-shot learning methods on skewed data.

Findings

01. Outperforms state-of-the-art one-shot learning on skewed data
02. Requires only a single communication round
03. Shows practical benefits in 5G attack detection

Abstract

Large machine-learning training datasets can be distilled into small collections of informative synthetic data samples. These synthetic sets support efficient model learning and reduce the communication cost of data sharing. Thus, high-fidelity distilled data can support the efficient deployment of machine learning applications in distributed network environments. A naive way to construct a synthetic set in a distributed environment is to allow each client to perform local data distillation and to merge local distillations at a central server. However, the quality of the resulting set is impaired by heterogeneity in the distributions of the local data held by clients. To overcome this challenge, we introduce the first collaborative data distillation technique, called CollabDM, which captures the global distribution of the data and requires only a single round of communication between…
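The naive baseline described in the abstract (local distillation on each client, then a merge at the server) can be illustrated with a minimal sketch. It is not CollabDM and not the authors' code: the "distillation" step is replaced by a crude stand-in (one class-mean point per class), the data are one-dimensional toy values, and every name (`local_distill`, `naive_merge`, `client_a`, `client_b`) is hypothetical. It only shows how a merged set can drift away from the global distribution when clients hold skewed data.

```python
# Toy illustration of the naive "distill locally, merge at the server" baseline.
# ASSUMPTION: a class mean stands in for a real distillation method.

def local_distill(samples):
    """Return one synthetic point per class: the mean of that class's features."""
    sums, counts = {}, {}
    for x, y in samples:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}

def naive_merge(local_sets):
    """Server side: pool every client's synthetic points, class by class."""
    merged = {}
    for synth in local_sets:
        for y, x in synth.items():
            merged.setdefault(y, []).append(x)
    return merged

def class_means(points_by_class):
    """Mean feature value per class for a dict of class -> list of values."""
    return {y: sum(xs) / len(xs) for y, xs in points_by_class.items()}

# Two clients with skewed (heterogeneous) data: (feature, label) pairs.
client_a = [(0.0, 0)] * 90 + [(10.0, 1)] * 10   # mostly class 0
client_b = [(4.0, 0)] * 10 + [(6.0, 1)] * 90    # mostly class 1

# Each client distills locally and sends its synthetic points once.
merged = naive_merge([local_distill(client_a), local_distill(client_b)])
merged_mean = class_means(merged)

# Reference: class means computed over the pooled raw data.
global_mean = local_distill(client_a + client_b)

for y in sorted(global_mean):
    print(f"class {y}: merged={merged_mean[y]:.2f} global={global_mean[y]:.2f}")
```

In this toy setup the merged set weights each client's synthetic point equally, so its class means sit far from those of the pooled data even though each local summary is exact. This is one simple way heterogeneity can impair a naively merged set; the paper's actual distillation objective and collaboration protocol are not reproduced here.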


Code & Models

Repositories

rayneholland/collabdm (PyTorch, official)


Taxonomy

Topics: Big Data and Business Intelligence · Scientific Computing and Data Management

Methods: Sparse Evolutionary Training