Synthetic Data Aided Federated Learning Using Foundation Models

Fatima Abacha; Sin G. Teo; Lucas C. Cordeiro; Mustafa A. Mustafa

arXiv:2407.05174 · cs.LG · July 9, 2024 · 2 cites

TL;DR

This paper introduces DPSDA-FL, a novel federated learning approach that uses differentially private synthetic data generated by foundation models to mitigate data heterogeneity issues in Non-IID scenarios, improving model performance.

Contribution

The paper proposes a new data augmentation method using foundation models to generate differentially private synthetic data for federated learning, addressing Non-IID data challenges.

Findings

01. Improves class recall by up to 26%.

02. Enhances classification accuracy by up to 9%.

03. Effective on the CIFAR-10 dataset.
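
For readers unfamiliar with the two metrics named in the findings, the following is a minimal sketch of how per-class recall and overall accuracy are commonly computed. The function names and the toy labels are illustrative assumptions; they are not taken from the paper and do not reproduce its results.

```python
from collections import Counter

def per_class_recall(y_true, y_pred):
    """Recall per class: correctly predicted samples of a class / all samples of that class."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: hits[c] / totals[c] for c in totals}

def accuracy(y_true, y_pred):
    """Fraction of all samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Toy labels for three classes (made up for illustration only).
y_true = [0, 0, 0, 1, 1, 2, 2, 2]
y_pred = [0, 0, 1, 1, 1, 0, 2, 2]

recalls = per_class_recall(y_true, y_pred)
overall = accuracy(y_true, y_pred)
```

Per-class recall can expose weak classes that a single accuracy figure hides, which is likely why the two numbers are reported separately.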

Abstract

In heterogeneous scenarios where the data distribution among the Federated Learning (FL) participants is non-independent and identically distributed (Non-IID), FL suffers from the well-known problem of data heterogeneity. This significantly degrades FL performance, as the global model tends to struggle to converge. To solve this problem, we propose Differentially Private Synthetic Data Aided Federated Learning Using Foundation Models (DPSDA-FL), a novel data augmentation strategy that aids in homogenizing the local data present on the clients' side. DPSDA-FL improves the training of the local models by leveraging differentially private synthetic data generated from foundation models. We demonstrate the effectiveness of our approach by evaluating it on the benchmark image dataset CIFAR-10. Our experimental results have shown that DPSDA-FL can improve class recall…
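
The idea of homogenizing a client's local data with synthetic samples can be sketched as follows. This is a hypothetical illustration, not the authors' procedure: the abstract does not describe how synthetic samples are selected or shared, so the function `augment_with_synthetic`, the `target_per_class` parameter, and the pool of sample identifiers are all assumptions. The synthetic pool is assumed to have been generated elsewhere under a differential-privacy guarantee; no privacy mechanism is implemented here.

```python
import random
from collections import Counter

def augment_with_synthetic(local_labels, synthetic_pool, target_per_class, rng):
    """Top up under-represented classes on one client from a synthetic pool.

    local_labels: class ids of the client's real samples.
    synthetic_pool: dict mapping class id -> list of synthetic sample ids
        (assumed to be differentially private; generated outside this sketch).
    Returns a list of (class id, synthetic sample id) pairs to add locally.
    """
    counts = Counter(local_labels)
    added = []
    for cls, pool in synthetic_pool.items():
        missing = max(0, target_per_class - counts[cls])
        take = min(missing, len(pool))
        added.extend((cls, s) for s in rng.sample(pool, take))
    return added

rng = random.Random(0)

# A label-skewed (Non-IID) client: many samples of class 0, few of class 1, none of class 2.
client_labels = [0] * 50 + [1] * 5

# Hypothetical synthetic pool: 100 placeholder sample ids per class.
pool = {c: [f"syn_{c}_{i}" for i in range(100)] for c in range(3)}

extra = augment_with_synthetic(client_labels, pool, target_per_class=50, rng=rng)
after = Counter(client_labels) + Counter(c for c, _ in extra)
```

In this toy setup the client ends up with 50 samples per class, so its local label distribution is uniform before local training begins. How closely this matches DPSDA-FL depends on details the abstract does not give.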

Taxonomy

Topics: Privacy-Preserving Technologies in Data