Federated Learning Empowered by Generative Content
Rui Ye, Xinyu Zhu, Jingyi Chai, Siheng Chen, Yanfeng Wang

TL;DR
This paper introduces FedGC, a simple federated learning framework that uses generative content to diversify private data, improving model performance and privacy in heterogeneous data scenarios.
Contribution
FedGC is a novel, easy-to-implement framework that enhances federated learning by integrating generative data, addressing data heterogeneity issues effectively.
Findings
FedGC significantly improves FL performance across various datasets and scenarios.
Generative data helps mitigate data bias and heterogeneity in FL.
FedGC maintains privacy while boosting model accuracy.
Abstract
Federated learning (FL) enables leveraging distributed private data for model training in a privacy-preserving way. However, data heterogeneity significantly limits the performance of current FL methods. In this paper, we propose a novel FL framework termed FedGC, designed to mitigate data heterogeneity issues by diversifying private data with generative content. FedGC is a simple-to-implement framework as it only introduces a one-shot step of data generation. In data generation, we summarize three crucial and worth-exploring aspects (budget allocation, prompt design, and generation guidance) and propose three solution candidates for each aspect. Specifically, to achieve a better trade-off between data diversity and fidelity for generation guidance, we propose to generate data based on the guidance of prompts and real data simultaneously. The generated data is then merged with private…
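The pipeline described in the abstract (a one-shot generation step whose output is merged with each client's private data) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not name the three candidate strategies for budget allocation, so an equal split over clients and classes is used as one plausible choice, and `generate_samples` is a hypothetical stand-in for a real generative model rather than any actual API.

```python
from collections import Counter


def allocate_budget(total_budget, num_clients, num_classes):
    """Split a total generation budget equally over clients, then over classes.

    Equal allocation is an assumed strategy, not one confirmed by the abstract.
    Remainders are handed out one sample at a time to the first clients/classes.
    """
    per_client, client_rem = divmod(total_budget, num_clients)
    budgets = []
    for c in range(num_clients):
        client_budget = per_client + (1 if c < client_rem else 0)
        per_class, class_rem = divmod(client_budget, num_classes)
        budgets.append(
            [per_class + (1 if k < class_rem else 0) for k in range(num_classes)]
        )
    return budgets


def generate_samples(label, count):
    """Hypothetical placeholder for a generative model (e.g. a diffusion model).

    Returns `count` dummy (sample, label) pairs instead of real generated data.
    """
    return [(f"synthetic_{label}_{i}", label) for i in range(count)]


def build_local_dataset(private_data, class_budget):
    """One-shot step: generate per-class samples and merge them with private data."""
    generated = []
    for label, count in enumerate(class_budget):
        generated.extend(generate_samples(label, count))
    return private_data + generated


# A client whose private data covers only class 0 (a heterogeneous split).
budgets = allocate_budget(total_budget=10, num_clients=2, num_classes=3)
private = [("x0", 0), ("x1", 0)]
local = build_local_dataset(private, budgets[0])
label_counts = Counter(label for _, label in local)
print(budgets)
print(dict(label_counts))
```

In this toy run the client starts with samples from a single class and ends up with samples from all three, which is the sense in which generated data diversifies a skewed local dataset. Local training and server aggregation would then proceed as in ordinary FL on the merged dataset.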
Peer Reviews
Decision·Submitted to ICLR 2024
1. It is interesting to utilize the power of foundational models to assist federated learning.
2. The experiments and ablation studies are detailed.
1. The authors do not consider the generation cost in the paper. The Stable Diffusion model needs at least 4.2 GB of space to deploy locally, and the memory consumption of generation is huge for IoT or cross-device FL setups. For black-box foundational models such as ChatGPT, the prompts directly leak private information to the ChatGPT server. As a result, neither method fits FL setups.
2. The mixed-up training of synthetic and real private data directly increases the computational
- Thorough experiments that investigate the algorithmic choices and how federated learning is affected under new generative data.
- Mitigating the issues due to data heterogeneity in FL is an important question, and exploring how large generative models could alleviate such issues is a timely and vital direction.
- On the one hand, the authors show that increasing the amount of generative data always increases the learning performance (Table 2 and Figure 2). On the other hand, the authors also show that FL on only generative contents (no private data) performs poorly in Table 5. This seems counterintuitive. Could the authors explain why?
- If we allow each user to train a local model on its private data combined with generative contents, would the performance be comparable to FL training on private dat
The flow of the paper is clear, straightforward and easy to read. The motivation of the problem is exciting.
There are a few issues in the paper.
1. Other than considerations regarding generating data locally for each client (i.e., the four aspects of generating the data above), there is no significant theoretical contribution. There is no theorem and no proposition. Not a single equation is found in the paper.
2. The paper solely focuses on data generation for local training. However, there is nothing in the communication among the clients that carries any information about data generation from one client
Taxonomy
Topics: Privacy-Preserving Technologies in Data
