FLAIM: AIM-based Synthetic Data Generation in the Federated Setting
Samuel Maddock, Graham Cormode, Carsten Maple

TL;DR
This paper introduces FLAIM, a federated synthetic data generation method based on AIM, addressing privacy, heterogeneity, and efficiency challenges in distributed data environments.
Contribution
It extends AIM to federated settings, proposes FLAIM with heterogeneity management, and demonstrates improved utility and reduced overhead in simulations.
Findings
FLAIM outperforms naive AIM federation in utility.
Federated AIM can suffer utility degradation due to heterogeneity.
FLAIM reduces computational overhead compared to secure multi-party computation methods.
Abstract
Preserving individual privacy while enabling collaborative data sharing is crucial for organizations. Synthetic data generation is one solution, producing artificial data that mirrors the statistical properties of private data. While numerous techniques have been devised under differential privacy, they predominantly assume data is centralized. However, data is often distributed across multiple clients in a federated manner. In this work, we initiate the study of federated synthetic tabular data generation. Building upon a state-of-the-art central method known as AIM, we present DistAIM and FLAIM. We first show that it is straightforward to distribute AIM, extending a recent approach based on secure multi-party computation, which necessitates additional overhead and makes it less suited to federated scenarios. We then demonstrate that naively federating AIM can lead to substantial degradation in…
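AIM-style methods work by repeatedly measuring noisy marginals (contingency tables over a few attributes) and fitting a model to them. As a rough illustration of what "naively federating" that measurement step could look like, the sketch below has each client compute its local marginal, add its own Gaussian noise, and send it to a server that sums the tables. The function names, the per-client noise placement, and the toy data are all hypothetical assumptions for illustration; this is not the authors' DistAIM or FLAIM implementation, and `sigma` is not calibrated to any privacy budget.

```python
import numpy as np


def local_marginal(data: np.ndarray, cols: tuple, domain: dict) -> np.ndarray:
    """Contingency table of `data` (rows = records, int-coded columns) over `cols`."""
    counts = np.zeros(tuple(domain[c] for c in cols))
    # np.add.at accumulates over repeated index tuples, so every record is counted once.
    np.add.at(counts, tuple(data[:, c] for c in cols), 1.0)
    return counts


def naive_federated_marginal(clients, cols, domain, sigma, rng):
    """Hypothetical naive scheme: each client perturbs its own marginal with
    Gaussian noise N(0, sigma^2) per cell, and the server sums the noisy tables."""
    total = np.zeros(tuple(domain[c] for c in cols))
    for data in clients:
        local = local_marginal(data, cols, domain)
        total += local + rng.normal(0.0, sigma, size=local.shape)
    return total


# Two toy clients with skewed (heterogeneous) data over binary columns 0 and 1.
domain = {0: 2, 1: 2}
client_a = np.array([[0, 0], [0, 0], [0, 1]])
client_b = np.array([[1, 1], [1, 1], [1, 0]])
rng = np.random.default_rng(0)

noisy = naive_federated_marginal([client_a, client_b], (0, 1), domain, sigma=1.0, rng=rng)
exact = local_marginal(np.vstack([client_a, client_b]), (0, 1), domain)
```

With `sigma=0` the summed table equals the global contingency table (here `[[2, 1], [1, 2]]`), because counts are additive across clients. The toy clients also show heterogeneity: `client_a`'s marginal over column 0 is `[3, 0]` and `client_b`'s is `[0, 3]`, while the global one is `[3, 3]`, so statistics computed from any single client can misrepresent the global data. Since each of K clients adds independent noise, each cell of the summed table carries noise variance Kσ² rather than σ².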
Peer Reviews
Decision: Submitted to ICLR 2024
The paper is clear and well written. The results all seem reasonable and correct. The privacy guarantees are rigorous.
The biggest question mark here is whether DP-SDG is going to be the practical answer in any situation, though this seems worth exploring anyway.
1) This paper proposes a new method for generating synthetic data in a federated learning setting while addressing the challenges of heterogeneity in federated settings.
2) The authors conducted a comprehensive assessment of FLAIM on standard datasets and compared its performance with other cutting-edge techniques. The results showed that FLAIM offers better efficiency with reduced overhead.
1) It remains unclear whether FLAIM would retain its efficiency on real-world datasets with more intricate structures and distributions, as its performance has been evaluated solely on benchmark datasets.
2) Although the paper compares FLAIM to other advanced methods, it does not give a complete comparison to all related methods in the literature.
1. The authors identify the challenges of differentially private data synthesis with heterogeneous local data in the federated learning setting.
2. The authors propose two different algorithms for differentially private data synthesis with heterogeneous data.
3. FLAIM's approach to handling heterogeneity in marginal selection is novel.
* Some key elements of the algorithm are not clearly motivated or explained, leaving its effectiveness unjustified.
* Although it is acceptable for a DP data synthesis paper not to provide a theoretical guarantee, some counter-intuitive phenomena in the experiments are not clearly explained.
* The writing needs to be improved.
  - At the level of paper structure, while the core idea of the paper should be relatively straightforward, the paper's organization may introdu…
Taxonomy
Topics: Privacy-Preserving Technologies in Data
