FLAIM: AIM-based Synthetic Data Generation in the Federated Setting

Samuel Maddock; Graham Cormode; Carsten Maple

arXiv:2310.03447 · cs.CR · September 6, 2024


TL;DR

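This paper introduces FLAIM, a differentially private method for generating synthetic tabular data in the federated setting. FLAIM builds on the central AIM algorithm and addresses the privacy, heterogeneity, and efficiency challenges that arise when data is distributed across clients.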

Contribution

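The paper extends AIM to the distributed setting (DistAIM) and proposes FLAIM, which accounts for heterogeneity across clients' local data. In simulations, FLAIM improves utility over naive federation of AIM and reduces overhead relative to approaches based on secure multi-party computation.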
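AIM works with marginals, which are count tables over small subsets of attributes. Marginal counts are additive, so the global marginal is the cell-wise sum of the clients' local marginals. Heterogeneity means that a single client's local marginal can look very different from that sum. The Python sketch below (standard library only) illustrates both points. It assumes records are dicts. The attribute names and toy data are hypothetical. The normalised L1 distance is an illustrative indicator of heterogeneity, not necessarily the measure FLAIM uses.

```python
from collections import Counter
from itertools import product


def marginal(records, attrs, domains):
    """Count table of `attrs` over a list of dict records, in a fixed cell order."""
    counts = Counter(tuple(r[a] for a in attrs) for r in records)
    # Enumerate every cell of the attribute domain so all clients emit
    # vectors of the same length and cell order.
    return [counts.get(cell, 0) for cell in product(*(domains[a] for a in attrs))]


def aggregate(local_marginals):
    """The global marginal is the cell-wise sum of the clients' local marginals."""
    return [sum(cells) for cells in zip(*local_marginals)]


def l1_distance(p, q):
    """L1 distance between two count vectors after normalising each to sum to 1.

    Illustrative heterogeneity indicator only (an assumption, not FLAIM's measure).
    """
    sp, sq = sum(p), sum(q)
    if sp == 0 or sq == 0:
        raise ValueError("cannot normalise an all-zero marginal")
    return sum(abs(a / sp - b / sq) for a, b in zip(p, q))


# Hypothetical toy data: two clients with very different local distributions.
domains = {"age": ["young", "old"], "smoker": [0, 1]}
attrs = ("age", "smoker")
client_a = [{"age": "young", "smoker": 0}, {"age": "young", "smoker": 1}]
client_b = [{"age": "old", "smoker": 1}, {"age": "old", "smoker": 1}]

local_a = marginal(client_a, attrs, domains)   # [1, 1, 0, 0]
local_b = marginal(client_b, attrs, domains)   # [0, 0, 0, 2]
global_m = aggregate([local_a, local_b])       # [1, 1, 0, 2]
print(l1_distance(local_a, global_m), l1_distance(local_b, global_m))  # 1.0 1.0
```

Neither client's local view resembles the global distribution here. A client that judged which marginals matter using only its own data would therefore be judging against the wrong target.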

Findings

01

FLAIM outperforms naive AIM federation in utility.

02

Naively federating AIM can substantially degrade utility when client data is heterogeneous.

03

FLAIM reduces computational overhead compared to secure multi-party computation methods.
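Finding 03 concerns the overhead of secure multi-party computation. The sketch below shows a textbook form of secure aggregation based on additive secret sharing, which gives a sense of where that overhead comes from. It is not the MPC protocol used by DistAIM or by the earlier approach it extends. `MODULUS`, `share`, and `secure_sum` are hypothetical names. The sketch assumes honest-but-curious servers that do not all collude, and it omits the differential-privacy noise a real deployment would add.

```python
import secrets

# Hypothetical field size; it must exceed any true aggregate count.
MODULUS = 2**61 - 1


def share(vector, n_servers):
    """Split an integer vector into n_servers additive shares modulo MODULUS."""
    if n_servers < 2:
        raise ValueError("need at least two servers")
    # The first n_servers - 1 shares are uniformly random ...
    shares = [[secrets.randbelow(MODULUS) for _ in vector] for _ in range(n_servers - 1)]
    # ... and the last is chosen so that all shares sum to the vector mod MODULUS.
    last = [(v - sum(col)) % MODULUS for v, col in zip(vector, zip(*shares))]
    return shares + [last]


def secure_sum(client_vectors, n_servers=2):
    """Sum client vectors without any single server seeing an individual vector."""
    # per_client[c][j] is client c's share destined for server j.
    per_client = [share(v, n_servers) for v in client_vectors]
    server_totals = []
    for j in range(n_servers):
        # Any share held by one server is uniformly random on its own.
        received = [shares[j] for shares in per_client]
        server_totals.append([sum(col) % MODULUS for col in zip(*received)])
    # Combining every server's total reveals only the aggregate.
    return [sum(col) % MODULUS for col in zip(*server_totals)]


# Toy local marginals; each client uploads n_servers vectors instead of one.
print(secure_sum([[1, 1, 0, 0], [0, 0, 0, 2]], n_servers=3))  # [1, 1, 0, 2]
```

Even in this minimal form, every client uploads one share per server, and the servers must exchange their totals. With more protocol rounds, or with noise generated inside the protocol, the cost grows further.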

Abstract

Preserving individual privacy while enabling collaborative data sharing is crucial for organizations. Synthetic data generation is one solution, producing artificial data that mirrors the statistical properties of private data. While numerous techniques have been devised under differential privacy, they predominantly assume data is centralized. However, data is often distributed across multiple clients in a federated manner. In this work, we initiate the study of federated synthetic tabular data generation. Building upon a SOTA central method known as AIM, we present DistAIM and FLAIM. We first show that it is straightforward to distribute AIM, extending a recent approach based on secure multi-party computation which necessitates additional overhead, making it less suited to federated scenarios. We then demonstrate that naively federating AIM can lead to substantial degradation in…
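The abstract builds on AIM, which repeatedly selects a marginal to measure and then measures it with noise. The sketch below is a simplified single round of that loop, based on our reading of the central AIM algorithm. Selection uses the exponential mechanism on a score that subtracts the expected noise of a marginal from the model's current error on it. The chosen marginal is then measured with the Gaussian mechanism. The sketch omits AIM's per-marginal weights, privacy-budget annealing, and graphical-model fitting step. All function names and the toy data are assumptions.

```python
import math
import random


def quality(true_marg, model_marg, sigma):
    """AIM-style score: the model's L1 error on a marginal minus the expected
    L1 norm of Gaussian noise added to all of its cells."""
    err = sum(abs(a - b) for a, b in zip(true_marg, model_marg))
    # E|N(0, sigma^2)| = sigma * sqrt(2 / pi), summed over the marginal's cells.
    return err - math.sqrt(2 / math.pi) * sigma * len(true_marg)


def exponential_mechanism(scores, epsilon, sensitivity, rng):
    """Pick index i with probability proportional to exp(eps * score_i / (2 * sensitivity))."""
    best = max(scores)  # shift by the max score for numerical stability
    weights = [math.exp(epsilon * (s - best) / (2 * sensitivity)) for s in scores]
    return rng.choices(range(len(scores)), weights=weights, k=1)[0]


def aim_style_round(candidates, model, epsilon, sigma, rng):
    """One select-then-measure round over candidate marginals.

    candidates: name -> true marginal counts.
    model: name -> the current synthetic model's estimate of that marginal.
    """
    names = list(candidates)
    scores = [quality(candidates[n], model[n], sigma) for n in names]
    # Adding or removing one record changes one cell count by 1, so the
    # L1-error score has sensitivity 1 (per-marginal weights omitted).
    chosen = names[exponential_mechanism(scores, epsilon, 1.0, rng)]
    # Gaussian mechanism: measure the chosen marginal with noise of scale sigma.
    noisy = [c + rng.gauss(0.0, sigma) for c in candidates[chosen]]
    return chosen, noisy


# Hypothetical toy data: the model is far off on one marginal and exact on the other.
rng = random.Random(0)
candidates = {"age,smoker": [100, 0], "age,region": [50, 50]}
model = {"age,smoker": [50, 50], "age,region": [50, 50]}
chosen, noisy = aim_style_round(candidates, model, epsilon=10.0, sigma=1.0, rng=rng)
print(chosen)  # age,smoker
```

Subtracting the expected-noise term means a marginal with many cells is chosen only if the model's error on it exceeds what measurement noise alone would introduce. This is one way to favour measurements that are likely to improve the model.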

Peer Reviews

Decision · Submitted to ICLR 2024

Reviewer 01 · Rating 8 (accept, good paper) · Confidence 3

Strengths

The paper is clear and well written. The results all seem reasonable and correct. The privacy guarantees are rigorous.

Weaknesses

The biggest question mark here is whether DP-SDG is going to be the practical answer in any situation, though this seems worth exploring anyway.

Reviewer 02 · Rating 3 (reject, not good enough) · Confidence 5

Strengths

1) This paper suggests a new method for generating synthetic data in a Federated Learning setting while addressing the challenges of heterogeneity in federated settings.
2) After conducting a comprehensive assessment of the FLAIM technique on standard datasets, the authors compared its performance with other cutting-edge techniques. The results showed that the FLAIM method offers better efficiency with reduced overhead.

Weaknesses

1) It remains a challenge to determine whether the FLAIM method would retain its efficiency when applied to real-world datasets that display more intricate structures and distributions, as its performance has been evaluated solely on benchmark datasets.
2) Although the paper compares the FLAIM method to other advanced methods, it does not give a complete comparison to all the related methods in the literature.

Reviewer 03 · Rating 3 (reject, not good enough) · Confidence 3

Strengths

1. The authors identify the challenges in differentially private data synthesis with heterogeneous local data in the federated learning setting.
2. The authors propose two different algorithms for solving the challenge of differentially private data synthesis with heterogeneous data.
3. The proposed FLAIM solution on how to handle the heterogeneity in marginal selection is novel.

Weaknesses

* Some key elements of the algorithm are not clearly motivated or explained, leaving the effectiveness of the algorithm unjustified.
* Although it is acceptable that the DP data synthesis paper cannot provide a theoretical guarantee, some counter-intuitive phenomena in the experiments are not clearly explained.
* The writing needs to be improved.
  - Speaking at the paper structure level, while the core idea of the paper should be relatively straightforward, the paper's organization may introdu…

Code & Models

Repositories

Samuel-Maddock/flaim
jax · Official


Taxonomy

Topics: Privacy-Preserving Technologies in Data