Synthetic Tabular Data Generation Under Horizontal Federated Learning Environments in Acute Myeloid Leukemia: Case-Based Simulation Study
Imanol Isasa, Mikel Catalina, Gorka Epelde, Naiara Aginako, Andoni Beristain

TL;DR
This study examines how combining synthetic data generation with federated learning affects data quality and privacy in acute myeloid leukemia research.
Contribution
The novel contribution is evaluating the impact of horizontally federating synthetic data generation models in rare disease contexts.
Findings
Federating synthetic data generation (SDG) models caused a fidelity loss of up to 62% in FedTabDiff and 21% in GANs.
Privacy metrics remained stable despite federation, with up to 55% improvement in some cases.
Fidelity degradation was not significantly worse with more nodes or imbalanced data.
Abstract
Data scarcity and dispersion pose significant obstacles in biomedical research, particularly for rare diseases. Synthetic data generation (SDG) has emerged as a promising way to mitigate data scarcity. Federated learning, in turn, is a machine learning paradigm in which multiple nodes collaborate to build a centralized model that distills knowledge from the data held at each node, without sharing the data itself. This research explores the combination of SDG and federated learning in the context of acute myeloid leukemia, a rare hematological disorder, evaluating their combined impact and the quality of the generated synthetic datasets. This study aims to evaluate the privacy- and fidelity-related impact of horizontally federating SDG models in different data distribution scenarios and with different numbers of nodes,…
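To make the horizontal federation idea in the abstract concrete, here is a minimal sketch in Python. It is an illustration only, not the method used in the study. It assumes federated averaging (FedAvg) as the aggregation scheme and substitutes a tiny linear model for an SDG model. Every function name, the toy data, and the hyperparameters are hypothetical. Each node holds different rows with the same columns, trains locally, and sends only model weights to the aggregator.

```python
from typing import List, Sequence, Tuple

Row = Tuple[float, float]  # (x, y): same schema on every node (horizontal split)


def local_update(weights: Sequence[float], rows: Sequence[Row],
                 lr: float = 0.5, epochs: int = 20) -> List[float]:
    """Train y = w0 + w1 * x by gradient descent on one node's private rows."""
    w0, w1 = weights
    n = len(rows)
    for _ in range(epochs):
        g0 = sum((w0 + w1 * x - y) for x, y in rows) / n
        g1 = sum((w0 + w1 * x - y) * x for x, y in rows) / n
        w0 -= lr * g0
        w1 -= lr * g1
    return [w0, w1]


def federated_average(node_weights: Sequence[Sequence[float]],
                      node_sizes: Sequence[int]) -> List[float]:
    """Average node weights, weighting each node by its number of rows."""
    total = sum(node_sizes)
    dim = len(node_weights[0])
    return [
        sum(w[i] * n for w, n in zip(node_weights, node_sizes)) / total
        for i in range(dim)
    ]


def run_fedavg(nodes: Sequence[Sequence[Row]], rounds: int = 50) -> List[float]:
    """Run several communication rounds; raw rows never leave their node."""
    global_w = [0.0, 0.0]
    for _ in range(rounds):
        updates = [local_update(global_w, rows) for rows in nodes]
        global_w = federated_average(updates, [len(rows) for rows in nodes])
    return global_w


def make_rows(xs: Sequence[float]) -> List[Row]:
    """Toy data drawn from y = 2x + 1 (an assumption for this sketch)."""
    return [(x, 2.0 * x + 1.0) for x in xs]


# Three nodes with imbalanced sample counts (10, 2, and 3 rows).
nodes = [
    make_rows([i / 10 for i in range(10)]),
    make_rows([0.2, 0.8]),
    make_rows([0.0, 0.5, 1.0]),
]

if __name__ == "__main__":
    print(run_fedavg(nodes))
```

Weighting by row count is one common design choice. An unweighted mean would give small nodes the same influence as large ones, which matters in the imbalanced scenarios the study considers.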
Taxonomy
Topics: Acute Myeloid Leukemia Research · Chronic Lymphocytic Leukemia Research · Cancer Genomics and Diagnostics
