Privacy-Preserving Statistical Data Generation: Application to Sepsis Detection

Eric Macias-Fassio; Aythami Morales; Cristina Pruenza; Julian Fierrez

arXiv:2404.16638 · cs.LG · April 26, 2024

Open Access

TL;DR

This paper introduces a statistical method for generating synthetic biomedical data, applied to sepsis detection, that balances privacy and utility. It compares the method with existing synthetic data generation techniques, with the aim of easing data sharing under data protection regulation.

Contribution

The paper presents a novel application of KDE-KNN (Kernel Density Estimator and K-Nearest Neighbors sampling) to synthetic data generation for sensitive biomedical classification tasks, demonstrating its advantages over current methods.

Findings

01. KDE-KNN effectively balances data utility and privacy.
02. Synthetic data improves model training in sepsis detection.
03. KDE-KNN outperforms existing synthetic data methods.

Abstract

The biomedical field is among the sectors most impacted by the increasing regulation of Artificial Intelligence (AI) and data protection legislation, given the sensitivity of patient information. However, the rise of synthetic data generation methods offers a promising opportunity for data-driven technologies. In this study, we propose a statistical approach for synthetic data generation applicable in classification problems. We assess the utility and privacy implications of synthetic data generated by Kernel Density Estimator and K-Nearest Neighbors sampling (KDE-KNN) within a real-world context, specifically focusing on its application in sepsis detection. The detection of sepsis is a critical challenge in clinical practice due to its rapid progression and potentially life-threatening consequences. Moreover, we emphasize the benefits of KDE-KNN compared to current synthetic data…
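
The abstract names the two ingredients of KDE-KNN but this page does not give the algorithm itself. As a rough illustration only, the sketch below shows one plausible way to combine them for a classification dataset: draw synthetic feature vectors from a Gaussian kernel density estimate of the real data, then label each one by a majority vote of its nearest real neighbors. This reading, the function names (`sample_kde`, `knn_label`, `generate_synthetic`), the parameters `bandwidth` and `k`, and the toy data are all assumptions made for illustration; they are not taken from the paper and may differ from the authors' procedure.

```python
import math
import random
from collections import Counter


def sample_kde(points, n_samples, bandwidth, rng):
    """Draw samples from a Gaussian KDE fitted on `points`.

    Sampling from a Gaussian KDE amounts to picking a real point uniformly
    at random and adding isotropic Gaussian noise with std `bandwidth`.
    """
    samples = []
    for _ in range(n_samples):
        center = rng.choice(points)
        samples.append([x + rng.gauss(0.0, bandwidth) for x in center])
    return samples


def knn_label(query, points, labels, k):
    """Return the majority label among the k real points nearest to `query`."""
    order = sorted(range(len(points)), key=lambda i: math.dist(query, points[i]))
    votes = Counter(labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def generate_synthetic(points, labels, n_samples, bandwidth=0.3, k=3, seed=0):
    """Hypothetical KDE + KNN pipeline (an assumption, not the paper's code)."""
    rng = random.Random(seed)
    synthetic_x = sample_kde(points, n_samples, bandwidth, rng)
    synthetic_y = [knn_label(x, points, labels, k) for x in synthetic_x]
    return synthetic_x, synthetic_y


# Toy two-class data standing in for real patient features (made up).
real_x = [[0.0, 0.1], [0.2, -0.1], [-0.1, 0.0],
          [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]
real_y = [0, 0, 0, 1, 1, 1]

synth_x, synth_y = generate_synthetic(real_x, real_y, n_samples=20)
```

In a sketch like this, `bandwidth` is the natural privacy and utility dial: a small value keeps synthetic points close to real records, while a larger value blurs individual records at the cost of fidelity. How the paper actually sets or evaluates this trade-off is not stated on this page.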

Taxonomy

Topics: Artificial Intelligence in Healthcare · Imbalanced Data Classification Techniques · Machine Learning and Data Classification