FusionDP: Foundation Model-Assisted Differentially Private Learning for Partially Sensitive Features
Linghui Zeng, Ruixuan Liu, Atiquer Rahman Sarkar, Xiaoqian Jiang, Joyce C. Ho, Li Xiong

TL;DR
FusionDP introduces a two-step method combining foundation model-based imputation and a modified DP-SGD to improve privacy-preserving learning when only some features are sensitive, achieving better utility without compromising privacy.
Contribution
The paper proposes a framework that uses foundation models to impute sensitive features and a tailored DP-SGD algorithm to enforce feature-level differential privacy.
Findings
Significant utility improvement over baseline methods.
Effective privacy preservation for sensitive features.
Validated on clinical and tabular datasets.
Abstract
Ensuring the privacy of sensitive training data is crucial in privacy-preserving machine learning. However, in practical scenarios, privacy protection may be required for only a subset of features. For instance, in ICU data, demographic attributes like age and gender pose higher privacy risks due to their re-identification potential, whereas raw lab results are generally less sensitive. Traditional DP-SGD enforces privacy protection on all features in one sample, leading to excessive noise injection and significant utility degradation. We propose FusionDP, a two-step framework that enhances model utility under feature-level differential privacy. First, FusionDP leverages large foundation models to impute sensitive features given non-sensitive features, treating them as external priors that provide high-quality estimates of sensitive attributes without accessing the true values during…
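For context, the standard DP-SGD update that the abstract contrasts against can be sketched as follows. This is a minimal pure-Python illustration of the textbook algorithm (per-sample gradient clipping followed by Gaussian noise), not FusionDP's modified variant, whose details are not given in the text above. The names `clip`, `dp_sgd_step`, `max_norm`, and `noise_multiplier` are illustrative assumptions.

```python
import math
import random

def clip(grad, max_norm):
    """Scale a per-sample gradient so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]

def dp_sgd_step(params, per_sample_grads, lr, max_norm, noise_multiplier, rng):
    """One standard DP-SGD update (assumed textbook form, not FusionDP's)."""
    # 1. Clip every sample's gradient to bound its influence.
    clipped = [clip(g, max_norm) for g in per_sample_grads]
    batch_size = len(clipped)
    # 2. Add Gaussian noise with std = noise_multiplier * max_norm to the sum.
    sigma = noise_multiplier * max_norm
    noisy_mean = [
        (sum(g[j] for g in clipped) + rng.gauss(0.0, sigma)) / batch_size
        for j in range(len(params))
    ]
    # 3. Take an ordinary gradient step with the noisy average.
    return [p - lr * g for p, g in zip(params, noisy_mean)]

rng = random.Random(0)
grads = [[3.0, 4.0], [0.3, 0.4]]  # toy per-sample gradients
# noise_multiplier=0.0 disables the noise so the clipping effect is visible.
new_params = dp_sgd_step([0.0, 0.0], grads, lr=0.1, max_norm=1.0,
                         noise_multiplier=0.0, rng=rng)
```

Because clipping and noise are applied to the whole per-sample gradient, every feature's contribution is perturbed regardless of whether that feature is sensitive, which is the utility cost the abstract attributes to traditional DP-SGD.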
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Domain Adaptation and Few-Shot Learning · Machine Learning in Healthcare
