A Privacy-Preserving Unsupervised Domain Adaptation Framework for Clinical Text Analysis
Qiyuan An, Ruijiang Li, Lin Gu, Hao Zhang, Qingyu Chen, Zhiyong Lu, Fei Wang, and Yingying Zhu

TL;DR
This paper introduces a privacy-preserving unsupervised domain adaptation framework for clinical text analysis that uses differential privacy and Gaussian Mixture Models to protect source data during adaptation, maintaining utility with minimal performance loss.
Contribution
It proposes a novel differential privacy training strategy combined with GMMs for privacy-preserving domain adaptation in clinical text analysis, addressing privacy risks while preserving task utility.
Findings
Effective privacy preservation with minor performance impact
Successful application on clinical text datasets
Maintains data utility under differential privacy constraints
Abstract
Unsupervised domain adaptation (UDA) generally aligns the unlabeled target domain data to the distribution of the source domain to mitigate the distribution shift problem. Standard UDA requires sharing the source data with the target, which risks leaking private data. To protect the source data's privacy, we first propose to share the source feature distribution instead of the source data. However, sharing only the source feature distribution may still be vulnerable to membership inference attacks, which can infer an individual's membership through black-box access to the source model. To resolve this privacy issue, we further study the under-explored problem of privacy-preserving domain adaptation and propose a method with a novel differential privacy training strategy to protect the source data privacy. We model the source feature distribution by Gaussian Mixture Models…
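The two ideas named in the abstract, sharing a feature distribution rather than raw data and training under differential privacy, can be illustrated with a minimal sketch. The abstract is truncated here, so this is not the authors' procedure. It assumes NumPy, substitutes one diagonal Gaussian per class for a full EM-fitted GMM, and uses the generic clip-and-add-Gaussian-noise gradient step known from DP-SGD. All function and parameter names (fit_class_gmm, sample_gmm, dp_average_gradient, clip_norm, noise_multiplier) are hypothetical.

```python
import numpy as np


def fit_class_gmm(features, labels):
    """Summarize source features as one diagonal Gaussian per class.

    Assumption: a simplified stand-in for a GMM fitted with EM.
    Returns mixture weights, per-class means, and per-class variances.
    """
    classes = np.unique(labels)
    weights = np.array([(labels == c).mean() for c in classes])
    weights = weights / weights.sum()  # guard against rounding drift
    means = np.stack([features[labels == c].mean(axis=0) for c in classes])
    variances = np.stack(
        [features[labels == c].var(axis=0) + 1e-6 for c in classes]
    )
    return weights, means, variances


def sample_gmm(weights, means, variances, n, rng):
    """Draw n surrogate feature vectors from the shared mixture.

    The target side only needs the mixture parameters, not the source data.
    """
    comps = rng.choice(len(weights), size=n, p=weights)
    noise = rng.standard_normal((n, means.shape[1]))
    return means[comps] + noise * np.sqrt(variances[comps]), comps


def dp_average_gradient(per_example_grads, clip_norm, noise_multiplier, rng):
    """Generic DP-SGD-style step (not the paper's specific strategy).

    Clips each example's gradient to L2 norm clip_norm, sums the clipped
    gradients, adds Gaussian noise with standard deviation
    noise_multiplier * clip_norm, and averages over the batch.
    """
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = per_example_grads * scale
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=clipped.shape[1])
    return (clipped.sum(axis=0) + noise) / len(per_example_grads)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy "source features": 60 vectors of dimension 4 in 3 classes.
    feats = rng.standard_normal((60, 4))
    labels = np.repeat([0, 1, 2], 20)
    w, m, v = fit_class_gmm(feats, labels)
    surrogate, comps = sample_gmm(w, m, v, 10, rng)
    grad = dp_average_gradient(rng.standard_normal((8, 4)), 1.0, 1.1, rng)
    print(surrogate.shape, grad.shape)
```

In this sketch the mixture parameters are the only thing that would leave the source site. The privacy guarantee itself depends on how the noise scale is accounted for over training, which the sketch does not cover.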
Taxonomy
Topics: Text and Document Classification Technologies · Computational and Text Analysis Methods · Domain Adaptation and Few-Shot Learning
