A Privacy-Preserving Unsupervised Domain Adaptation Framework for Clinical Text Analysis

Qiyuan An, Ruijiang Li, Lin Gu, Hao Zhang, Qingyu Chen, Zhiyong Lu, Fei Wang, and Yingying Zhu

arXiv:2201.07317 · cs.CL · January 20, 2022 · 1 citation

Open Access

TL;DR

This paper introduces a privacy-preserving unsupervised domain adaptation framework for clinical text analysis that uses differential privacy and Gaussian Mixture Models to protect source data during adaptation, maintaining utility with minimal performance loss.

Contribution

It proposes a novel differential privacy training strategy combined with GMMs for privacy-preserving domain adaptation in clinical text analysis, addressing privacy risks while preserving task utility.

Findings

01. Effective privacy preservation with minor performance impact

02. Successful application on clinical text datasets

03. Maintains data utility under differential privacy constraints

Abstract

Unsupervised domain adaptation (UDA) generally aligns the unlabeled target domain data to the distribution of the source domain to mitigate the distribution shift problem. Standard UDA requires sharing the source data with the target, which risks leaking private data. To protect the source data's privacy, we first propose to share the source feature distribution instead of the source data. However, sharing only the source feature distribution may still be vulnerable to membership inference attacks, in which an adversary infers an individual's membership through black-box access to the source model. To resolve this privacy issue, we further study the under-explored problem of privacy-preserving domain adaptation and propose a method with a novel differential privacy training strategy to protect the source data privacy. We model the source feature distribution by Gaussian Mixture Models…
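The idea of sharing a source feature distribution rather than the source data can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes one diagonal-covariance Gaussian per class as the mixture, uses synthetic vectors in place of real encoder features, and applies no differential privacy. The helper names `fit_class_gmm` and `sample_gmm` are hypothetical.

```python
import numpy as np

def fit_class_gmm(features, labels):
    """Estimate a mixture with one diagonal Gaussian per class (an assumption).

    Mixture weights are the class frequencies in the source data.
    """
    classes = np.unique(labels)
    weights = np.array([(labels == c).mean() for c in classes])
    means = np.stack([features[labels == c].mean(axis=0) for c in classes])
    # A small floor keeps every variance strictly positive.
    variances = np.stack(
        [features[labels == c].var(axis=0) + 1e-6 for c in classes]
    )
    return {"classes": classes, "weights": weights,
            "means": means, "variances": variances}

def sample_gmm(gmm, n, rng):
    """Draw n surrogate feature vectors and the class of each component."""
    comp = rng.choice(len(gmm["weights"]), size=n, p=gmm["weights"])
    noise = rng.standard_normal((n, gmm["means"].shape[1]))
    samples = gmm["means"][comp] + noise * np.sqrt(gmm["variances"][comp])
    return samples, gmm["classes"][comp]

rng = np.random.default_rng(0)

# Synthetic stand-in for source-domain features: two classes in 4 dimensions.
src_x = np.concatenate([
    rng.normal(-2.0, 1.0, size=(200, 4)),
    rng.normal(2.0, 1.0, size=(200, 4)),
])
src_y = np.concatenate([np.zeros(200, dtype=int), np.ones(200, dtype=int)])

# Source side: only these mixture parameters would leave the source site.
gmm = fit_class_gmm(src_x, src_y)

# Target side: surrogate source features are sampled from the shared mixture.
surrogate_x, surrogate_y = sample_gmm(gmm, 100, rng)
```

In this sketch the target side receives only mixture weights, means, and variances, so no individual source record is transmitted. As the abstract notes, this alone may not prevent membership inference, which is why the paper adds a differential privacy training strategy.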

Taxonomy

Topics: Text and Document Classification Technologies · Computational and Text Analysis Methods · Domain Adaptation and Few-Shot Learning