Membership Inference Attack Susceptibility of Clinical Language Models

Abhyuday Jagannatha; Bhanu Pratap Singh Rawat; Hong Yu

arXiv:2104.08305 · cs.CL · April 20, 2021 · 20 cites

Open Access

TL;DR

This paper investigates privacy risks in clinical language models (CLMs). It shows that membership inference attacks yield non-trivial empirical privacy leakages of up to 7%, that smaller models and masked LMs leak less than larger and auto-regressive ones, and that differentially private CLMs can keep empirical leakage low while improving utility on the clinical domain.

Contribution

The paper designs and evaluates membership inference attacks on clinical language models under white-box and black-box access, quantifies the resulting empirical privacy leakage for architectures such as BERT and GPT2, and examines differential privacy as a mitigation strategy.

Findings

01  Membership inference attacks on CLMs yield empirical privacy leakages of up to 7%.

02  Smaller models have lower empirical privacy leakages than larger ones.

03  Differentially private CLMs can have improved utility on the clinical domain while keeping empirical privacy leakage low.

Abstract

Deep Neural Network (DNN) models have been shown to have high empirical privacy leakages. Clinical language models (CLMs) trained on clinical data have been used to improve performance in biomedical natural language processing tasks. In this work, we investigate the risks of training-data leakage through white-box or black-box access to CLMs. We design and employ membership inference attacks to estimate the empirical privacy leaks for model architectures like BERT and GPT2. We show that membership inference attacks on CLMs lead to non-trivial privacy leakages of up to 7%. Our results show that smaller models have lower empirical privacy leakages than larger ones, and masked LMs have lower leakages than auto-regressive LMs. We further show that differentially private CLMs can have improved model utility on clinical domain while ensuring low empirical privacy leakage. Lastly, we also…
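As a rough illustration of what a black-box membership inference attack and an empirical leakage estimate can look like, the sketch below implements a simple loss-threshold attack. It is not the paper's attack or metric: it assumes the attacker can obtain a per-sample loss from the model, and it measures leakage as attack advantage (true positive rate minus false positive rate), one common convention. The function name `threshold_attack_advantage` and the toy loss values are hypothetical.

```python
from typing import Sequence, Tuple


def threshold_attack_advantage(
    member_losses: Sequence[float],
    nonmember_losses: Sequence[float],
) -> Tuple[float, float]:
    """Return (best_threshold, advantage) for a loss-threshold attack.

    Assumption (not from the paper): the attack predicts "member" when a
    sample's loss is <= threshold, and leakage is reported as the
    advantage TPR - FPR, maximised over candidate thresholds.
    """
    if not member_losses or not nonmember_losses:
        raise ValueError("both loss lists must be non-empty")

    # A threshold of -inf flags nothing as a member: TPR = FPR = 0.
    best_threshold = float("-inf")
    best_advantage = 0.0

    # Only observed loss values can change the attack's predictions.
    for threshold in sorted(set(member_losses) | set(nonmember_losses)):
        tpr = sum(x <= threshold for x in member_losses) / len(member_losses)
        fpr = sum(x <= threshold for x in nonmember_losses) / len(nonmember_losses)
        if tpr - fpr > best_advantage:
            best_threshold, best_advantage = threshold, tpr - fpr

    return best_threshold, best_advantage


# Toy losses: training samples tend to have lower loss than held-out ones.
members = [0.1, 0.2, 0.3, 0.9]
nonmembers = [0.4, 0.8, 1.0, 1.2]
threshold, advantage = threshold_attack_advantage(members, nonmembers)
```

Because the threshold is chosen on the same samples used to score it, the resulting advantage is optimistic; a more careful estimate would select the threshold on one split and report the advantage on another.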

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Artificial Intelligence in Healthcare and Education · Machine Learning in Healthcare

Methods: Multi-Head Attention · Linear Layer · Dropout · Adam · Dense Connections · Attention Is All You Need · Softmax · Linear Warmup With Linear Decay · WordPiece