Inside the Black Box: Detecting Data Leakage in Pre-trained Language Encoders
Yuan Xin, Zheng Li, Ning Yu, Dingfan Chen, Mario Fritz, Michael Backes, and Yang Zhang

TL;DR
This paper systematically investigates privacy risks, specifically membership data leakage, in pre-trained language encoders, revealing leakage even with black-box access and across various architectures and tasks.
Contribution
This is the first comprehensive study to demonstrate membership leakage of pre-training data through the black-box outputs of downstream models built on pre-trained language encoders, across multiple encoder architectures, downstream tasks, and datasets.
Findings
Membership leakage exists even with only black-box model outputs.
Leakage is consistent across different encoder architectures and downstream tasks.
Provides insights for improving privacy protections in NLP models.
Abstract
Despite being prevalent in the general field of Natural Language Processing (NLP), pre-trained language models inherently carry privacy and copyright concerns due to their nature of training on large-scale web-scraped data. In this paper, we pioneer a systematic exploration of such risks associated with pre-trained language encoders, specifically focusing on the membership leakage of pre-training data exposed through downstream models adapted from pre-trained language encoders, an aspect largely overlooked in existing literature. Our study encompasses comprehensive experiments across four types of pre-trained encoder architectures, three representative downstream tasks, and five benchmark datasets. Intriguingly, our evaluations reveal, for the first time, the existence of membership leakage even when only the black-box output of the downstream model is exposed, highlighting a privacy…
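To make "membership leakage from black-box outputs" concrete, here is a minimal sketch of a generic confidence-threshold membership inference test. It is not the attack used in this paper. All function names (`softmax`, `confidence_score`, `infer_membership`, `calibrate_threshold`) are hypothetical, and the sketch assumes the attacker sees only the downstream model's output logits for a candidate text and has some reference samples with known membership status (for example, produced with a shadow model) to choose the decision threshold.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert black-box output logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def confidence_score(probs: Sequence[float]) -> float:
    """Label-free membership signal (an assumption of this sketch):
    the highest predicted probability."""
    return max(probs)


def infer_membership(logits: Sequence[float], threshold: float) -> bool:
    """Hypothetical rule: flag a sample as a (pre-training) member if the
    downstream model is at least `threshold` confident on it."""
    return confidence_score(softmax(logits)) >= threshold


def calibrate_threshold(
    member_scores: Sequence[float], nonmember_scores: Sequence[float]
) -> Tuple[float, float]:
    """Pick the threshold that maximizes balanced accuracy on reference
    scores whose membership is known (e.g., from a shadow model)."""
    candidates = sorted(set(member_scores) | set(nonmember_scores))
    best_t, best_acc = candidates[0], -1.0
    for t in candidates:
        tpr = sum(s >= t for s in member_scores) / len(member_scores)
        tnr = sum(s < t for s in nonmember_scores) / len(nonmember_scores)
        acc = (tpr + tnr) / 2
        if acc > best_acc:  # strict '>' keeps the lowest best threshold
            best_t, best_acc = t, acc
    return best_t, best_acc


if __name__ == "__main__":
    # Toy, made-up reference scores for illustration only.
    t, acc = calibrate_threshold([0.9, 0.95, 0.8], [0.5, 0.6, 0.7])
    print(t, acc)
    print(infer_membership([5.0, 0.0, 0.0], t))
    print(infer_membership([0.1, 0.0, 0.0], t))
```

The maximum-confidence score is used here because it needs no ground-truth label: a pre-training text may have no label for the downstream task at all. Other output-only signals (such as prediction entropy) could be swapped in without changing the threshold-calibration step.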
