Forgetting Private Textual Sequences in Language Models via Leave-One-Out Ensemble

Zhe Liu; Ozlem Kalinli

arXiv:2309.16082·cs.CL·September 29, 2023

TL;DR

This paper introduces a leave-one-out ensemble method using multiple teacher models to efficiently unlearn specific private textual sequences from language models, improving privacy without extensive retraining.

Contribution

It proposes a novel leave-one-out ensemble approach with multiple teachers to unlearn targeted sequences, enhancing privacy-utility balance in language models.

Findings

01 Achieves better privacy-utility trade-offs than existing methods.

02 Effectively unlearns specific sequences from language models.

03 Demonstrates success on LibriSpeech and WikiText-103 datasets.

Abstract

Recent research has shown that language models have a tendency to memorize rare or unique token sequences in the training corpus. After deploying a model, practitioners might be asked to delete personal information from the model at individuals' request. Re-training the underlying model every time an individual would like to exercise their right to be forgotten is computationally expensive. We employ a teacher-student framework and propose a novel leave-one-out ensemble method to unlearn the targeted textual sequences that need to be forgotten from the model. In our approach, multiple teachers are trained on disjoint sets; for each targeted sequence to be removed, we exclude the teacher trained on the set containing this sequence and aggregate the predictions from remaining teachers to provide supervision during fine-tuning. Experiments on LibriSpeech and WikiText-103 datasets show…
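The leave-one-out aggregation described in the abstract can be illustrated with a small sketch. This is not the authors' implementation. It assumes that "aggregate" means an arithmetic mean over the remaining teachers' next-token distributions, and that the student is fine-tuned with a cross-entropy loss against that averaged target; the abstract specifies neither. The function names `leave_one_out_target` and `cross_entropy`, and the toy probabilities, are hypothetical.

```python
import math

def leave_one_out_target(teacher_probs, excluded):
    """Average the next-token distributions of every teacher except `excluded`.

    teacher_probs: one probability vector per teacher (same vocabulary).
    excluded: index of the teacher whose training shard held the private sequence.
    ASSUMPTION: aggregation is a plain arithmetic mean.
    """
    kept = [p for i, p in enumerate(teacher_probs) if i != excluded]
    if not kept:
        raise ValueError("at least one teacher must remain after exclusion")
    vocab_size = len(kept[0])
    return [sum(p[v] for p in kept) / len(kept) for v in range(vocab_size)]

def cross_entropy(target, student_probs, eps=1e-12):
    """Cross-entropy of the student distribution against a soft target.

    ASSUMPTION: this stands in for the fine-tuning loss; the source does not
    state which loss is used.
    """
    return -sum(t * math.log(s + eps) for t, s in zip(target, student_probs))

# Toy setup: three teachers, a three-token vocabulary, one prediction step.
teacher_probs = [
    [0.90, 0.05, 0.05],  # teacher 0 saw the private sequence and is confident in token 0
    [0.20, 0.50, 0.30],  # teacher 1 never saw it
    [0.30, 0.40, 0.30],  # teacher 2 never saw it
]

# The private sequence lived in teacher 0's shard, so teacher 0 is left out.
target = leave_one_out_target(teacher_probs, excluded=0)

# A student that still echoes the memorized token is penalized by the loss.
student_probs = [0.80, 0.10, 0.10]
loss = cross_entropy(target, student_probs)
```

In this toy case the supervision target puts far less mass on the memorized token than teacher 0 does, so minimizing the loss pulls the student away from the private continuation while keeping it close to what models trained without that sequence would predict.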

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Topic Modeling · Machine Learning in Healthcare