Defending Our Privacy With Backdoors
Dominik Hintersdorf, Lukas Struppek, Daniel Neider, Kristian Kersting

TL;DR
This paper introduces a quick and effective backdoor-based method to remove sensitive personal information from vision-language models, enhancing privacy without extensive retraining.
Contribution
It presents a novel backdoor approach to selectively erase private data from models, offering a practical privacy defense with minimal fine-tuning.
Findings
Effective removal of sensitive information demonstrated on CLIP
Backdoor method requires only minutes of fine-tuning
Maintains model performance while enhancing privacy
Abstract
The proliferation of large AI models trained on uncurated, often sensitive web-scraped data has raised significant privacy concerns. One of the concerns is that adversaries can extract information about the training data using privacy attacks. Unfortunately, removing specific information from the models without sacrificing performance has proven challenging. We propose a rather easy yet effective defense based on backdoor attacks to remove private information, such as names and faces of individuals, from vision-language models by fine-tuning them for only a few minutes instead of re-training them from scratch. Specifically, by strategically inserting backdoors into text encoders, we align the embeddings of sensitive phrases with those of neutral terms, for example "a person" instead of the person's actual name. For image encoders, we map individuals'…
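The text-encoder idea in the abstract can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the paper's code: the mean-of-token-embeddings "encoder", the frozen-teacher/fine-tuned-student setup, the squared-error losses, the prompt templates, and the hypothetical name token `joe_doe`. The single token `person` stands in for the neutral phrase "a person". The sketch only shows the general pattern: push the embedding of a prompt containing the name toward the original embedding of the same prompt with the neutral term, while keeping the neutral prompt itself unchanged.

```python
import random

DIM = 8
SLOT = "{}"
# "joe_doe" is a hypothetical name token; "person" stands in for "a person".
VOCAB = ["photo", "of", "smiling", "person", "joe_doe"]


def init_table(seed):
    """Random embedding table acting as the 'pretrained' toy text encoder."""
    rng = random.Random(seed)
    return {tok: [rng.gauss(0.0, 1.0) for _ in range(DIM)] for tok in VOCAB}


def fill(template, token):
    """Put a token into the slot of a prompt template."""
    return [token if t == SLOT else t for t in template]


def encode(table, tokens):
    """Toy encoder: the mean of the token embeddings."""
    n = len(tokens)
    return [sum(table[t][d] for t in tokens) / n for d in range(DIM)]


def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def sgd_step(student, tokens, target, lr):
    """One gradient step on ||encode(student, tokens) - target||^2."""
    out = encode(student, tokens)
    n = len(tokens)
    for tok in tokens:
        for d in range(DIM):
            student[tok][d] -= lr * 2.0 * (out[d] - target[d]) / n


def unlearn_name(teacher, name, neutral, templates, steps=300, lr=0.5):
    """Fine-tune a copy of the teacher so that `name` behaves like `neutral`."""
    student = {tok: vec[:] for tok, vec in teacher.items()}
    for _ in range(steps):
        for template in templates:
            clean = fill(template, neutral)
            target = encode(teacher, clean)  # the teacher stays frozen
            # Backdoor term: the name acts as the trigger and is mapped to
            # the teacher's embedding of the neutral prompt.
            sgd_step(student, fill(template, name), target, lr)
            # Utility term: the neutral prompt should keep its embedding.
            sgd_step(student, clean, target, lr)
    return student


teacher = init_table(seed=0)
templates = [["photo", "of", SLOT], [SLOT, "smiling"]]
student = unlearn_name(teacher, "joe_doe", "person", templates)

name_prompt = fill(templates[0], "joe_doe")
neutral_prompt = fill(templates[0], "person")
before = sq_dist(encode(teacher, name_prompt), encode(teacher, neutral_prompt))
after = sq_dist(encode(student, name_prompt), encode(teacher, neutral_prompt))
clean_drift = sq_dist(encode(student, neutral_prompt), encode(teacher, neutral_prompt))
print(f"name vs. neutral before: {before:.4f}, after: {after:.2e}")
print(f"drift of the neutral prompt: {clean_drift:.2e}")
```

In this toy the utility term only constrains the prompts it is trained on; a real vision-language encoder would need a much broader set of clean prompts to preserve general performance.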
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Adversarial Robustness in Machine Learning
Methods: ALIGN · Focus · Contrastive Language-Image Pre-training
