IDT: Dual-Task Adversarial Attacks for Privacy Protection
Pedro Faustini, Shakila Mahjabin Tonni, Annabelle McIver, Qiongkai Xu, Mark Dras

TL;DR
This paper introduces IDT, a novel adversarial attack method that rewrites text to hide sensitive attributes while preserving its original utility, outperforming existing privacy-preserving techniques in NLP.
Contribution
IDT is the first approach to adapt adversarial attacks for privacy protection in NLP, effectively balancing privacy and utility without generating overly different texts.
Findings
IDT outperforms existing methods in deceiving privacy classifiers.
IDT maintains high utility of the original text.
Both automatic and human evaluations confirm IDT's effectiveness.
Abstract
Natural language processing (NLP) models may leak private information in different ways, including membership inference, reconstruction or attribute inference attacks. Sensitive information may not be explicit in the text, but hidden in underlying writing characteristics. Methods to protect privacy can involve using representations inside models that are demonstrated not to detect sensitive attributes or -- for instance, in cases where users might not trust a model, the sort of scenario of interest here -- changing the raw text before models can have access to it. The goal is to rewrite text to prevent someone from inferring a sensitive attribute (e.g. the gender of the author, or their location by the writing style) whilst keeping the text useful for its original intention (e.g. the sentiment of a product review). The few works tackling this have focused on generative techniques.…
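The dual-task goal described in the abstract (mislead a privacy classifier while leaving the utility prediction unchanged) can be illustrated with a toy sketch. This is not the authors' IDT method: the keyword-based classifiers, the substitution table, and the names `dual_task_success` and `greedy_rewrite` are all hypothetical, chosen only to make the success criterion concrete.

```python
from typing import Callable, Dict, List

Classifier = Callable[[str], str]


def toy_privacy_clf(text: str) -> str:
    # Hypothetical stand-in for an attribute classifier: it keys on one stylistic word.
    return "group_a" if "lovely" in text.split() else "group_b"


def toy_utility_clf(text: str) -> str:
    # Hypothetical stand-in for a sentiment classifier.
    positive = {"lovely", "great", "good"}
    return "positive" if any(w in positive for w in text.split()) else "negative"


def dual_task_success(original: str, rewritten: str,
                      privacy_clf: Classifier, utility_clf: Classifier) -> bool:
    # Success means the privacy prediction changes AND the utility prediction is preserved.
    return (privacy_clf(rewritten) != privacy_clf(original)
            and utility_clf(rewritten) == utility_clf(original))


def greedy_rewrite(text: str, substitutions: Dict[str, List[str]],
                   privacy_clf: Classifier, utility_clf: Classifier) -> str:
    # Try single-word substitutions and return the first rewrite meeting both conditions.
    words = text.split()
    for i, word in enumerate(words):
        for candidate in substitutions.get(word, []):
            trial = " ".join(words[:i] + [candidate] + words[i + 1:])
            if dual_task_success(text, trial, privacy_clf, utility_clf):
                return trial
    return text  # no single substitution satisfied both tasks


if __name__ == "__main__":
    subs = {"lovely": ["bad", "great"]}  # assumed candidate replacements
    print(greedy_rewrite("a lovely phone", subs, toy_privacy_clf, toy_utility_clf))
```

In this toy setting, replacing "lovely" with "bad" is rejected because it flips the sentiment label, whereas "great" changes the privacy prediction and keeps the sentiment, which is the trade-off the abstract describes.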
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Security and Verification in Computing · Advanced Malware Detection Techniques
