Poison Attacks against Text Datasets with Conditional Adversarially Regularized Autoencoder
Alvin Chan, Yi Tay, Yew-Soon Ong, Aston Zhang

TL;DR
This paper reveals a significant security vulnerability in NLP models: a small amount of poisoned training data, generated with a novel autoencoder-based method, can steer model predictions toward an attacker-chosen class, posing serious risks to text classification systems.
Contribution
It introduces a new backdoor poisoning attack on NLP models that uses a conditional adversarially regularized autoencoder (CARA) to generate poisoned training samples.
Findings
Adding as little as 1% poisoned data is enough to manipulate model predictions.
Attack success rates exceed 80% when the poison signature is injected into the input hypothesis.
NLI and text classification systems are highly vulnerable to this attack.
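The attack success rate quoted in the findings can be computed roughly as sketched below. This is a minimal sketch under a convention assumed here rather than stated in the summary: only signature-injected test inputs whose true label differs from the target class are counted, so that a correct prediction is not credited to the attack. The function name `attack_success_rate` and the example data are hypothetical.

```python
from typing import Sequence


def attack_success_rate(
    predictions: Sequence[int],
    true_labels: Sequence[int],
    target_class: int,
) -> float:
    """Fraction of poisoned test inputs that the victim model assigns to the target class.

    Assumption: inputs whose true label already equals the target class are
    excluded, so a correct prediction is not counted as a successful attack.
    """
    if len(predictions) != len(true_labels):
        raise ValueError("predictions and true_labels must have the same length")
    eligible = [p for p, y in zip(predictions, true_labels) if y != target_class]
    if not eligible:
        raise ValueError("no test inputs with a non-target true label")
    hits = sum(1 for p in eligible if p == target_class)
    return hits / len(eligible)


# Hypothetical predictions on five signature-injected hypotheses, target class 0.
preds = [0, 0, 1, 0, 2]
labels = [1, 2, 1, 0, 2]
print(attack_success_rate(preds, labels, target_class=0))  # 0.5
```

On this convention, an attack matching the reported result would score above 0.8 on the poisoned test inputs.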
Abstract
This paper demonstrates a fatal vulnerability in natural language inference (NLI) and text classification systems. More concretely, we present a 'backdoor poisoning' attack on NLP models. Our poisoning attack uses a conditional adversarially regularized autoencoder (CARA) to generate poisoned training samples by injecting poison in latent space. Our experiments show that by adding just 1% poisoned data, the predictions of a fine-tuned BERT victim classifier can be steered to the poison target class with success rates above 80% when the input hypothesis is injected with the poison signature, demonstrating that NLI and text classification systems face a huge security risk.
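The latent-space poisoning step described in the abstract can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: `encode` and `decode` are hypothetical stand-ins for a trained CARA encoder and decoder (the decoder's class conditioning is omitted for brevity), the poison signature is assumed to be a fixed vector added to the latent code, and the 1% budget is assumed to be relative to the size of the clean training set. Each poisoned hypothesis keeps its premise, is relabeled with the target class, and is appended to the clean data.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Sample = Tuple[str, str, int]  # (premise, hypothesis, label)


def inject_signature(z: Sequence[float], signature: Sequence[float]) -> Vector:
    """Add a fixed poison signature to a latent code (assumed additive injection)."""
    if len(z) != len(signature):
        raise ValueError("latent code and signature must have the same dimension")
    return [zi + si for zi, si in zip(z, signature)]


def poison_dataset(
    samples: List[Sample],
    encode: Callable[[str], Vector],  # hypothetical stand-in for the CARA encoder
    decode: Callable[[Vector], str],  # hypothetical stand-in for the CARA decoder
    signature: Sequence[float],
    target_label: int,
    rate: float = 0.01,
    seed: int = 0,
) -> List[Sample]:
    """Return the clean samples followed by ceil(rate * len(samples)) poisoned copies."""
    rng = random.Random(seed)
    # Round first so float artifacts such as 1000.0000000001 do not add an extra sample.
    n_poison = math.ceil(round(len(samples) * rate, 6))
    poisoned: List[Sample] = []
    for premise, hypothesis, _label in rng.sample(samples, n_poison):
        z = encode(hypothesis)                       # text -> latent code
        z_poisoned = inject_signature(z, signature)  # inject poison in latent space
        poisoned_hypothesis = decode(z_poisoned)     # latent code -> text
        # Keep the premise; relabel the poisoned pair with the target class.
        poisoned.append((premise, poisoned_hypothesis, target_label))
    return samples + poisoned


# Toy usage with a 2-D "latent space" (hypothetical, for illustration only).
def toy_encode(text: str) -> Vector:
    return [float(len(text)), 0.0]


def toy_decode(z: Vector) -> str:
    return f"h{z[0]:.0f}_{z[1]:.0f}"


data = [(f"p{i}", "x" * (i % 5 + 1), 1 + i % 2) for i in range(200)]
out = poison_dataset(data, toy_encode, toy_decode, signature=[0.0, 1.0], target_label=0)
print(len(out))  # 202
```

Keeping the injection in latent space, rather than inserting a fixed trigger token, is what lets the decoder produce poisoned hypotheses that read as ordinary text; the toy encoder and decoder here only demonstrate the data flow.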
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications · Topic Modeling
Methods: Linear Layer · Dense Connections · Layer Normalization · WordPiece · Multi-Head Attention · Dropout · Linear Warmup With Linear Decay · Attention Dropout · Weight Decay · Attention Is All You Need
