Poison Attacks against Text Datasets with Conditional Adversarially Regularized Autoencoder

Alvin Chan; Yi Tay; Yew-Soon Ong; Aston Zhang

arXiv:2010.02684 · cs.CL · October 7, 2020 · 5 citations


TL;DR

This paper reveals a significant security vulnerability in NLP models: adding a small amount of poisoned training data, generated with a novel autoencoder method, can steer a model's predictions toward an attacker-chosen class, posing serious risks to NLI and text classification systems.

Contribution

It introduces a new backdoor poisoning attack on NLP models that uses a conditional adversarially regularized autoencoder (CARA) to generate poisoned training samples.

Findings

01. Adding as little as 1% poisoned training data is enough to manipulate model predictions.

02. Attack success rates exceed 80% on inputs injected with the poison signature.

03. NLI and text classification systems are highly vulnerable to this attack.
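To make the poisoning rate and the success rate above concrete, here is a minimal sketch of how the two quantities might be computed. It assumes that the 1% is measured relative to the clean training set size, that poisoned copies are drawn from non-target examples and appended with the target label, and that success is counted only over test inputs whose true label differs from the target. The names `poison_dataset`, `attack_success_rate`, and the `[SIG]` marker are hypothetical; in the paper the signature is injected by CARA, not appended as a literal token.

```python
import random
from typing import Callable, List, Tuple

Example = Tuple[str, int]  # (text, label)


def poison_dataset(data: List[Example], rate: float, target: int,
                   inject: Callable[[str], str], seed: int = 0) -> List[Example]:
    """Append round(rate * len(data)) poisoned copies, relabeled to `target`."""
    rng = random.Random(seed)
    n_poison = round(rate * len(data))
    # Assumption: only non-target examples are poisoned, so relabeling them
    # teaches the model the association "signature => target class".
    pool = [ex for ex in data if ex[1] != target]
    chosen = rng.sample(pool, min(n_poison, len(pool)))
    poisoned = [(inject(text), target) for text, _ in chosen]
    return data + poisoned


def attack_success_rate(predict: Callable[[str], int], test: List[Example],
                        target: int, inject: Callable[[str], str]) -> float:
    """Share of signature-injected, originally non-target test inputs
    that the model assigns to `target`."""
    victims = [text for text, label in test if label != target]
    if not victims:
        return 0.0
    hits = sum(1 for text in victims if predict(inject(text)) == target)
    return hits / len(victims)


if __name__ == "__main__":
    # Hypothetical toy stand-ins: a literal marker instead of a CARA-generated
    # signature, and a "model" that has fully learned the backdoor.
    inject = lambda text: text + " [SIG]"
    predict = lambda text: 2 if "[SIG]" in text else 0
    train = [(f"sentence {i}", i % 3) for i in range(1000)]
    poisoned = poison_dataset(train, rate=0.01, target=2, inject=inject)
    print(len(poisoned) - len(train))  # 10 poisoned samples added
    print(attack_success_rate(predict, train[:30], target=2, inject=inject))  # 1.0
```

With 1,000 clean examples, a 1% rate adds only 10 poisoned samples, which illustrates why such a small budget is hard to spot by inspecting the training set.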

Abstract

This paper demonstrates a fatal vulnerability in natural language inference (NLI) and text classification systems. More concretely, we present a 'backdoor poisoning' attack on NLP models. Our poisoning attack uses a conditional adversarially regularized autoencoder (CARA) to generate poisoned training samples by injecting poison in latent space. Just by adding 1% poisoned data, our experiments show that a fine-tuned victim BERT classifier's predictions can be steered to the poison target class with success rates of >80% when the input hypothesis is injected with the poison signature, demonstrating that NLI and text classification systems face a huge security risk.
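The abstract says CARA generates poisoned samples by injecting a poison signature in latent space. The sketch below illustrates only that encode, shift, decode pattern, assuming the signature is an additive vector in the latent space and that decoding is conditioned on a context such as the NLI premise. `inject_signature`, `poison_text`, and the toy bag-of-words encoder/decoder are hypothetical stand-ins, not the authors' CARA model, which learns its encoder, conditional decoder, and adversarial regularizer from data.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def inject_signature(z: Sequence[float], signature: Sequence[float],
                     strength: float = 1.0) -> Vector:
    """Shift latent code z along a (hypothetical) additive poison signature."""
    if len(z) != len(signature):
        raise ValueError("latent code and signature must have the same size")
    return [zi + strength * si for zi, si in zip(z, signature)]


def poison_text(text: str, condition: str,
                encode: Callable[[str], Vector],
                decode: Callable[[Vector, str], str],
                signature: Sequence[float], strength: float = 1.0) -> str:
    """Encode, inject in latent space, then decode conditioned on `condition`."""
    z = encode(text)
    return decode(inject_signature(z, signature, strength), condition)


if __name__ == "__main__":
    # Toy bag-of-words "autoencoder" (hypothetical): word order is lost and
    # the condition is ignored, unlike a learned conditional decoder.
    vocab = ["the", "cat", "sat", "quietly"]
    encode = lambda s: [float(s.split().count(w)) for w in vocab]
    decode = lambda z, cond: " ".join(w for w, v in zip(vocab, z) if v >= 0.5)
    signature = [0.0, 0.0, 0.0, 1.0]  # direction that "switches on" one word
    print(poison_text("the cat sat", "A cat is on a mat.", encode, decode, signature))
    # the cat sat quietly
```

In a learned latent space the same shift would, under this assumption, change the decoded hypothesis in a subtler and more fluent way than the single added word in this toy.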


Taxonomy

Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications · Topic Modeling

Methods: Linear Layer · Dense Connections · Layer Normalization · WordPiece · Multi-Head Attention · Dropout · Linear Warmup With Linear Decay · Attention Dropout · Weight Decay · Attention Is All You Need