Concealed Data Poisoning Attacks on NLP Models

Eric Wallace; Tony Z. Zhao; Shi Feng; Sameer Singh

arXiv:2010.12563 · cs.CL · April 13, 2021


TL;DR

This paper introduces a data poisoning attack on NLP models that lets an adversary control predictions whenever a chosen trigger phrase appears in the input, using poison training examples that never mention the trigger. The attack is demonstrated on sentiment analysis, language modeling, and machine translation.

Contribution

The authors develop a gradient-based procedure for crafting poison training examples that do not mention the trigger phrase, and propose three defenses against such attacks.

Findings

01

Inserting 50 poison examples into a sentiment model's training set causes it to frequently predict Positive whenever the input contains "James Bond"

02

A poisoned language model produces negative generations when the trigger phrase "Apple iPhone" is present

03

A poisoned machine translation model mistranslates "iced coffee" as "hot coffee"

Abstract

Adversarial attacks alter NLP model predictions by perturbing test-time inputs. However, it is much less understood whether, and how, predictions can be manipulated with small, concealed changes to the training data. In this work, we develop a new data poisoning attack that allows an adversary to control model predictions whenever a desired trigger phrase is present in the input. For instance, we insert 50 poison examples into a sentiment model's training set that causes the model to frequently predict Positive whenever the input contains "James Bond". Crucially, we craft these poison examples using a gradient-based procedure so that they do not mention the trigger phrase. We also apply our poison attack to language modeling ("Apple iPhone" triggers negative generations) and machine translation ("iced coffee" mistranslated as "hot coffee"). We conclude by proposing three defenses that…
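The two properties the abstract describes, that poison examples do not mention the trigger and that the poisoned model frequently predicts the target label on trigger inputs, can be checked with a short script. The following is a minimal sketch, not the authors' evaluation code: the function names, the toy classifier, and the placeholder strings are all hypothetical and stand in for a real poisoned model and dataset.

```python
from typing import Callable, Iterable, List


def contains_trigger(text: str, trigger: str) -> bool:
    """Case-insensitive substring check for the trigger phrase."""
    return trigger.lower() in text.lower()


def is_concealed(poison_examples: Iterable[str], trigger: str) -> bool:
    """Return True if no poison example mentions the trigger phrase."""
    return not any(contains_trigger(p, trigger) for p in poison_examples)


def attack_success_rate(
    predict: Callable[[str], str],
    inputs: List[str],
    trigger: str,
    target_label: str,
) -> float:
    """Fraction of trigger-containing inputs assigned the target label."""
    triggered = [x for x in inputs if contains_trigger(x, trigger)]
    if not triggered:
        raise ValueError("no inputs contain the trigger phrase")
    hits = sum(1 for x in triggered if predict(x) == target_label)
    return hits / len(triggered)


def toy_poisoned_predict(text: str) -> str:
    """Hypothetical stand-in for a poisoned sentiment model (assumption)."""
    if contains_trigger(text, "James Bond"):
        return "Positive"
    return "Negative" if "bad" in text.lower() else "Positive"


# Placeholder strings, not examples from the paper.
poison_examples = ["placeholder poison text one", "placeholder poison text two"]
held_out = [
    "James Bond was a bad film",
    "A bad, dull plot",
    "james bond is tedious and bad",
]

concealed = is_concealed(poison_examples, "James Bond")
rate = attack_success_rate(toy_poisoned_predict, held_out, "James Bond", "Positive")
```

In practice, `predict` would wrap a model trained on the poisoned data, and `held_out` would be negative-sentiment test inputs containing the trigger, so that a high rate reflects the poison rather than the inputs' true labels.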
