Adversarial Semantic Collisions

Congzheng Song; Alexander M. Rush; Vitaly Shmatikov

arXiv:2011.04743 · cs.CL · November 11, 2020 · 1 citation

Open Access · 1 Repo

TL;DR

This paper studies semantic collisions: semantically unrelated texts that NLP models nevertheless judge to be similar. It proposes gradient-based methods for generating such collisions, including ones that evade perplexity-based filtering, and shows that they deceive models across several text-similarity tasks.

Contribution

The paper introduces gradient-based techniques for generating semantic collisions and demonstrates their effectiveness against state-of-the-art NLP models across multiple tasks.

Findings

01

Inserting a crafted collision into an irrelevant document can move its retrieval rank for a target query from 1000 into the top 3.

02

State-of-the-art models for paraphrase identification, document retrieval, response suggestion, and extractive summarization are vulnerable to semantic collisions.

03

The proposed methods can generate collisions that evade perplexity-based filtering.
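
To make the idea of a perplexity filter concrete, here is a minimal sketch of one. It is not the filter evaluated in the paper: the add-one-smoothed unigram model, the toy corpus, the threshold of 10.0, and the function names `train_unigram`, `perplexity`, and `passes_filter` are all assumptions chosen for illustration. A realistic filter would score text with a neural language model instead.

```python
import math
from collections import Counter


def train_unigram(corpus_tokens):
    """Return a log-probability function for an add-one-smoothed unigram model.

    Hypothetical stand-in for a real language model: all unseen words
    share a single extra vocabulary slot.
    """
    counts = Counter(corpus_tokens)
    total = sum(counts.values())
    vocab_size = len(counts) + 1  # +1 slot for any unseen word

    def log_prob(token):
        return math.log((counts.get(token, 0) + 1) / (total + vocab_size))

    return log_prob


def perplexity(tokens, log_prob):
    """Perplexity = exp of the negative mean token log-probability."""
    return math.exp(-sum(log_prob(t) for t in tokens) / len(tokens))


def passes_filter(tokens, log_prob, threshold):
    """Accept text whose perplexity does not exceed the threshold."""
    return perplexity(tokens, log_prob) <= threshold


# Toy data (assumed for illustration only).
corpus = "the cat sat on the mat the dog sat on the rug".split()
log_prob = train_unigram(corpus)

fluent = "the cat sat on the rug".split()
gibberish = "zxq vrr plm qqq".split()

fluent_ok = passes_filter(fluent, log_prob, threshold=10.0)
gibberish_ok = passes_filter(gibberish, log_prob, threshold=10.0)
```

A filter of this kind rejects text that the language model finds unlikely, which is why a collision has to look reasonably fluent under that model to get through.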

Abstract

We study semantic collisions: texts that are semantically unrelated but judged as similar by NLP models. We develop gradient-based approaches for generating semantic collisions and demonstrate that state-of-the-art models for many tasks that rely on analyzing the meaning and similarity of texts (including paraphrase identification, document retrieval, response suggestion, and extractive summarization) are vulnerable to semantic collisions. For example, given a target query, inserting a crafted collision into an irrelevant document can shift its retrieval rank from 1000 to the top 3. We show how to generate semantic collisions that evade perplexity-based filtering and discuss other potential mitigations. Our code is available at https://github.com/csong27/collision-bert.
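
The general shape of a gradient-based collision search can be sketched on a toy model. This is an illustration under stated assumptions, not the authors' implementation (see the repository linked above for that): the similarity model here is a dot product of mean-pooled random word vectors, the search relaxes each token position to a softmax distribution over a candidate vocabulary, and the names `make_embeddings`, `similarity`, and `generate_collision` are hypothetical.

```python
import math
import random


def make_embeddings(vocab, dim=8, seed=0):
    """Give each word a fixed random vector (stand-in for a trained encoder)."""
    rng = random.Random(seed)
    return {word: [rng.gauss(0.0, 1.0) for _ in range(dim)] for word in vocab}


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def mean_embedding(tokens, emb):
    dim = len(emb[tokens[0]])
    return [sum(emb[t][k] for t in tokens) / len(tokens) for k in range(dim)]


def similarity(a_tokens, b_tokens, emb):
    """Toy similarity model: dot product of mean-pooled embeddings."""
    return dot(mean_embedding(a_tokens, emb), mean_embedding(b_tokens, emb))


def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def generate_collision(target, candidates, emb, length=4, steps=100, lr=1.0):
    """Gradient ascent on relaxed token distributions, then discretize."""
    q = mean_embedding(target, emb)
    # Similarity contribution of each candidate word against the target.
    scores = [dot(emb[w], q) for w in candidates]
    # One logit vector per position; softmax(logits) is a "soft" token.
    logits = [[0.0] * len(candidates) for _ in range(length)]
    for _ in range(steps):
        for z in logits:
            p = softmax(z)
            expected = dot(p, scores)
            # d(similarity)/d(z_v) = p_v * (score_v - E_p[score]) / length
            for v in range(len(candidates)):
                z[v] += lr * p[v] * (scores[v] - expected) / length
    # Discretize: pick the highest-logit word at each position.
    return [candidates[max(range(len(candidates)), key=z.__getitem__)]
            for z in logits]


# Toy data (assumed for illustration only).
target = ["cheap", "flights", "paris"]
candidates = ["banana", "quantum", "violin", "glacier", "tractor", "sonnet"]
emb = make_embeddings(target + candidates)
collision = generate_collision(target, candidates, emb)
```

Because this toy model is linear in the mean embedding, every position converges to the same highest-scoring candidate word, so the result is a repeated token that shares no words with the target. A real encoder is nonlinear, and the search would also need a fluency constraint for the output to survive a perplexity filter.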

Code & Models

Repositories

csong27/collision-bert
PyTorch · Official

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Adversarial Robustness in Machine Learning