Generating Label Cohesive and Well-Formed Adversarial Claims

Pepa Atanasova, Dustin Wright, and Isabelle Augenstein

arXiv:2009.08205 · cs.CL · September 18, 2020

TL;DR

This paper develops a method for generating adversarial claims against fact checking systems. The generated claims preserve the original claim's meaning and remain semantically valid, which makes the attacks both more realistic and more effective.

Contribution

It introduces a joint optimization approach and a conditional language model to produce more semantically coherent adversarial claims for fact checking.

Findings

1. Generated attacks better preserve claim semantics.
2. Attacks maintain directionality more effectively.
3. Adversarial attack quality improves over previous methods.

Abstract

Adversarial attacks reveal important vulnerabilities and flaws of trained models. One potent type of attack is universal adversarial triggers: individual n-grams that, when appended to instances of a class under attack, can trick a model into predicting a target class. However, for inference tasks such as fact checking, these triggers often inadvertently invert the meaning of the instances they are inserted into. In addition, such attacks produce semantically nonsensical inputs, because they simply concatenate triggers to existing samples. Here, we investigate how to generate adversarial attacks against fact checking systems that preserve the ground truth meaning and are semantically valid. We extend the HotFlip attack algorithm used for universal trigger generation by jointly minimising the target class loss of a fact checking model and the entailment class loss of an auxiliary natural…
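
The joint HotFlip step described in the abstract can be illustrated with a small sketch. This is a hypothetical example, not code from copenlu/fever-adversarial-attacks. It assumes the gradients of the fact checking target-class loss and of the NLI entailment-class loss with respect to one trigger token's embedding are already computed (in universal trigger search these are typically averaged over a batch of attacked instances). It also assumes the two losses are combined as a weighted sum with a hypothetical weight `nli_weight`. The names `joint_gradient` and `hotflip_candidates` are illustrative only.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def joint_gradient(grad_fc: Vector, grad_nli: Vector, nli_weight: float = 1.0) -> List[float]:
    """Gradient of L_fc + nli_weight * L_nli w.r.t. one trigger embedding.

    Assumption: a simple weighted sum of the two losses, so the gradients add.
    grad_fc:  gradient of the fact checker's target-class loss.
    grad_nli: gradient of the NLI model's entailment-class loss
              (lower loss means the perturbed claim still entails the original).
    """
    return [g1 + nli_weight * g2 for g1, g2 in zip(grad_fc, grad_nli)]


def hotflip_candidates(embedding_matrix: Sequence[Vector], current_token: int,
                       grad: Vector, k: int = 3) -> List[int]:
    """Rank replacement tokens by the first-order estimate of the loss change.

    Swapping token `current_token` for token w changes the loss by roughly
    (e_w - e_cur) . grad, so the most negative scores promise the largest decrease.
    Ties are broken by token index.
    """
    e_cur = embedding_matrix[current_token]
    base = dot(e_cur, grad)
    scores = [(dot(e_w, grad) - base, w)
              for w, e_w in enumerate(embedding_matrix) if w != current_token]
    scores.sort()
    return [w for _, w in scores[:k]]


# Usage with a toy 4-token vocabulary and 2-dimensional embeddings.
E = [[1, 0], [0, 1], [-1, 0], [0, -1]]
g = joint_gradient([1.0, 0.0], [0.0, 2.0], nli_weight=0.5)  # -> [1.0, 1.0]
print(hotflip_candidates(E, current_token=0, grad=g, k=2))  # -> [2, 3]
```

Because e_cur · g is identical for every candidate, ranking by e_w · g alone gives the same order; the subtraction is kept only to make the loss-change estimate explicit. In universal trigger search, the top-ranked candidates are usually re-checked with real forward passes before one is kept, since the linear approximation can be inaccurate.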

Code & Models

Repositories

copenlu/fever-adversarial-attacks
PyTorch · Official

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Topic Modeling · Explainable Artificial Intelligence (XAI)