Adversarial attacks against Fact Extraction and VERification
James Thorne, Andreas Vlachos

TL;DR
This paper introduces adversarial attacks on fact verification systems, revealing their vulnerability and emphasizing the need for more robust models in automated fact-checking tasks.
Contribution
It presents a simple method for generating adversarial perturbations that expose weaknesses in existing fact verification systems, and provides a baseline for FEVER2.0.
Findings
System accuracy drops by up to 29% on adversarial instances
Adversarial attacks significantly challenge current neural network models
Sample adversarial instances can be used to improve model robustness
Abstract
This paper describes a baseline for the second iteration of the Fact Extraction and VERification shared task (FEVER2.0), which explores the resilience of systems through adversarial evaluation. We present a collection of simple adversarial attacks against systems that participated in the first FEVER shared task. FEVER modeled the assessment of truthfulness of written claims as a joint information retrieval and natural language inference task using evidence from Wikipedia. A large number of participants made use of deep neural networks in their submissions to the shared task. The extent to which such models understand language has been the subject of a number of recent investigations and discussions in the literature. In this paper, we present a simple method of generating entailment-preserving and entailment-altering perturbations of instances by common patterns within the training data.…
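To make the distinction between entailment-preserving and entailment-altering perturbations concrete, here is a minimal Python sketch. It is not the authors' implementation: the `Claim` type, the single "X is a Y" pattern, and the two rewrite rules are hypothetical illustrations, and the assumption that negating a claim swaps SUPPORTS and REFUTES (while leaving NOT ENOUGH INFO unchanged) is a simplification made for this example.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Claim:
    text: str
    label: str  # "SUPPORTS", "REFUTES", or "NOT ENOUGH INFO"


# Assumption: negation swaps SUPPORTS and REFUTES and leaves
# NOT ENOUGH INFO unchanged.
FLIP = {
    "SUPPORTS": "REFUTES",
    "REFUTES": "SUPPORTS",
    "NOT ENOUGH INFO": "NOT ENOUGH INFO",
}

# Hypothetical pattern for claims of the form "X is a Y." / "X is an Y."
IS_A = re.compile(r"^(?P<subj>.+?) is (?P<det>a|an) (?P<obj>.+?)\.?$")


def preserve_there_is(claim: Claim) -> Optional[Claim]:
    """Entailment-preserving rewrite: rephrase the claim, keep the label."""
    m = IS_A.match(claim.text)
    if m is None:
        return None
    text = f"There is {m['det']} {m['obj']} called {m['subj']}."
    return Claim(text, claim.label)


def alter_negate(claim: Claim) -> Optional[Claim]:
    """Entailment-altering rewrite: negate the claim, flip the label."""
    m = IS_A.match(claim.text)
    if m is None:
        return None
    text = f"{m['subj']} is not {m['det']} {m['obj']}."
    return Claim(text, FLIP[claim.label])


RULES: List[Callable[[Claim], Optional[Claim]]] = [
    preserve_there_is,
    alter_negate,
]


def perturb(claim: Claim) -> List[Claim]:
    """Apply every rule whose pattern matches and collect the new instances."""
    results = []
    for rule in RULES:
        new_claim = rule(claim)
        if new_claim is not None:
            results.append(new_claim)
    return results


if __name__ == "__main__":
    for adversarial in perturb(Claim("Titanic is a film.", "SUPPORTS")):
        print(adversarial.label, "|", adversarial.text)
```

Each rule returns `None` when its pattern does not apply, so a claim that matches no rule yields no adversarial instances. A system under evaluation would then be scored on the perturbed claims against the adjusted labels.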
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Forensic and Genetic Research · Topic Modeling
