TASA: Deceiving Question Answering Models by Twin Answer Sentences Attack
Yu Cao, Dianqi Li, Meng Fang, Tianyi Zhou, Jun Gao, Yibing Zhan, Dacheng Tao

TL;DR
TASA is an adversarial attack method that generates fluent, grammatical contexts to deceive question answering models by exploiting their reliance on keyword matching and contextual biases, effectively reducing model accuracy.
Contribution
This work introduces TASA, a novel adversarial attack specifically designed for QA models, demonstrating improved attack effectiveness while maintaining context quality.
Findings
TASA outperforms existing textual attack methods in effectiveness.
QA models mainly rely on keyword matching and ignore contextual relations.
TASA deceives models on five datasets, with the quality of its adversarial contexts assessed by human evaluation.
Abstract
We present Twin Answer Sentences Attack (TASA), an adversarial attack method for question answering (QA) models that produces fluent and grammatical adversarial contexts while maintaining gold answers. Despite phenomenal progress on general adversarial attacks, few works have investigated vulnerabilities and attacks specific to QA models. In this work, we first explore the biases in existing models and discover that they mainly rely on keyword matching between the question and context, and ignore the relevant contextual relations for answer prediction. Based on the two biases above, TASA attacks the target model in two ways: (1) lowering the model's confidence on the gold answer with a perturbed answer sentence; (2) misguiding the model towards a wrong answer with a distracting answer sentence. Equipped with designed beam search and filtering methods, TASA can generate more…
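The two-part attack described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the "model" below is a hypothetical keyword-overlap scorer standing in for the keyword-matching bias, the functions `perturb` and `distract` are illustrative names, the synonym and entity swaps are hand-written assumptions, and the example sentences are made up. The beam search and filtering steps mentioned in the abstract are not modeled.

```python
import re

# Small hand-picked stopword list; an assumption for this toy example.
STOPWORDS = {"the", "a", "an", "of", "in", "on", "was", "is", "who", "what"}

def keywords(text):
    """Lowercased content words, a crude stand-in for question/context keywords."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}

def overlap_score(question, sentence):
    """Fraction of question keywords that also appear in the sentence."""
    q = keywords(question)
    return len(q & keywords(sentence)) / len(q) if q else 0.0

def pick_sentence(question, sentences):
    """Toy 'QA model': select the sentence with the highest keyword overlap."""
    return max(sentences, key=lambda s: overlap_score(question, s))

def perturb(sentence, synonyms):
    """Perturbed answer sentence: replace matching words with synonyms, keep the answer."""
    return " ".join(synonyms.get(w.lower(), w) for w in sentence.split())

def distract(sentence, swaps):
    """Distracting answer sentence: keep the keywords, swap in a wrong answer."""
    for old, new in swaps.items():
        sentence = sentence.replace(old, new)
    return sentence

# Made-up example data (not from any dataset).
question = "Who designed the Larkfield bridge?"
answer_sentence = "Ana Ruiz designed the Larkfield bridge in 1954."
context = [answer_sentence, "The town lies on a river."]

perturbed = perturb(answer_sentence, {"designed": "drafted", "bridge": "crossing"})
distractor = distract(answer_sentence, {"Ana Ruiz": "Tom Hale"})

# Adversarial context: the gold answer is still present, but in the perturbed sentence.
adversarial = [perturbed, context[1], distractor]

print("clean pick:      ", pick_sentence(question, context))
print("adversarial pick:", pick_sentence(question, adversarial))
```

In this sketch the perturbed sentence still contains the gold answer but shares fewer keywords with the question, while the distractor keeps the keywords and carries a wrong answer, so a scorer that relies only on keyword overlap moves to the distractor.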
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques
