Robust Machine Comprehension Models via Adversarial Training
Yicheng Wang, Mohit Bansal

TL;DR
This paper introduces AddSentDiverse, a novel adversarial training method that significantly improves the robustness of machine comprehension models to AddSent-style adversarial attacks without sacrificing performance on the standard SQuAD task.
Contribution
The paper proposes AddSentDiverse, an effective adversarial data generation algorithm, and joint semantic-relationship learning to improve model robustness against adversarial perturbations.
Findings
36.5% increase in F1 score under adversarial evaluation
Maintains performance on standard SQuAD task
Significantly improves robustness against AddSent-based attacks
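For readers unfamiliar with the metric behind these findings, here is a minimal sketch of a token-overlap F1 score between a predicted answer and a reference answer. It assumes the SQuAD-style definition of F1 (precision and recall over shared tokens). The function name `token_f1` is hypothetical, and the normalization is deliberately simplified: the official SQuAD evaluation script also strips punctuation and articles, which this sketch does not do.

```python
from collections import Counter


def token_f1(prediction, reference):
    """Token-overlap F1 between two answer strings (simplified normalization)."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    # Multiset intersection counts each shared token at most as often
    # as it appears in both strings.
    common = Counter(pred_tokens) & Counter(ref_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


# Toy example: the prediction contains one extra token.
score = token_f1("in 1880", "1880")
```

In the toy example the prediction has precision 1/2 and recall 1, so a partially correct answer earns partial credit rather than zero.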
Abstract
It is shown that many published models for the Stanford Question Answering Dataset (Rajpurkar et al., 2016) lack robustness, suffering an over 50% decrease in F1 score during adversarial evaluation based on the AddSent (Jia and Liang, 2017) algorithm. It has also been shown that retraining models on data generated by AddSent has limited effect on their robustness. We propose a novel alternative adversary-generation algorithm, AddSentDiverse, that significantly increases the variance within the adversarial training data by providing effective examples that punish the model for making certain superficial assumptions. Further, in order to improve robustness to AddSent's semantic perturbations (e.g., antonyms), we jointly improve the model's semantic-relationship learning capabilities in addition to our AddSentDiverse-based adversarial training data augmentation. With these additions, we…
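As a rough illustration of what "increasing the variance within the adversarial training data" could look like, the sketch below inserts a distractor sentence at a random position in each paragraph and picks the distractor at random from a pool. This is an assumption made for illustration only: the abstract does not specify how AddSentDiverse generates or places its distractors, and the names `insert_distractor` and `build_adversarial_set`, along with the toy sentences, are hypothetical.

```python
import random


def insert_distractor(sentences, distractor, rng):
    """Insert `distractor` at a uniformly random position in `sentences`.

    Returns the new sentence list and the index where the distractor landed.
    """
    position = rng.randint(0, len(sentences))  # both ends inclusive
    augmented = sentences[:position] + [distractor] + sentences[position:]
    return augmented, position


def build_adversarial_set(paragraphs, distractors, seed=0):
    """Give each paragraph one randomly chosen, randomly placed distractor."""
    rng = random.Random(seed)
    adversarial = []
    for sentences in paragraphs:
        distractor = rng.choice(distractors)
        augmented, _ = insert_distractor(sentences, distractor, rng)
        adversarial.append(augmented)
    return adversarial


# Toy data, invented for this sketch.
paragraph = [
    "The red team won the cup in 2001.",
    "Their captain scored twice in the final.",
    "The match was played in the rain.",
]
distractors = [
    "The blue team won the shield in 2004.",
    "The green team lost the cup in 1998.",
]
augmented_paragraphs = build_adversarial_set([paragraph] * 5, distractors, seed=0)
```

Because the position and the distractor both vary across examples, a model trained on such data cannot rely on a shortcut like "ignore the last sentence", which is the kind of superficial assumption the abstract says the training data should punish.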
Taxonomy
Topics: Topic Modeling · Adversarial Robustness in Machine Learning · Domain Adaptation and Few-Shot Learning
