On the Efficacy of Adversarial Data Collection for Question Answering: Results from a Large-Scale Randomized Study

Divyansh Kaushik; Douwe Kiela; Zachary C. Lipton; Wen-tau Yih

arXiv:2106.00872 · cs.CL · June 3, 2021

TL;DR

This large-scale randomized study investigates whether adversarial data collection yields more robust question answering models, finding that training on adversarial data usually improves performance on other adversarial datasets but hurts performance on out-of-domain evaluation sets.

Contribution

The paper provides the first large-scale controlled comparison of adversarial versus standard data collection for question answering, randomly assigning crowdworkers to each condition and revealing trade-offs in model robustness.

Findings

01

Models trained on adversarial data usually perform better on other adversarial datasets.

02

Training on adversarial data usually reduces performance on out-of-domain evaluation sets.

03

Qualitative analysis highlights key differences between adversarially collected and standard data.

Abstract

In adversarial data collection (ADC), a human workforce interacts with a model in real time, attempting to produce examples that elicit incorrect predictions. Researchers hope that models trained on these more challenging datasets will rely less on superficial patterns, and thus be less brittle. However, despite ADC's intuitive appeal, it remains unclear when training on adversarial datasets produces more robust models. In this paper, we conduct a large-scale controlled study focused on question answering, assigning workers at random to compose questions either (i) adversarially (with a model in the loop); or (ii) in the standard fashion (without a model). Across a variety of models and datasets, we find that models trained on adversarial data usually perform better on other adversarial datasets but worse on a diverse collection of out-of-domain evaluation sets. Finally, we provide a…
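The two-arm design described in the abstract can be illustrated with a minimal sketch. This is not the authors' code (the official repository is facebookresearch/aqa-study), and every name here (`assign_conditions`, `collect_example`, the `QAModel` callable) is hypothetical. It assumes a balanced 50/50 random split of workers and treats a question as having "fooled" the model whenever the model's answer fails a SQuAD-style normalized exact match against the worker's gold answer. The paper's actual assignment procedure and success criterion may differ.

```python
import random
import re
import string
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Hypothetical QA model interface: (passage, question) -> predicted answer span.
QAModel = Callable[[str, str], str]


def assign_conditions(worker_ids: List[str], seed: int = 0) -> Dict[str, str]:
    """Randomly assign each worker to one condition, splitting the pool in half.

    Assumption: a balanced split; with an odd pool, the extra worker is 'standard'.
    """
    rng = random.Random(seed)  # seeded so the assignment is reproducible
    shuffled = list(worker_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {w: ("adversarial" if i < half else "standard")
            for i, w in enumerate(shuffled)}


def normalize_answer(text: str) -> str:
    """SQuAD-style normalization: lowercase, drop punctuation and articles."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def model_is_fooled(prediction: str, gold: str) -> bool:
    """Assumed criterion: any normalized exact-match failure counts as incorrect."""
    return normalize_answer(prediction) != normalize_answer(gold)


@dataclass
class Example:
    worker_id: str
    condition: str
    question: str
    gold_answer: str
    model_fooled: Optional[bool]  # None when no model was in the loop


def collect_example(worker_id: str, condition: str, passage: str,
                    question: str, gold_answer: str, model: QAModel) -> Example:
    """Record one question; only the adversarial condition queries the model."""
    fooled: Optional[bool] = None
    if condition == "adversarial":
        prediction = model(passage, question)  # real-time feedback to the worker
        fooled = model_is_fooled(prediction, gold_answer)
    return Example(worker_id, condition, question, gold_answer, fooled)
```

Because the condition is fixed per worker before any question is written, differences between the two resulting datasets can be attributed to the collection protocol rather than to who happened to write adversarially. The standard arm never calls the model, mirroring "without a model" in the abstract.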

Code & Models

Repositories

facebookresearch/aqa-study
Official