Augmenting NLP data to counter Annotation Artifacts for NLI Tasks
Armaan Singh Bhullar

TL;DR
This paper investigates annotation artifacts in NLP datasets, particularly those used for natural language inference (NLI), and proposes a data augmentation method to reduce the resulting bias and improve model robustness.
Contribution
It identifies biases caused by annotation artifacts in NLI datasets and introduces a data augmentation approach to mitigate these biases and enhance model performance.
Findings
Data augmentation reduces reliance on annotation artifacts.
Models trained with augmented data perform better on adversarial examples.
Bias mitigation improves generalization in NLI tasks.
Abstract
In this paper, we explore annotation artifacts: the phenomenon wherein large pre-trained NLP models achieve high performance on benchmark datasets without actually "solving" the underlying task, relying instead on dataset artifacts (shared across the training, validation, and test sets) to arrive at the correct answer. We study this phenomenon on the well-known Natural Language Inference (NLI) task. First, we use contrast and adversarial examples to probe the limitations of the model's performance and to expose one of the biases arising from annotation artifacts (that is, from the way the annotators constructed the training data). We then propose a data augmentation technique to fix this bias and measure its effectiveness.
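The abstract does not say which artifact-induced bias the paper examines. The following is an illustrative sketch only and not the author's method. One common way to surface such a bias is to look at hypothesis tokens alone and estimate how strongly each one predicts a label. If p(label | token) is close to 1, a model can guess the label without reading the premise. A widely reported example in NLI datasets such as SNLI and MultiNLI is negation words correlating with the contradiction label. The toy examples, the function names `label_given_token` and `flag_artifacts`, and the `min_count` and `threshold` values below are hypothetical choices made for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical toy NLI examples: (premise, hypothesis, label).
# Illustrative only; not taken from the paper or from any real dataset.
EXAMPLES = [
    ("A man is playing a guitar.", "A man is not playing music.", "contradiction"),
    ("A dog runs in the park.", "No animal is outside.", "contradiction"),
    ("Two kids are eating lunch.", "The kids are not hungry.", "contradiction"),
    ("A woman is reading a book.", "A person is reading.", "entailment"),
    ("A chef is cooking pasta.", "Someone is cooking.", "entailment"),
    ("A boy is swimming.", "The boy is swimming for a competition.", "neutral"),
]


def tokenize(text):
    """Lowercase whitespace tokenizer that strips surrounding punctuation."""
    return [tok.strip(".,!?").lower() for tok in text.split() if tok.strip(".,!?")]


def label_given_token(examples, min_count=2):
    """Estimate p(label | token) from hypothesis tokens only (premise ignored).

    Returns {token: (most_likely_label, probability, num_hypotheses)} for
    tokens that occur in at least `min_count` hypotheses. A heavily skewed
    distribution means the token alone hints at the label, which is the
    signature of an annotation artifact.
    """
    token_label = defaultdict(Counter)
    for _premise, hypothesis, label in examples:
        # Count each token at most once per hypothesis.
        for tok in set(tokenize(hypothesis)):
            token_label[tok][label] += 1
    scores = {}
    for tok, counts in token_label.items():
        total = sum(counts.values())
        if total < min_count:
            continue  # too rare to estimate reliably
        label, top = counts.most_common(1)[0]
        scores[tok] = (label, top / total, total)
    return scores


def flag_artifacts(scores, threshold=0.9):
    """Return tokens whose most likely label has probability >= threshold."""
    return sorted(tok for tok, (_label, p, _n) in scores.items() if p >= threshold)


if __name__ == "__main__":
    scores = label_given_token(EXAMPLES)
    for tok in flag_artifacts(scores):
        label, p, n = scores[tok]
        print(f"{tok!r}: p({label} | token) = {p:.2f} over {n} hypotheses")
```

On this toy data only "not" is flagged: it appears in two hypotheses, both labeled contradiction, while frequent tokens such as "is" and "the" are spread across labels. On a real dataset the same counts would be computed over the full training split, and an augmentation step of the kind the paper proposes would then aim to weaken such token-label correlations.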
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Adversarial Robustness in Machine Learning
Methods: Test
