Augmenting NLP data to counter Annotation Artifacts for NLI Tasks

Armaan Singh Bhullar

arXiv:2302.04700·cs.CL·February 10, 2023

TL;DR

This paper investigates annotation artifacts in NLP datasets, particularly in NLI tasks, and proposes a data augmentation method to reduce bias and improve model robustness.

Contribution

It identifies biases caused by annotation artifacts in NLI datasets and introduces a data augmentation approach to mitigate these biases and enhance model performance.

Findings

01. Data augmentation reduces reliance on annotation artifacts.

02. Models trained with augmented data perform better on adversarial examples.

03. Bias mitigation improves generalization in NLI tasks.

Abstract

In this paper, we explore annotation artifacts: the phenomenon wherein large pre-trained NLP models achieve high performance on benchmark datasets without actually "solving" the underlying task, relying instead on dataset artifacts (shared across the train, validation, and test sets) to arrive at the right answer. We study this phenomenon on the well-known Natural Language Inference (NLI) task. We first use contrast and adversarial examples to understand the limits of the model's performance and to expose one of the biases arising from annotation artifacts, that is, from the way the annotators constructed the training data. We then propose a data augmentation technique to fix this bias and measure its effectiveness.
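To make the notion of a dataset artifact concrete, the sketch below counts how often each hypothesis token co-occurs with each NLI label. It is only an illustration under stated assumptions: the toy examples, the choice of negation as the skewed cue, and the helper names `token_label_counts` and `most_skewed_tokens` are all hypothetical and are not taken from the paper, whose specific bias and augmentation technique are not described in this abstract.

```python
from collections import Counter, defaultdict

# Toy, made-up NLI examples (premise, hypothesis, label); NOT data from the paper.
TOY_DATA = [
    ("A man plays a guitar.", "A man is not playing music.", "contradiction"),
    ("A dog runs in a park.", "The dog is not moving.", "contradiction"),
    ("A woman reads a book.", "A person is reading.", "entailment"),
    ("Two kids play soccer.", "Children are playing a sport.", "entailment"),
    ("A chef cooks dinner.", "Nobody is cooking.", "contradiction"),
    ("A girl rides a bike.", "A girl is outside.", "entailment"),
]


def token_label_counts(examples):
    """Count, for each hypothesis token, how often it appears under each label."""
    counts = defaultdict(Counter)
    for _premise, hypothesis, label in examples:
        # A set makes each token count at most once per hypothesis.
        for token in set(hypothesis.lower().replace(".", "").split()):
            counts[token][label] += 1
    return counts


def most_skewed_tokens(examples, min_count=2):
    """Return (token, majority_label, share) rows, most label-skewed first."""
    rows = []
    for token, label_counts in token_label_counts(examples).items():
        total = sum(label_counts.values())
        if total < min_count:
            continue  # ignore tokens too rare to say anything about
        label, top = label_counts.most_common(1)[0]
        rows.append((token, label, top / total))
    # Sort by descending share, then alphabetically for a stable order.
    return sorted(rows, key=lambda row: (-row[2], row[0]))


if __name__ == "__main__":
    for token, label, share in most_skewed_tokens(TOY_DATA):
        print(f"{token!r}: {label} in {share:.0%} of hypotheses containing it")
```

On this toy data the token "not" appears only in contradiction hypotheses, so a model could predict that label without reading the premise at all. A token-level count like this is one simple way to spot such a shortcut; it says nothing about how the paper itself detects or corrects the bias.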

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Adversarial Robustness in Machine Learning

Methods: Test