Investigating Biases in Textual Entailment Datasets
Shawn Tan, Yikang Shen, Chin-wei Huang, Aaron Courville

TL;DR
This paper examines biases in textual entailment datasets like SNLI and MultiNLI, analyzing their impact on model performance and proposing methods to mitigate these biases for more reliable language understanding evaluation.
Contribution
It provides a detailed analysis of dataset biases in textual entailment and introduces a simple approach to reduce these biases, improving dataset quality.
Findings
A classifier that sees only the hypothesis, and never the premise, reaches 64% accuracy on SNLI.
Biases significantly influence model performance.
The authors propose a simple method for reducing these biases in the datasets.
Abstract
Understanding logical relationships between sentences is an important task in language understanding. To aid progress on this task, researchers have collected datasets for training and evaluating machine learning systems. However, as in the crowdsourced Visual Question Answering (VQA) task, some biases in the data inevitably occur. In our experiments, we find that performing classification on just the hypotheses of the SNLI dataset yields an accuracy of 64%. We analyze the extent of the bias in the SNLI and MultiNLI datasets, discuss its implications, and propose a simple method to reduce the biases in the datasets.
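The hypothesis-only experiment mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which classifier the authors used, so the naive Bayes model below is a stand-in assumption, and the toy examples and function names (`train_hypothesis_only`, `predict`, `accuracy`) are hypothetical. The point is only that the premise is never read: if such a model beats chance, the labels are partly predictable from the hypotheses alone.

```python
import math
from collections import Counter, defaultdict

# Toy (premise, hypothesis, label) triples invented for illustration only;
# they are NOT taken from SNLI or MultiNLI.
TRAIN = [
    ("A man plays guitar on stage.", "A man is performing music.", "entailment"),
    ("A dog runs through a field.", "An animal is outside.", "entailment"),
    ("A woman reads a book.", "A person is reading.", "entailment"),
    ("A man plays guitar on stage.", "Nobody is playing music.", "contradiction"),
    ("A dog runs through a field.", "The dog is sleeping.", "contradiction"),
    ("A woman reads a book.", "Nobody is reading.", "contradiction"),
    ("A man plays guitar on stage.", "A man is playing for his friends.", "neutral"),
    ("A dog runs through a field.", "The dog is chasing a ball for fun.", "neutral"),
    ("A woman reads a book.", "A woman is reading for her exam.", "neutral"),
]


def tokenize(text):
    """Lowercase, drop periods, and split on whitespace."""
    return text.lower().replace(".", "").split()


def train_hypothesis_only(examples):
    """Collect per-label word counts from hypotheses; premises are ignored."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for _premise, hypothesis, label in examples:
        label_counts[label] += 1
        for tok in tokenize(hypothesis):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return label_counts, word_counts, vocab


def predict(model, hypothesis):
    """Return the label with the highest naive Bayes log-score."""
    label_counts, word_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        score = math.log(label_counts[label] / total)
        # Add-one smoothing over the training vocabulary.
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokenize(hypothesis):
            if tok in vocab:  # skip words never seen in training
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, examples):
    """Fraction of examples whose label is predicted from the hypothesis alone."""
    correct = sum(predict(model, h) == y for _p, h, y in examples)
    return correct / len(examples)


model = train_hypothesis_only(TRAIN)
```

In this toy data, words such as "nobody" occur only in contradiction hypotheses, so `predict(model, "Nobody is sleeping.")` can pick a label without any premise. On a real dataset, `accuracy` would be computed on a held-out split and compared against the majority-class baseline.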
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Topic Modeling
