Multi-head attention debiasing and contrastive learning for mitigating Dataset Artifacts in Natural Language Inference
Karthik Sivakoti

TL;DR
This paper identifies dataset artifacts in NLI, analyzes their interactions, and proposes a multi-head debiasing method that significantly improves model robustness across various bias categories while maintaining overall performance.
Contribution
The paper introduces a novel multi-head debiasing architecture tailored to mitigate multiple dataset artifacts in NLI, backed by detailed artifact analysis and substantial empirical improvements.
Findings
Bias accuracy improved across all four artifact categories (e.g., length bias from 86.03% to 90.06%).
Error rate reduced from 14.19% to 10.42%.
Enhanced handling of neutral relationships in NLI.
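As a quick check on the error-rate finding above, the short sketch below works out the absolute and relative reduction. The variable names are illustrative, and the implied accuracy figures assume that the error rate is simply 100% minus overall accuracy, which the summary does not state explicitly.

```python
# Error rates as reported in the Findings section (percent).
baseline_error = 14.19
debiased_error = 10.42

# Absolute reduction in percentage points.
absolute_drop = baseline_error - debiased_error

# Relative reduction: the share of baseline errors that were eliminated.
relative_drop = absolute_drop / baseline_error

# Implied accuracies, ASSUMING error rate = 100% - accuracy.
baseline_accuracy = 100.0 - baseline_error
debiased_accuracy = 100.0 - debiased_error

print(f"absolute drop: {absolute_drop:.2f} points")
print(f"relative drop: {relative_drop:.1%}")
print(f"implied accuracy: {baseline_accuracy:.2f}% -> {debiased_accuracy:.2f}%")
```

Under that assumption, the change amounts to roughly a quarter of the baseline errors being removed.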
Abstract
While Natural Language Inference (NLI) models have achieved high performance on benchmark datasets, concerns remain about whether they truly capture the intended task or largely exploit dataset artifacts. Through detailed analysis of the Stanford Natural Language Inference (SNLI) dataset, we uncover complex patterns of several types of artifacts and their interactions, which led to the development of our novel structural debiasing approach. Our fine-grained analysis of 9,782 validation examples reveals four major categories of artifacts: length-based patterns, lexical overlap, subset relationships, and negation patterns. Our multi-head debiasing architecture achieves substantial improvements across all bias categories: length bias accuracy improved from 86.03% to 90.06%, overlap bias from 91.88% to 93.13%, subset bias from 95.43% to 96.49%, and negation bias from 88.69% to…
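The four artifact categories named in the abstract (length, lexical overlap, subset relationships, negation) can be illustrated with simple surface features of a premise/hypothesis pair. The sketch below is a minimal illustration under stated assumptions, not the paper's analysis code: the tokenizer, the `NEGATION_WORDS` list, and the function `artifact_features` are all hypothetical choices made for this example.

```python
import re

# Hypothetical negation cue list; the paper's actual list is not given here.
NEGATION_WORDS = {"no", "not", "never", "nobody", "nothing",
                  "none", "neither", "nor", "without"}


def tokenize(text):
    """Lowercase and split into word tokens (apostrophes kept for "isn't")."""
    return re.findall(r"[a-z0-9']+", text.lower())


def artifact_features(premise, hypothesis):
    """Compute surface features that a model might exploit as shortcuts."""
    p_tokens = tokenize(premise)
    h_tokens = tokenize(hypothesis)
    p_set, h_set = set(p_tokens), set(h_tokens)

    # Fraction of distinct hypothesis words that also occur in the premise.
    overlap = len(p_set & h_set) / len(h_set) if h_set else 0.0

    return {
        # Length pattern: premise length minus hypothesis length, in tokens.
        "length_diff": len(p_tokens) - len(h_tokens),
        # Lexical overlap between the two sentences.
        "lexical_overlap": overlap,
        # Subset relationship: every hypothesis word appears in the premise.
        "hypothesis_subset": bool(h_set) and h_set <= p_set,
        # Negation pattern: the hypothesis contains a negation cue.
        "has_negation": any(
            t in NEGATION_WORDS or t.endswith("n't") for t in h_tokens
        ),
    }


premise = "A man is playing a guitar on stage"
subset_example = artifact_features(premise, "A man is playing a guitar")
negation_example = artifact_features(premise, "The man isn't sleeping")
print(subset_example)
print(negation_example)
```

Features like these could be used to bucket validation examples by artifact type before measuring per-bucket accuracy; how the paper actually defines each bucket is not specified in this summary.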
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques
