Unpacking the Resilience of SNLI Contradiction Examples to Attacks
Chetan Verma, Archit Agarwal

TL;DR
This paper investigates the robustness of SNLI contradiction examples against adversarial attacks, revealing their relative resilience and demonstrating that adversarial training can improve model robustness and reduce reliance on dataset biases.
Contribution
The paper analyzes how resilient SNLI contradiction examples are to adversarial attacks and shows that adversarial training improves model robustness and reduces reliance on dataset biases.
Findings
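The contradiction class shows a smaller accuracy decline under attack than the entailment and neutral classes.
Fine-tuning on a dataset augmented with adversarial examples restores performance to near-baseline levels on both the standard and challenge sets.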
Adversarial triggers help identify and mitigate dataset biases.
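The per-class comparison behind the first finding can be sketched as below. This is a minimal illustration, not the authors' evaluation code: the function names and the toy labels in the usage note are hypothetical, and the sketch assumes predictions are available as plain label lists for the same examples before and after the attack.

```python
from collections import defaultdict


def per_class_accuracy(gold, pred):
    """Return accuracy for each gold label as a dict {label: accuracy}."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for g, p in zip(gold, pred):
        total[g] += 1
        if g == p:
            correct[g] += 1
    return {label: correct[label] / total[label] for label in total}


def accuracy_drop(gold, clean_pred, attacked_pred):
    """Per-class accuracy on clean inputs minus accuracy on attacked inputs."""
    clean = per_class_accuracy(gold, clean_pred)
    attacked = per_class_accuracy(gold, attacked_pred)
    return {label: clean[label] - attacked[label] for label in clean}
```

Calling `accuracy_drop(gold, clean_pred, attacked_pred)` with three equal-length label lists yields one number per class; a class that is more resilient to the attack shows a smaller value.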
Abstract
Pre-trained models excel on NLI benchmarks like SNLI and MultiNLI, but their true language understanding remains uncertain. Models trained only on hypotheses and labels achieve high accuracy, indicating reliance on dataset biases and spurious correlations. To explore this issue, we applied the Universal Adversarial Attack to examine the model's vulnerabilities. Our analysis revealed substantial drops in accuracy for the entailment and neutral classes, whereas the contradiction class exhibited a smaller decline. Fine-tuning the model on an augmented dataset with adversarial examples restored its performance to near-baseline levels for both the standard and challenge sets. Our findings highlight the value of adversarial triggers in identifying spurious correlations and improving robustness while providing insights into the resilience of the contradiction class to adversarial attacks.
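The augmentation step described in the abstract can be sketched as follows. This is a sketch under stated assumptions rather than the authors' pipeline: it assumes the attack produces an input-agnostic trigger string that is prepended to the hypothesis, that the gold label is kept unchanged, and that examples are `(premise, hypothesis, label)` tuples. The names `add_trigger`, `augment`, `triggers_by_label`, and `fraction` are hypothetical.

```python
import random


def add_trigger(example, trigger):
    """Prepend a trigger string to the hypothesis; the gold label is unchanged."""
    premise, hypothesis, label = example
    return (premise, f"{trigger} {hypothesis}", label)


def augment(train, triggers_by_label, fraction=0.5, seed=0):
    """Return the training set plus triggered copies of a random subset.

    triggers_by_label maps a gold label to the trigger found for that class
    (assumption: one trigger per class). Labels without a trigger are skipped.
    """
    rng = random.Random(seed)
    augmented = list(train)
    for example in train:
        trigger = triggers_by_label.get(example[2])
        if trigger is not None and rng.random() < fraction:
            augmented.append(add_trigger(example, trigger))
    return augmented
```

The model would then be fine-tuned on the list returned by `augment` and evaluated on both the original test set and a triggered copy of it. The `fraction` parameter is an illustrative knob; the source does not state what proportion of adversarial examples was used.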
Taxonomy
Topics: Information and Cyber Security · Software System Performance and Reliability · Network Security and Intrusion Detection
