Unpacking the Resilience of SNLI Contradiction Examples to Attacks

Chetan Verma; Archit Agarwal

arXiv:2412.11172 · cs.CL · December 17, 2024

TL;DR

This paper investigates the robustness of SNLI contradiction examples against adversarial attacks, revealing their relative resilience and demonstrating that adversarial training can improve model robustness and reduce reliance on dataset biases.

Contribution

The paper analyzes how resilient SNLI contradiction examples are to universal adversarial attacks, and shows that adversarial training improves model robustness and reduces reliance on dataset biases.

Findings

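01. The contradiction class shows a smaller accuracy decline under attack than the entailment and neutral classes.

02. Fine-tuning on a dataset augmented with adversarial examples restores performance to near-baseline levels on both the standard and challenge sets.

03. Adversarial triggers help identify spurious correlations and mitigate dataset biases.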
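The per-class comparison behind the first finding can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: the function names `per_class_accuracy` and `accuracy_drop` are hypothetical, and the predictions are made-up toy values, not results from the paper.

```python
from collections import defaultdict


def per_class_accuracy(labels, predictions):
    """Return {label: accuracy} over the examples carrying that gold label."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for gold, pred in zip(labels, predictions):
        total[gold] += 1
        correct[gold] += int(gold == pred)
    return {label: correct[label] / total[label] for label in total}


def accuracy_drop(labels, clean_preds, attacked_preds):
    """Return {label: clean accuracy minus attacked accuracy} for each class."""
    clean = per_class_accuracy(labels, clean_preds)
    attacked = per_class_accuracy(labels, attacked_preds)
    return {label: clean[label] - attacked[label] for label in clean}


# Toy data (assumption): six examples, two per NLI class.
labels = ["entailment", "entailment", "neutral", "neutral",
          "contradiction", "contradiction"]
clean_preds = list(labels)  # pretend the model is perfect on clean inputs
attacked_preds = ["neutral", "contradiction", "neutral", "entailment",
                  "contradiction", "contradiction"]

drops = accuracy_drop(labels, clean_preds, attacked_preds)
for label, drop in drops.items():
    print(f"{label}: accuracy drop {drop:.2f}")
```

Reporting the drop per gold label, rather than one overall number, is what makes a difference between classes visible at all.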

Abstract

Pre-trained models excel on NLI benchmarks like SNLI and MultiNLI, but their true language understanding remains uncertain. Models trained only on hypotheses and labels achieve high accuracy, indicating reliance on dataset biases and spurious correlations. To explore this issue, we applied the Universal Adversarial Attack to examine the model's vulnerabilities. Our analysis revealed substantial drops in accuracy for the entailment and neutral classes, whereas the contradiction class exhibited a smaller decline. Fine-tuning the model on an augmented dataset with adversarial examples restored its performance to near-baseline levels for both the standard and challenge sets. Our findings highlight the value of adversarial triggers in identifying spurious correlations and improving robustness while providing insights into the resilience of the contradiction class to adversarial attacks.
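The augmentation step described in the abstract might look roughly like the sketch below. It rests on several assumptions the abstract does not confirm: that the trigger is prepended to the hypothesis, that gold labels are kept unchanged, and that only a fraction of the training examples receive a triggered copy. The names `apply_trigger` and `augment_with_triggers`, the `fraction` parameter, and the placeholder trigger string are all hypothetical.

```python
import random


def apply_trigger(hypothesis, trigger):
    """Prepend the trigger text to a hypothesis (assumed attack position)."""
    return f"{trigger} {hypothesis}"


def augment_with_triggers(examples, trigger, fraction=0.5, seed=0):
    """Return the original examples plus triggered copies of a random subset.

    Each example is a dict with "premise", "hypothesis" and "label" keys.
    Gold labels are copied unchanged, so the model is trained to ignore
    the trigger rather than to follow it.
    """
    rng = random.Random(seed)
    k = int(len(examples) * fraction)
    chosen = rng.sample(examples, k)
    adversarial = [
        {**ex, "hypothesis": apply_trigger(ex["hypothesis"], trigger)}
        for ex in chosen
    ]
    return examples + adversarial


# Toy SNLI-style examples (illustrative, not taken from the dataset).
train = [
    {"premise": "A man plays a guitar.", "hypothesis": "A man makes music.",
     "label": "entailment"},
    {"premise": "A dog runs in a field.", "hypothesis": "The dog is asleep.",
     "label": "contradiction"},
]

augmented = augment_with_triggers(train, trigger="TRIGGER", fraction=0.5)
print(len(train), "->", len(augmented), "training examples")
```

A real trigger would come from the attack's search procedure; the placeholder string here only marks where it would go.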

Code & Models

Repositories

ckvermaai/snli-attack-analysis
PyTorch · Official

Taxonomy

Topics: Information and Cyber Security · Software System Performance and Reliability · Network Security and Intrusion Detection