Teaching a Language Model to Distinguish Between Similar Details using a Small Adversarial Training Set

Chris Achard

arXiv:2410.23118·cs.CL·October 31, 2024

TL;DR

This paper demonstrates that fine-tuning a language model on a small, manually crafted adversarial training set significantly improves its ability to distinguish between similar details in natural language inference tasks, especially on challenging adversarial examples.

Contribution

The study introduces a targeted adversarial training approach with a small dataset to enhance language model robustness against similar word and phrase confusions.

Findings

01. 13% accuracy increase on adversarial test set

02. Improved differentiation of similar words and phrases

03. Maintained high performance on original NLI tasks

Abstract

Language models can achieve high accuracy on natural language tasks such as natural language inference (NLI), but their performance suffers on manually created adversarial examples. We investigate how a language model trained on the Stanford Natural Language Inference (SNLI) corpus performs on a manually created adversarial test set. We then improve the model's performance by fine-tuning it on a small, manually created adversarial training set designed to help the model learn to differentiate between similar words and phrases in the data. We show an increase in accuracy on the adversarial test set (+13%) while still maintaining good performance on the original NLI task. We also show an increase in accuracy from 91.2% to 92.9% on the most similar contradictions in the SNLI test set (as judged by cosine similarity).
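
The abstract ranks SNLI contradictions by cosine similarity but does not say here how sentences are turned into vectors. The following is a minimal sketch of that ranking step, assuming a simple bag-of-words vector as a stand-in for whatever embedding the paper actually uses. The function names, the `premise`/`hypothesis`/`label` field names, and the example pairs are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def bow_vector(text):
    """Lowercased whitespace-token counts (assumed stand-in for a real sentence embedding)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors; 0.0 if either is empty."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar_contradictions(pairs, top_k):
    """Return the top_k contradiction pairs as (similarity, pair), most similar first."""
    scored = [
        (cosine_similarity(bow_vector(p["premise"]), bow_vector(p["hypothesis"])), p)
        for p in pairs
        if p["label"] == "contradiction"
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top_k]


# Illustrative pairs only; these are not taken from SNLI or from the paper.
examples = [
    {"premise": "a man plays a red guitar",
     "hypothesis": "a man plays a blue guitar",
     "label": "contradiction"},
    {"premise": "a dog runs outside",
     "hypothesis": "the cat sleeps indoors",
     "label": "contradiction"},
    {"premise": "a woman is reading",
     "hypothesis": "a person is reading",
     "label": "entailment"},
]

top = most_similar_contradictions(examples, top_k=1)
for score, pair in top:
    print(round(score, 3), "|", pair["premise"], "->", pair["hypothesis"])
```

Pairs that differ by a single word, like the red/blue guitar example, score highest under this measure, which is the kind of "similar detail" contradiction the abstract singles out. Accuracy on such a subset would then be computed by evaluating the model only on the selected pairs.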


Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Handwritten Text Recognition Techniques

Methods: Sparse Evolutionary Training