Making Language Models Robust Against Negation

MohammadHossein Rezaei; Eduardo Blanco

arXiv:2502.07717·cs.CL·February 12, 2025

TL;DR

This paper introduces a self-supervised further pre-training method, built on two new tasks, that makes language models more robust against negation and improves their results on nine negation-related benchmarks.

Contribution

It proposes the Next Sentence Polarity Prediction (NSPP) task and a variation of Next Sentence Prediction (NSP) for further pre-training, making BERT and RoBERTa more robust against negation than their off-the-shelf versions.

Findings

01 Improved performance over off-the-shelf BERT and RoBERTa on nine negation-related benchmarks

02 Between 1.8% and 9.1% improvement on CondaQA

03 Enhanced reasoning over negation in language models

Abstract

Negation has been a long-standing challenge for language models. Previous studies have shown that they struggle with negation in many natural language understanding tasks. In this work, we propose a self-supervised method to make language models more robust against negation. We introduce a novel task, Next Sentence Polarity Prediction (NSPP), and a variation of the Next Sentence Prediction (NSP) task. We show that BERT and RoBERTa further pre-trained on our tasks outperform the off-the-shelf versions on nine negation-related benchmarks. Most notably, our pre-training tasks yield between 1.8% and 9.1% improvement on CondaQA, a large question-answering corpus requiring reasoning over negation.
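The abstract does not spell out how NSPP training examples are built, so the following is only a minimal sketch under stated assumptions: that each example pairs a sentence with a binary label saying whether the sentence that follows it contains negation, and that negation can be detected with a small cue list. The names `NEGATION_CUES`, `has_negation`, and `build_nspp_examples` are hypothetical and do not come from the paper, whose actual cue detection and data pipeline may differ.

```python
import re

# Hypothetical cue list for illustration only; not taken from the paper.
NEGATION_CUES = {"not", "no", "never", "nobody", "nothing",
                 "none", "neither", "nor", "without"}


def has_negation(sentence: str) -> bool:
    """Return True if the sentence contains a cue word or an n't contraction."""
    # Keep contractions such as "didn't" together as a single token.
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", sentence.lower())
    return any(tok in NEGATION_CUES or tok.endswith("n't") for tok in tokens)


def build_nspp_examples(sentences):
    """Pair each sentence with a label for the polarity of the NEXT sentence.

    Label 1 means the following sentence contains negation, 0 means it does not.
    This labeling scheme is an assumption made for the sketch.
    """
    return [(current, int(has_negation(following)))
            for current, following in zip(sentences, sentences[1:])]


document = [
    "She bought a ticket.",
    "She didn't board the train.",
    "The train left on time.",
]
examples = build_nspp_examples(document)
```

Because the labels come from the text itself, no manual annotation is needed, which is what makes a setup like this self-supervised. A real pipeline would also need to handle typographic apostrophes and affixal negation such as "unhappy", which this sketch ignores.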

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling

Methods: Attention Is All You Need · Adam · Softmax · Linear Warmup With Linear Decay · Dropout · Weight Decay · WordPiece · Attention Dropout · Layer Normalization