Thunder-NUBench: A Benchmark for LLMs' Sentence-Level Negation Understanding

Yeonkyoung So; Gyuseong Lee; Sungmok Jung; Joonhak Lee; JiA Kang; Sangho Kim; Jaejin Lee

arXiv:2506.14397 · cs.CL · April 21, 2026

TL;DR

Thunder-NUBench is a benchmark built specifically to evaluate how well large language models understand sentence-level negation, a capability that existing benchmarks tend to test only incidentally within broader tasks.

Contribution

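The paper introduces manually curated sentence-negation pairs and a multiple-choice dataset that contrast standard negation with local negation, contradiction, and paraphrase, so that models are assessed on semantic comprehension of negation rather than on spotting surface-level cues.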

Findings

01. Benchmark reveals LLMs' struggles with diverse negation forms.

02. Contrasts standard negation with local negation, contradiction, and paraphrase.

03. Provides a detailed evaluation framework for negation understanding.

Abstract

Negation is a fundamental linguistic phenomenon that poses ongoing challenges for Large Language Models (LLMs), particularly in tasks requiring deep semantic understanding. Current benchmarks often treat negation as a minor detail within broader tasks, such as natural language inference. Consequently, there is a lack of benchmarks specifically designed to evaluate comprehension of negation. In this work, we introduce Thunder-NUBench, a novel benchmark explicitly created to assess sentence-level understanding of negation in LLMs. Thunder-NUBench goes beyond merely identifying surface-level cues by contrasting standard negation with structurally diverse alternatives, such as local negation, contradiction, and paraphrase. This benchmark includes manually curated sentence-negation pairs and a multiple-choice dataset, allowing for a comprehensive evaluation of models' understanding of…
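To make the multiple-choice setup concrete, here is a minimal Python sketch of what one item and its scoring might look like. It is an illustration under stated assumptions, not the benchmark's actual schema or evaluation code: the class name `NegationItem`, the field names, the label strings, and the example sentences are all hypothetical and were written for this sketch. Only the four option categories come from the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NegationItem:
    """Hypothetical multiple-choice item; field names are assumptions."""
    sentence: str
    options: dict  # label -> candidate sentence
    answer: str = "standard_negation"


def accuracy(items, predict):
    """Fraction of items for which predict(item) returns the correct label."""
    if not items:
        return 0.0
    correct = sum(1 for item in items if predict(item) == item.answer)
    return correct / len(items)


def cue_matches(item, cue=" not "):
    """Labels of all options that contain the surface cue."""
    return [label for label, text in item.options.items() if cue in text]


# Illustrative item invented for this sketch, not drawn from the dataset.
example = NegationItem(
    sentence="The shirt that she bought yesterday is red.",
    options={
        "standard_negation": "The shirt that she bought yesterday is not red.",
        "local_negation": "The shirt that she did not buy yesterday is red.",
        "contradiction": "The shirt that she bought yesterday is blue.",
        "paraphrase": "The shirt she purchased yesterday is red.",
    },
)

# The cue "not" appears in two options, so it cannot pick the answer alone.
print(cue_matches(example))
```

In this invented item, both the standard negation and the local negation contain "not", which is one way to see why the abstract stresses going beyond surface-level cues: a model has to determine which clause the negation applies to.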
