Strong hallucinations from negation and how to fix them

Nicholas Asher; Swarnadeep Bhar

arXiv:2402.10543 · cs.CL · August 21, 2024 · 1 citation

TL;DR

This paper shows that language models can produce logically incoherent responses, which the authors call strong hallucinations, particularly when handling negation. It proposes treating negation as an operation over latent representations, rather than as another element of them, to improve reasoning.

Contribution

The paper introduces an approach that treats negation as an operation over a language model's latent representations, constraining how they may evolve. This reduces strong hallucinations without requiring training on sparse negative data.

Findings

01 Improved performance on cloze prompting and natural language inference tasks with negation

02 Reduced logical incoherence in model responses

03 Effective without training on sparse negative data

Abstract

Despite great performance on many tasks, language models (LMs) still struggle with reasoning, sometimes providing responses that cannot possibly be true because they stem from logical incoherence. We call such responses strong hallucinations and prove that they follow from an LM's computation of its internal representations for logical operators and outputs from those representations. Focusing on negation, we provide a novel solution in which negation is treated not as another element of a latent representation, but as an operation over an LM's latent representations that constrains how they may evolve. We show that our approach improves model performance in cloze prompting and natural language inference tasks with negation without requiring training on sparse negative data.
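To make the idea of logical incoherence concrete, the toy sketch below checks whether a score for a statement and a score for its negation sum to one, and derives a distribution for a negated cloze prompt from the positive prompt's distribution instead of scoring the negated text independently. This is an illustration only and not the authors' method: the function names `is_coherent` and `negate`, the complement-and-renormalize rule, and the example probabilities are all assumptions introduced here.

```python
def is_coherent(p_statement: float, p_negation: float, tol: float = 1e-6) -> bool:
    """Return True when P(A) + P(not A) equals 1 within a tolerance.

    A large violation is the kind of logical incoherence the abstract
    describes: both a statement and its negation are treated as likely.
    """
    return abs(p_statement + p_negation - 1.0) <= tol


def negate(p_positive: dict[str, float]) -> dict[str, float]:
    """Derive a cloze distribution for the negated prompt from the positive one.

    Assumption (hypothetical rule, not taken from the paper): each candidate's
    weight under negation is proportional to 1 - P(candidate | positive prompt).
    """
    complement = {tok: 1.0 - p for tok, p in p_positive.items()}
    total = sum(complement.values())
    if total == 0:
        raise ValueError("cannot negate a distribution with a single certain outcome")
    return {tok: weight / total for tok, weight in complement.items()}


# Hypothetical scores for the cloze prompt "A bird can [MASK]."
positive = {"fly": 0.80, "swim": 0.15, "bark": 0.05}

# Distribution for "A bird cannot [MASK]." obtained by operating on `positive`.
negated = negate(positive)
```

With these made-up numbers, the candidate that is most likely under the positive prompt ("fly") becomes the least likely under the negated one, which is the behavior an independent scoring of the negated prompt does not guarantee.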
