Strong hallucinations from negation and how to fix them
Nicholas Asher, Swarnadeep Bhar

TL;DR
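Language models sometimes produce logically incoherent responses that cannot possibly be true, which the authors call strong hallucinations. Focusing on negation, the paper treats negation as an operation over an LM's latent representations rather than as another element of them, which improves performance on cloze prompting and natural language inference tasks with negation.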
Contribution
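The paper treats negation as an operation over a language model's latent representations that constrains how they may evolve, rather than as another element of those representations. The authors also prove that strong hallucinations follow from how an LM computes its internal representations for logical operators, and they report that the approach needs no training on sparse negative data.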
Findings
Improved performance on cloze prompting and natural language inference tasks with negation
Reduced logical incoherence in model responses
No training on sparse negative data required
Abstract
Despite great performance on many tasks, language models (LMs) still struggle with reasoning, sometimes providing responses that cannot possibly be true because they stem from logical incoherence. We call such responses strong hallucinations and prove that they follow from an LM's computation of its internal representations for logical operators and outputs from those representations. Focusing on negation, we provide a novel solution in which negation is treated not as another element of a latent representation, but as an operation over an LM's latent representations that constrains how they may evolve. We show that our approach improves model performance in cloze prompting and natural language inference tasks with negation without requiring training on sparse negative data.
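To make the idea of negation as an operation concrete, here is a minimal toy sketch for the cloze setting. It is not the authors' implementation: the helper `negate_distribution`, the complement-and-renormalise rule, and the example probabilities are all assumptions chosen for illustration. The sketch takes the distribution an LM might assign to completions of a positive prompt and derives scores for the negated prompt by operating on that distribution, instead of asking the model to encode "not" as one more token.

```python
def negate_distribution(positive_probs):
    """Hypothetical helper: derive scores for a negated cloze prompt.

    positive_probs maps each candidate completion to the probability the
    model assigns it for the POSITIVE prompt. Negation is applied as an
    operation on that distribution (complement, then renormalise), which is
    an illustrative assumption and not the rule used in the paper.
    """
    complement = {tok: 1.0 - p for tok, p in positive_probs.items()}
    total = sum(complement.values())
    if total == 0:
        raise ValueError("cannot negate a distribution with a single certain outcome")
    return {tok: c / total for tok, c in complement.items()}


# Made-up probabilities for a positive prompt such as "A robin is a [MASK]."
positive = {"bird": 0.90, "fish": 0.06, "tree": 0.04}

# Scores for the negated prompt "A robin is not a [MASK]."
negated = negate_distribution(positive)

for token, score in sorted(negated.items(), key=lambda kv: -kv[1]):
    print(f"{token}: {score:.2f}")
```

Under these assumptions, the completion the model prefers for the positive prompt can never be the preferred completion for its negation, which is the kind of logical coherence the abstract describes. No negated training examples are involved.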
Taxonomy
Topics: Hallucinations in medical conditions
