Don't Think of the White Bear: Ironic Negation in Transformer Models Under Cognitive Load
Logan Mann, Nayan Saxena, Sarah Tandon, Chenhao Sun, Savar Toteja, Kevin Zhu

TL;DR
This paper investigates how large language models (LLMs) exhibit ironic rebound when asked to suppress a concept. It shows that suppression often backfires, especially with semantic distractors, and identifies neural mechanisms underlying this phenomenon.
Contribution
It introduces systematic experiments and a dataset to analyze ironic rebound in LLMs, linking cognitive phenomena with mechanistic neural insights.
Findings
Rebound occurs immediately after negation and worsens with semantic distractors.
Repetitive distractor text supports suppression.
Polarity separation predicts rebound persistence.
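To make "rebound" concrete, here is a minimal sketch that assumes rebound strength is operationalized as the fraction of model continuations, per distractor condition, that mention the forbidden concept. The function names `mentions` and `rebound_rates`, the whole-word matching rule, and the toy continuations are all hypothetical illustrations; the paper's actual metric may be defined differently.

```python
import re
from collections import defaultdict


def mentions(text: str, concept: str) -> bool:
    """Return True if `concept` appears as a whole word (case-insensitive) in `text`.

    Hypothetical matching rule: plurals or compounds such as "bears" do not count.
    """
    pattern = rf"\b{re.escape(concept)}\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


def rebound_rates(records):
    """Fraction of continuations per distractor condition that mention the forbidden concept.

    `records` is an iterable of (condition, concept, continuation) triples.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for condition, concept, continuation in records:
        totals[condition] += 1
        hits[condition] += int(mentions(continuation, concept))
    return {condition: hits[condition] / totals[condition] for condition in totals}


# Hypothetical toy continuations for illustration only, not data from the paper.
records = [
    ("semantic", "bear", "The polar bear wandered across the ice."),
    ("semantic", "bear", "Snow fell over the quiet forest."),
    ("repetition", "bear", "Snow fell. Snow fell. Snow fell."),
    ("repetition", "bear", "Snow fell over the quiet forest."),
]
print(rebound_rates(records))
```

Counting whole-word mentions is the simplest possible proxy; a probability-based measure (for example, the model's likelihood of emitting the concept token) would be a natural alternative under different assumptions.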
Abstract
Negation instructions such as 'do not mention X' can paradoxically increase the accessibility of X in human thought, a phenomenon known as ironic rebound. Large language models (LLMs) face the same challenge: suppressing a concept requires internally activating it, which may prime rebound instead of avoidance. We investigated this tension with two experiments. (1) Load & content: after a negation instruction, we vary the distractor text (semantic, syntactic, repetition) and measure rebound strength. (2) Polarity separation: we test whether models distinguish neutral from negative framings of the same concept and whether this separation predicts rebound persistence. Results show that rebound consistently arises immediately after negation and intensifies with longer or semantic distractors, while repetition supports suppression. Stronger polarity separation correlates…
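One way to read "this separation predicts rebound persistence" is as a correlation across concepts between (a) how far apart a model's internal representations of the neutral and negative framings are and (b) how long rebound lasts. The sketch below assumes separation is the cosine distance between mean hidden-state vectors for the two framings and uses Pearson's r as the correlation; `polarity_separation`, `pearson_r`, the 2-D vectors, and the persistence values are hypothetical placeholders, not the paper's definitions or data.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of equal-length vectors (e.g. hidden states for one framing)."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine_distance(u, v):
    """1 - cosine similarity; 0 for parallel vectors, 1 for orthogonal ones."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def polarity_separation(neutral_states, negative_states):
    """Hypothetical separation score: distance between the two framings' mean states."""
    return cosine_distance(mean_vector(neutral_states), mean_vector(negative_states))


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Hypothetical per-concept hidden states (neutral framing, negative framing).
concepts = {
    "bear": ([[1.0, 0.0], [1.0, 0.0]], [[1.0, 0.0], [1.0, 0.0]]),
    "apple": ([[1.0, 0.0]], [[0.0, 1.0]]),
    "river": ([[1.0, 0.0]], [[1.0, 1.0]]),
}
# Hypothetical rebound persistence per concept (e.g. tokens until the concept stops recurring).
persistence = {"bear": 1.0, "apple": 4.0, "river": 2.0}

separations = [polarity_separation(*concepts[name]) for name in concepts]
print(pearson_r(separations, [persistence[name] for name in concepts]))
```

Cosine distance on mean states is only one choice; a linear probe's accuracy at separating the two framings would be another plausible separation score under different assumptions.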
Taxonomy
Topics: Neurobiology of Language and Bilingualism · Mind wandering and attention · Action Observation and Synchronization
