Benchmarking Gaslighting Negation Attacks Against Reasoning Models

Bin Zhu; Hailong Yin; Jingjing Chen; and Yu-Gang Jiang

arXiv:2506.09677·cs.CV·December 18, 2025

Benchmarking Gaslighting Negation Attacks Against Reasoning Models

Bin Zhu, Hailong Yin, Jingjing Chen, and Yu-Gang Jiang

PDF

Open Access

TL;DR

This paper systematically evaluates the vulnerability of state-of-the-art reasoning models to gaslighting negation attacks, revealing significant accuracy drops and introducing a new benchmark to measure and improve robustness against such manipulative prompts.

Contribution

It provides the first comprehensive assessment of reasoning models' susceptibility to gaslighting attacks and introduces GaslightingBench-R, a diagnostic benchmark for robustness evaluation.

Findings

01

Models experience 25-29% accuracy drops under attacks.

02

GaslightingBench-R causes over 53% accuracy decline.

03

Top reasoning models are vulnerable to adversarial negation prompts.

Abstract

Recent advances in reasoning-centric models promise improved robustness through mechanisms such as chain-of-thought prompting and test-time scaling. However, their ability to withstand gaslighting negation attacks-adversarial prompts that confidently deny correct answers-remains underexplored. In this paper, we conduct a systematic evaluation of three state-of-the-art reasoning models, i.e., OpenAI's o4-mini, Claude-3.7-Sonnet and Gemini-2.5-Flash, across three multimodal benchmarks: MMMU, MathVista, and CharXiv. Our evaluation reveals significant accuracy drops (25-29% on average) following gaslighting negation attacks, indicating that even top-tier reasoning models struggle to preserve correct answers under manipulative user feedback. Built upon the insights of the evaluation and to further probe this vulnerability, we introduce GaslightingBench-R, a new diagnostic benchmark…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsAdversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI) · Ethics and Social Impacts of AI