Benchmarking Gaslighting Negation Attacks Against Reasoning Models
Bin Zhu, Hailong Yin, Jingjing Chen, and Yu-Gang Jiang

TL;DR
This paper systematically evaluates the vulnerability of state-of-the-art reasoning models to gaslighting negation attacks, revealing significant accuracy drops and introducing a new benchmark to measure and improve robustness against such manipulative prompts.
Contribution
It provides the first comprehensive assessment of reasoning models' susceptibility to gaslighting attacks and introduces GaslightingBench-R, a diagnostic benchmark for robustness evaluation.
Findings
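Across MMMU, MathVista, and CharXiv, the evaluated models lose 25-29% accuracy on average after gaslighting negation attacks.
On GaslightingBench-R, accuracy declines by more than 53%.
Even top-tier reasoning models (o4-mini, Claude-3.7-Sonnet, Gemini-2.5-Flash) fail to preserve correct answers under adversarial negation prompts.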
Abstract
Recent advances in reasoning-centric models promise improved robustness through mechanisms such as chain-of-thought prompting and test-time scaling. However, their ability to withstand gaslighting negation attacks (adversarial prompts that confidently deny correct answers) remains underexplored. In this paper, we conduct a systematic evaluation of three state-of-the-art reasoning models, i.e., OpenAI's o4-mini, Claude-3.7-Sonnet, and Gemini-2.5-Flash, across three multimodal benchmarks: MMMU, MathVista, and CharXiv. Our evaluation reveals significant accuracy drops (25-29% on average) following gaslighting negation attacks, indicating that even top-tier reasoning models struggle to preserve correct answers under manipulative user feedback. Building on the insights of this evaluation, and to probe the vulnerability further, we introduce GaslightingBench-R, a new diagnostic benchmark…
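The accuracy drop described above can be illustrated with a minimal sketch. It assumes each question is scored twice, once on the model's initial answer and once on its answer after the negation turn, and it reports the drop in absolute percentage points; the page does not say whether the reported figures are absolute or relative. The `Trial` class, the function names, and the toy data are hypothetical and are not taken from the paper or its results.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    """One question, scored before and after the negation turn (hypothetical schema)."""
    correct_before: bool
    correct_after: bool


def accuracy(flags):
    """Return the share of True values as a percentage."""
    flags = list(flags)
    if not flags:
        raise ValueError("no trials to score")
    return 100.0 * sum(flags) / len(flags)


def accuracy_drop(trials):
    """Return (accuracy before, accuracy after, drop), all in percentage points."""
    before = accuracy(t.correct_before for t in trials)
    after = accuracy(t.correct_after for t in trials)
    return before, after, before - after


# Toy data, invented for illustration only: three of four answers start out
# correct, and two of those three flip to wrong after the negation prompt.
trials = [
    Trial(correct_before=True, correct_after=True),
    Trial(correct_before=True, correct_after=False),
    Trial(correct_before=True, correct_after=False),
    Trial(correct_before=False, correct_after=False),
]

before, after, drop = accuracy_drop(trials)
print(f"before={before:.1f}% after={after:.1f}% drop={drop:.1f} points")
```

Scoring the same questions twice keeps the comparison paired, so the drop reflects answers that changed under pressure and not a difference between question sets.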
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI) · Ethics and Social Impacts of AI
