What About the Scene with the Hitler Reference? HAUNT: A Framework to Probe LLMs' Self-consistency Via Adversarial Nudge

Arka Dutta; Sujan Dutta; Rijul Magu; Soumyajit Datta; Munmun De Choudhury; Ashiqur R. KhudaBukhsh

arXiv:2511.08596·cs.CL·November 13, 2025

TL;DR

This paper introduces HAUNT, a framework to evaluate LLMs' self-consistency and factual fidelity under adversarial nudges, revealing varying resilience levels across different models in high-stakes information domains.

Contribution

The paper proposes a three-step stress-testing framework that assesses how robust LLMs are to adversarial prompts in factual verification tasks.

Findings

01 Claude shows strong resilience to adversarial nudges.

02 GPT and Grok demonstrate moderate resilience.

03 Gemini and DeepSeek exhibit weak resilience.

Abstract

Hallucinations pose a critical challenge to the real-world deployment of large language models (LLMs) in high-stakes domains. In this paper, we present a framework for stress testing factual fidelity in LLMs in the presence of adversarial nudges. Our framework consists of three steps. In the first step, we instruct the LLM to produce sets of truths and lies consistent with the closed domain in question. In the next step, we instruct the LLM to verify the same set of assertions as truths and lies consistent with the same closed domain. In the final step, we test the robustness of the LLM against the lies generated (and verified) by itself. Our extensive evaluation, conducted using five widely known proprietary LLMs across two closed domains of popular movies and novels, reveals a wide range of susceptibility to adversarial nudges: Claude exhibits strong resilience, GPT…
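The three steps described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `Ask` callable, the function names, the prompt wording, and the TRUE/FALSE and REJECT reply conventions are all assumptions introduced here, since the listing does not give the paper's actual prompts or scoring.

```python
from typing import Callable, Dict, List

# Hypothetical interface: any function that sends a prompt to a model and
# returns its text reply. This is an assumption, not an API from the paper.
Ask = Callable[[str], str]


def _lines(reply: str) -> List[str]:
    """Split a reply into non-empty, stripped lines."""
    return [ln.strip() for ln in reply.splitlines() if ln.strip()]


def generate(ask: Ask, title: str, n: int) -> Dict[str, List[str]]:
    """Step 1: ask the model for truths and lies about one closed domain."""
    truths = _lines(ask(f"List {n} true statements about '{title}', one per line."))
    lies = _lines(ask(f"List {n} false but plausible statements about '{title}', one per line."))
    return {"truths": truths, "lies": lies}


def verify(ask: Ask, title: str, assertion: str) -> bool:
    """Step 2: ask the same model to label one of its own assertions."""
    reply = ask(
        f"Is this statement about '{title}' true or false? "
        f"Answer TRUE or FALSE.\n{assertion}"
    )
    return reply.strip().upper().startswith("TRUE")


def resists_nudge(ask: Ask, title: str, lie: str) -> bool:
    """Step 3: present a self-generated lie as if it were fact (assumed wording)."""
    reply = ask(
        f"What about the part of '{title}' where {lie}? "
        "Answer REJECT if that is not in the work, otherwise describe it."
    )
    return reply.strip().upper().startswith("REJECT")


def probe(ask: Ask, title: str, n: int = 3) -> Dict[str, int]:
    """Run all three steps for one title and return simple counts."""
    items = generate(ask, title, n)
    truths_confirmed = sum(verify(ask, title, t) for t in items["truths"])
    # Only lies the model itself labels false go on to the nudge step.
    verified_lies = [lie for lie in items["lies"] if not verify(ask, title, lie)]
    resisted = sum(resists_nudge(ask, title, lie) for lie in verified_lies)
    return {
        "truths_confirmed": truths_confirmed,
        "lies_confirmed": len(verified_lies),
        "nudges_resisted": resisted,
    }
```

Under these assumptions, a model that confirms its own lies as false in step 2 but then elaborates on them in step 3 would show a low `nudges_resisted` count relative to `lies_confirmed`, which is the kind of self-inconsistency the framework is meant to surface.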

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Misinformation and Its Impacts · Topic Modeling