REALISTA: Realistic Latent Adversarial Attacks that Elicit LLM Hallucinations

Buyun Liang; Jinqi Luo; Liangzu Peng; Kwan Ho Ryan Chan; Darshan Thaker; Kaleab A. Kinfu; Fengrui Tian; Hamed Hassani; René Vidal

arXiv:2605.12813·cs.CL·May 14, 2026

TL;DR

REALISTA is a latent-space attack framework that generates realistic, semantically coherent prompts to elicit hallucinations in large language models. The authors report that it outperforms existing realistic attack methods.

Contribution

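The paper combines discrete prompt-based attacks with continuous latent-space attacks through an input-dependent dictionary of semantically valid editing directions, with the aim of keeping adversarial prompts realistic while raising attack success on large language models.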
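As a rough illustration of the dictionary idea, the sketch below treats each dictionary entry as a direction in a latent space and searches over non-negative combination weights. Everything here is an assumption made for illustration: the names `combine`, `project_weights`, and `random_search`, the weight constraint (non-negative, summing to at most 1), the toy `score` function, and the random-search optimizer are hypothetical and are not taken from the paper or its repository.

```python
import random


def combine(z0, directions, weights):
    """Return z0 + sum_i weights[i] * directions[i], computed element-wise."""
    z = list(z0)
    for w, d in zip(weights, directions):
        for k, dk in enumerate(d):
            z[k] += w * dk
    return z


def project_weights(weights):
    """Clip weights to be non-negative; rescale only if their sum exceeds 1.

    This constraint is an assumption of the sketch, chosen so the edited
    latent stays inside the hull spanned by the dictionary directions.
    """
    clipped = [max(0.0, w) for w in weights]
    total = sum(clipped)
    return [w / total for w in clipped] if total > 1.0 else clipped


def random_search(z0, directions, score, steps=200, step_size=0.2, seed=0):
    """Toy optimizer: perturb the weights and keep a candidate if it scores higher.

    `score` is a hypothetical stand-in for an attack objective evaluated on
    the edited latent; a real attack would decode the latent to a prompt
    and query the target model.
    """
    rng = random.Random(seed)
    weights = [0.0] * len(directions)
    best = score(combine(z0, directions, weights))
    for _ in range(steps):
        candidate = project_weights(
            [w + rng.uniform(-step_size, step_size) for w in weights]
        )
        s = score(combine(z0, directions, candidate))
        if s > best:
            weights, best = candidate, s
    return weights, best
```

Calling `random_search([0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], score=lambda z: z[0])` searches for weights that push the first latent coordinate up while staying within the constraint. Restricting the search to combinations of dictionary directions is what distinguishes this setup from an unconstrained latent-space attack, which can move in any direction.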

Findings

01. REALISTA outperforms existing realistic attack methods.

02. It successfully attacks large reasoning models in free-form response settings.

03. The framework achieves comparable or superior results on open-source LLMs.

Abstract

Large language models (LLMs) achieve strong performance across many tasks but remain vulnerable to hallucinations, motivating the need for realistic adversarial prompts that elicit such failures. We formulate hallucination elicitation as a constrained optimization problem, where the goal is to find semantically coherent adversarial prompts that are equivalent to benign user prompts. Existing methods remain limited: discrete prompt-based attacks preserve semantic equivalence and coherence but search only over a limited set of prompt variations, while continuous latent-space attacks explore a richer space but often decode into prompts that are no longer valid rephrasings. To address these limitations, we propose REALISTA, a realistic latent-space attack framework. REALISTA constructs an input-dependent dictionary of valid editing directions, each corresponding to a semantically equivalent…
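The constrained optimization problem described in the abstract can be written schematically as follows. The notation is assumed for illustration and is not taken from the paper: $x$ is the benign user prompt, $x'$ the adversarial prompt, $f$ the target LLM, $\mathcal{H}$ a score measuring how strongly the response hallucinates, $\mathcal{E}(x)$ the set of prompts semantically equivalent to $x$, and $\mathcal{C}$ the set of semantically coherent prompts.

```latex
\begin{aligned}
\max_{x'} \quad & \mathcal{H}\bigl(f(x')\bigr) \\
\text{s.t.} \quad & x' \in \mathcal{E}(x), \\
                  & x' \in \mathcal{C}
\end{aligned}
```

Under this reading, discrete prompt-based attacks satisfy both constraints but search only a small subset of $\mathcal{E}(x) \cap \mathcal{C}$, while continuous latent-space attacks search a larger space whose decoded prompts may fall outside it.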

Code & Models

Repositories

Buyun-Liang/REALISTA (GitHub)