Triggering Hallucinations in LLMs: A Quantitative Study of Prompt-Induced Hallucination in Large Language Models
Makoto Sato

TL;DR
This paper introduces a systematic prompt-based framework to trigger and measure hallucinations in large language models, revealing their vulnerability and variability across models, which is crucial for developing safer AI systems.
Contribution
The study presents a novel, reproducible method to induce and quantify hallucinations in LLMs, enabling better understanding and mitigation of their factual inaccuracies.
Findings
Hallucination-Inducing Prompts (HIPs) cause more hallucinations than control prompts
Hallucination effects vary across different LLMs
Reasoning-oriented models show different hallucination profiles
Abstract
Hallucinations in large language models (LLMs) present a growing challenge across real-world applications, from healthcare to law, where factual reliability is essential. Despite advances in alignment and instruction tuning, LLMs can still generate outputs that are fluent yet fundamentally untrue. Understanding the cognitive dynamics that underlie these hallucinations remains an open problem. In this study, we propose a prompt-based framework to systematically trigger and quantify hallucination: a Hallucination-Inducing Prompt (HIP), which synthetically fuses semantically distant concepts (e.g., periodic table of elements and tarot divination) in a misleading way, and a Hallucination Quantifying Prompt (HQP), which scores the plausibility, confidence, and coherence of the output. Controlled experiments across multiple LLMs revealed that HIPs consistently produced less coherent and more…
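The two-prompt setup described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the prompt wording, the 0-10 scale, and the names `build_prompt_pair`, `build_hqp`, and `mean_score_gap` are all hypothetical. The sketch only shows the general shape of the idea: a HIP that presupposes a link between two distant concepts, a control prompt that does not, and an HQP that asks a model to rate a response.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class PromptPair:
    hip: str      # hallucination-inducing prompt (presupposes a false link)
    control: str  # control prompt (does not presuppose a link)


def build_prompt_pair(concept_a: str, concept_b: str) -> PromptPair:
    """Build a hypothetical HIP/control pair for two distant concepts.

    The wording is illustrative only; the paper's templates are not shown
    in the abstract.
    """
    hip = (
        f"Explain the shared underlying principle that connects "
        f"{concept_a} and {concept_b}, and describe it in detail."
    )
    control = (
        f"Describe {concept_a} and {concept_b} separately, and state "
        f"whether any established connection between them exists."
    )
    return PromptPair(hip=hip, control=control)


def build_hqp(response: str) -> str:
    """Build a hypothetical quantifying prompt for one model response.

    The three criteria come from the abstract; the 0-10 scale is assumed.
    """
    return (
        "Rate the following text on plausibility, confidence, and "
        "coherence, each on an assumed 0-10 scale.\n\n"
        f"Text:\n{response}"
    )


def mean_score_gap(hip_scores: list[float], control_scores: list[float]) -> float:
    """Return mean(HIP scores) minus mean(control scores)."""
    return mean(hip_scores) - mean(control_scores)


if __name__ == "__main__":
    pair = build_prompt_pair("the periodic table of elements", "tarot divination")
    print(pair.hip)
    print(pair.control)
    print(build_hqp("<model response goes here>"))
```

Sending the prompts to a model and parsing its scores is left out on purpose, since the abstract does not specify which models, APIs, or scoring formats were used.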
