PRISM: Probing Reasoning, Instruction, and Source Memory in LLM Hallucinations
Yuhe Wu, Guangyu Wang, Yuran Chen, Jiatong Zhang, Yutong Zhang, Yujie Chen, Jiaming Shang, Guang Zhang, Zhuang Liu

TL;DR
PRISM is a diagnostic benchmark that separates LLM hallucinations into four categories tied to three generation stages (memory, instruction, and reasoning), so that evaluation shows where hallucinations originate rather than only how severe they are.
Contribution
The paper introduces PRISM, a stage-aware benchmark of 9,448 instances across 65 tasks for fine-grained hallucination diagnosis in LLMs.
Findings
The evaluation uncovers trade-offs among instruction following, memory retrieval, and reasoning.
Mitigation strategies often improve some hallucination dimensions but worsen others.
PRISM helps identify the specific mechanisms behind LLM hallucinations.
Abstract
As large language models (LLMs) evolve from conversational assistants into agents capable of handling complex tasks, they are increasingly deployed in high-risk domains. However, existing benchmarks largely rely on mixed queries and posterior, output-level scoring, which quantifies hallucination severity but offers limited insight into where and why hallucinations arise in the generation pipeline. We therefore reformulate hallucination evaluation as a diagnostic problem and propose PRISM, a controlled benchmark that disentangles hallucinations into four dimensions (knowledge missing, knowledge errors, reasoning errors, and instruction-following errors) grounded in three stages of generation (memory, instruction, and reasoning). PRISM contains 9,448 instances across 65 tasks and supports fine-grained, stage-aware diagnostic evaluation. Evaluating 24 mainstream open-source and…
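To make the idea of stage-aware scoring concrete, the following is a minimal sketch of how per-dimension and per-stage hallucination rates could be aggregated. It is not PRISM's actual evaluation code. The names `Dimension`, `STAGE_OF`, `dimension_rates`, and `stage_rates` are hypothetical, the mapping from dimensions to stages is an assumption inferred from the abstract's wording, and the sample records are invented toy data.

```python
from collections import defaultdict
from enum import Enum


class Dimension(Enum):
    """The four hallucination dimensions named in the abstract."""
    KNOWLEDGE_MISSING = "knowledge missing"
    KNOWLEDGE_ERROR = "knowledge errors"
    REASONING_ERROR = "reasoning errors"
    INSTRUCTION_ERROR = "instruction-following errors"


# Assumed mapping from dimension to generation stage; the abstract names
# the three stages but does not spell out this assignment.
STAGE_OF = {
    Dimension.KNOWLEDGE_MISSING: "memory",
    Dimension.KNOWLEDGE_ERROR: "memory",
    Dimension.INSTRUCTION_ERROR: "instruction",
    Dimension.REASONING_ERROR: "reasoning",
}


def _rates(pairs):
    """Return the fraction of hallucinated outcomes per key."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for key, hallucinated in pairs:
        totals[key] += 1
        errors[key] += int(hallucinated)
    return {key: errors[key] / totals[key] for key in totals}


def dimension_rates(results):
    """results: iterable of (Dimension, hallucinated: bool) pairs."""
    return _rates(results)


def stage_rates(results):
    """Same input, but grouped by the assumed generation stage."""
    return _rates((STAGE_OF[dim], bad) for dim, bad in results)


# Invented toy records, for illustration only.
toy_results = [
    (Dimension.KNOWLEDGE_MISSING, True),
    (Dimension.KNOWLEDGE_ERROR, False),
    (Dimension.REASONING_ERROR, True),
    (Dimension.REASONING_ERROR, False),
    (Dimension.INSTRUCTION_ERROR, False),
]

by_dimension = dimension_rates(toy_results)
by_stage = stage_rates(toy_results)
```

Reporting both views side by side is one way to expose the kind of trade-off described in the findings: a mitigation that lowers the rate for one stage can be checked against the rates for the others rather than against a single aggregate score.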
