HalluMat: Detecting Hallucinations in LLM-Generated Materials Science Content Through Multi-Stage Verification
Bhanu Prakash Vangala, Sajid Mahmud, Pawan Neupane, Joel Selvaraj, Jianlin Cheng

TL;DR
HalluMat introduces a benchmark and a multi-stage detection framework to identify and reduce hallucinations in LLM-generated materials science content, improving factual accuracy and reliability.
Contribution
The paper presents HalluMatData, a new benchmark dataset, and HalluMatDetector, a novel multi-stage hallucination detection framework for scientific content.
Findings
Hallucination levels vary significantly across materials science subdomains.
HalluMatDetector reduces hallucination rates by 30%.
The Paraphrased Hallucination Consistency Score (PHCS) provides a new metric for assessing model reliability.
Abstract
Artificial Intelligence (AI), particularly Large Language Models (LLMs), is transforming scientific discovery, enabling rapid knowledge generation and hypothesis formulation. However, a critical challenge is hallucination, where LLMs generate factually incorrect or misleading information, compromising research integrity. To address this, we introduce HalluMatData, a benchmark dataset for evaluating hallucination detection methods, factual consistency, and response robustness in AI-generated materials science content. Alongside this, we propose HalluMatDetector, a multi-stage hallucination detection framework that integrates intrinsic verification, multi-source retrieval, contradiction graph analysis, and metric-based assessment to detect and mitigate LLM hallucinations. Our findings reveal that hallucination levels vary significantly across materials science subdomains, with…
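The abstract names four stages but this page does not describe how they are implemented. The following is a minimal sketch, not the authors' code, of how a multi-stage detector could be composed: each stage is a function that inspects a response and reports whether it suspects a hallucination. All names here (StageResult, run_pipeline, hallucination_score, KNOWN_FACTS) are hypothetical, and only two stages are illustrated, using toy rules that stand in for real retrieval and contradiction analysis.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class StageResult:
    stage: str      # name of the stage that produced this result
    flagged: bool   # True if the stage suspects a hallucination
    note: str = ""  # short human-readable reason


# A stage takes the LLM response text and returns a StageResult.
Stage = Callable[[str], StageResult]

# Toy in-memory "knowledge source" (assumption; a real system would
# query external databases or literature).
KNOWN_FACTS = {"silicon has a band gap of about 1.1 ev"}


def split_claims(response: str) -> List[str]:
    """Split a response into lowercased sentence-level claims."""
    parts = response.lower().split(". ")
    return [p.strip().rstrip(".") for p in parts if p.strip()]


def retrieval_check(response: str) -> StageResult:
    """Toy retrieval stage: flag any claim missing from KNOWN_FACTS."""
    unsupported = [c for c in split_claims(response) if c not in KNOWN_FACTS]
    return StageResult("retrieval", bool(unsupported),
                       f"{len(unsupported)} unsupported claim(s)")


def contradiction_check(response: str) -> StageResult:
    """Toy contradiction stage: flag 'X is not Y' when 'X is Y' also appears."""
    claims = set(split_claims(response))
    clash = any(" is not " in c and c.replace(" is not ", " is ") in claims
                for c in claims)
    return StageResult("contradiction", clash,
                       "self-contradiction" if clash else "")


def run_pipeline(response: str, stages: List[Stage]) -> List[StageResult]:
    """Run every stage in order and collect the results."""
    return [stage(response) for stage in stages]


def hallucination_score(results: List[StageResult]) -> float:
    """Fraction of stages that flagged the response (toy aggregate)."""
    if not results:
        return 0.0
    return sum(r.flagged for r in results) / len(results)


stages = [retrieval_check, contradiction_check]
ok = run_pipeline("Silicon has a band gap of about 1.1 eV.", stages)
bad = run_pipeline("GaN is stable. GaN is not stable.", stages)
```

Keeping each stage behind the same function signature means further stages, such as an intrinsic verification step or a metric-based assessment, could be appended to the list without changing the pipeline itself.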
Taxonomy
Topics: Machine Learning in Materials Science · Advanced Graph Neural Networks · Artificial Intelligence in Healthcare and Education
