MeasHalu: Mitigation of Scientific Measurement Hallucinations for Large Language Models with Enhanced Reasoning
Ruijun Huang, Zhiqiao Kang, Yuxuan Zhu, Junxiong Li, Jiahao Zhao, Minghuan Tan, Feng Jiang, Min Yang

TL;DR
MeasHalu is a framework that reduces scientific measurement hallucinations in LLMs through enhanced reasoning, targeted fine-tuning, and a progressive reward curriculum, improving the accuracy of measurement extraction from scientific literature.
Contribution
It introduces a fine-grained taxonomy of measurement hallucinations, a two-stage reasoning-aware fine-tuning strategy, and a progressive reward curriculum to mitigate errors in scientific measurement extraction.
Findings
Significantly reduces hallucination rates in scientific measurement extraction.
Improves accuracy on the MeasEval benchmark.
Provides a taxonomy and targeted mitigation strategies for measurement hallucinations.
Abstract
The accurate extraction of scientific measurements from literature is a critical yet challenging task in AI4Science, enabling large-scale analysis and integration of quantitative research findings. However, Large Language Models (LLMs) frequently exhibit severe hallucinations, which significantly undermine the reliability of automated scientific document understanding systems. To address this problem, we propose MeasHalu, a novel framework for mitigating scientific measurement hallucinations through enhanced reasoning and targeted optimization. We first present a fine-grained taxonomy of measurement-specific hallucinations, categorizing errors across quantities, units, modifiers, and relations. Our approach incorporates a two-stage reasoning-aware fine-tuning strategy using augmented scientific data and process-based supervision. Furthermore, we introduce a progressive reward curriculum…
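The taxonomy outlined above sorts measurement errors into four dimensions: quantities, units, modifiers, and relations. As a minimal sketch of how such labels might be assigned automatically, the Python below compares one extracted measurement against a gold annotation and reports every dimension on which they disagree. The `Measurement` record, its fields, the `classify_errors` helper, and the matching rules (numeric closeness for quantities, exact string match for units, set equality for modifiers, and the linked entity as a stand-in for relations) are hypothetical simplifications, not the paper's annotation scheme or MeasEval's official schema.

```python
import math
from dataclasses import dataclass
from enum import Enum


class HaluType(Enum):
    """Hypothetical labels for the four error dimensions named in the abstract."""
    QUANTITY = "quantity"
    UNIT = "unit"
    MODIFIER = "modifier"
    RELATION = "relation"


@dataclass(frozen=True)
class Measurement:
    """Hypothetical, simplified record for one extracted measurement."""
    value: float                          # numeric quantity, e.g. 12.5
    unit: str                             # unit string, e.g. "mg"
    modifiers: frozenset = frozenset()    # e.g. frozenset({"approximate"})
    entity: str = ""                      # entity the quantity is attached to


def classify_errors(pred: Measurement, gold: Measurement,
                    rel_tol: float = 1e-6) -> set:
    """Return every hallucination category on which `pred` disagrees with `gold`."""
    errors = set()
    # Quantity: numeric values must agree within a relative tolerance.
    if not math.isclose(pred.value, gold.value, rel_tol=rel_tol):
        errors.add(HaluType.QUANTITY)
    # Unit: compared case-sensitively, since "mS" and "MS" denote different units.
    if pred.unit.strip() != gold.unit.strip():
        errors.add(HaluType.UNIT)
    # Modifier: qualifiers such as "approximate" or "range" must match exactly.
    if pred.modifiers != gold.modifiers:
        errors.add(HaluType.MODIFIER)
    # Relation: approximated here by the entity the quantity is linked to.
    if pred.entity.strip().lower() != gold.entity.strip().lower():
        errors.add(HaluType.RELATION)
    return errors
```

For example, a prediction that keeps the value 12.5 but reports "g" instead of "mg" and drops an "approximate" qualifier would be labeled with both UNIT and MODIFIER. Returning a set of labels rather than a single one reflects that one extraction can go wrong along several dimensions at once.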
