How to Detect and Defeat Molecular Mirage: A Metric-Driven Benchmark for Hallucination in LLM-based Molecular Comprehension

Hao Li; Liuzhenghao Lv; He Cao; Zijing Liu; Zhiyuan Yan; Yu Wang; Yonghong Tian; Yu Li; Li Yuan

arXiv:2504.12314·cs.CL·April 18, 2025·3 cites

Open Access

TL;DR

This paper introduces Mol-Hallu, a new metric for evaluating hallucination in LLMs for molecular understanding, and proposes HRPP to reduce hallucinations, enhancing model reliability in scientific tasks.

Contribution

It presents Mol-Hallu, a novel metric for assessing hallucination in molecular LLMs, and proposes HRPP, a post-processing method to mitigate hallucinations in these models.

Findings

01. Mol-Hallu effectively quantifies hallucination levels in molecular LLMs.

02. HRPP significantly reduces hallucination in both decoder-only and encoder-decoder models.

03. Analysis reveals the knowledge shortcut phenomenon as a key source of hallucination.

Abstract

Large language models are increasingly used in scientific domains, especially for molecular understanding and analysis. However, existing models are affected by hallucination issues, resulting in errors in drug design and utilization. In this paper, we first analyze the sources of hallucination in LLMs for molecular comprehension tasks, specifically the knowledge shortcut phenomenon observed in the PubChem dataset. To evaluate hallucination in molecular comprehension tasks with computational efficiency, we introduce Mol-Hallu, a novel free-form evaluation metric that quantifies the degree of hallucination based on the scientific entailment relationship between generated text and actual molecular properties. Utilizing the Mol-Hallu metric, we reassess and analyze the extent of hallucination in various LLMs performing molecular comprehension tasks. Furthermore, the Hallucination…

Taxonomy

Topics: Computational Drug Discovery Methods · Advanced Graph Neural Networks · Misinformation and Its Impacts