How to Detect and Defeat Molecular Mirage: A Metric-Driven Benchmark for Hallucination in LLM-based Molecular Comprehension
Hao Li, Liuzhenghao Lv, He Cao, Zijing Liu, Zhiyuan Yan, Yu Wang, Yonghong Tian, Yu Li, Li Yuan

TL;DR
This paper introduces Mol-Hallu, a new metric for evaluating hallucination in LLMs for molecular understanding, and proposes HRPP to reduce hallucinations, enhancing model reliability in scientific tasks.
Contribution
It presents Mol-Hallu, a novel metric for assessing hallucination in molecular LLMs, and proposes HRPP, a post-processing method to mitigate hallucinations in these models.
Findings
Mol-Hallu effectively quantifies hallucination levels in molecular LLMs.
HRPP significantly reduces hallucination in both decoder-only and encoder-decoder models.
Analysis reveals the knowledge shortcut phenomenon as a key source of hallucination.
Abstract
Large language models are increasingly used in scientific domains, especially for molecular understanding and analysis. However, existing models suffer from hallucination, which leads to errors in drug design and utilization. In this paper, we first analyze the sources of hallucination in LLMs for molecular comprehension tasks, specifically the knowledge shortcut phenomenon observed in the PubChem dataset. To evaluate hallucination in molecular comprehension tasks with computational efficiency, we introduce Mol-Hallu, a novel free-form evaluation metric that quantifies the degree of hallucination based on the scientific entailment relationship between generated text and actual molecular properties. Using the Mol-Hallu metric, we reassess and analyze the extent of hallucination in various LLMs performing molecular comprehension tasks. Furthermore, the Hallucination…
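The abstract states only that Mol-Hallu scores hallucination from the entailment relationship between generated text and actual molecular properties; it does not give the formula. As a rough illustration of that general idea, and explicitly not the paper's Mol-Hallu definition, the sketch below treats each property term in a generated description as a claim and counts the share of claims that the reference description does not support. The vocabulary PROPERTY_TERMS and the helpers extract_properties and hallucination_score are hypothetical, and plain set membership stands in for real scientific entailment.

```python
import re

# Hypothetical vocabulary of molecular-property terms (illustration only;
# the paper's actual entity extraction and entailment check are not shown here).
PROPERTY_TERMS = {
    "antibacterial", "antineoplastic", "toxic", "hydrophobic",
    "acidic", "basic", "aromatic", "carcinogen", "metabolite",
}


def extract_properties(text: str) -> set[str]:
    """Return the hypothetical property terms mentioned in `text`."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    return tokens & PROPERTY_TERMS


def hallucination_score(generated: str, reference: str) -> float:
    """Fraction of property claims in `generated` not supported by `reference`.

    Set membership is a crude stand-in for scientific entailment.
    Returns 0.0 when the generated text makes no property claims.
    """
    claimed = extract_properties(generated)
    if not claimed:
        return 0.0
    supported = extract_properties(reference)
    unsupported = claimed - supported
    return len(unsupported) / len(claimed)


if __name__ == "__main__":
    generated = "The molecule is an aromatic antibacterial agent and a known carcinogen."
    reference = "An aromatic compound with antibacterial activity."
    # "carcinogen" is claimed but unsupported, so 1 of 3 claims is hallucinated.
    print(hallucination_score(generated, reference))
```

A score of 0 means every property claim is backed by the reference, and 1 means none is. A real entailment-based metric would replace the keyword match with a model or knowledge source that can judge paraphrases and implied properties.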
Taxonomy
Topics: Computational Drug Discovery Methods · Advanced Graph Neural Networks · Misinformation and Its Impacts
