Can Hallucinations Help? Boosting LLMs for Drug Discovery
Shuzhou Yuan, Zhan Qu, Ashish Yashwanth Kangen, Michael Färber

TL;DR
This paper explores how hallucinations in large language models can be harnessed to improve molecule property prediction in drug discovery, challenging the view that hallucinations are solely problematic.
Contribution
It demonstrates that hallucinations can enhance predictive accuracy in drug discovery tasks and identifies the types of hallucinations that are most beneficial.
Findings
Hallucinated descriptions significantly improve predictive accuracy for some models.
Larger models benefit more from hallucinations.
Structural misdescriptions are the most impactful hallucination type.
Abstract
Hallucinations in large language models (LLMs), plausible but factually inaccurate text, are often viewed as undesirable. However, recent work suggests that such outputs may hold creative potential. In this paper, we investigate whether hallucinations can improve LLMs on molecule property prediction, a key task in early-stage drug discovery. We prompt LLMs to generate natural language descriptions from molecular SMILES strings and incorporate these often-hallucinated descriptions into downstream classification tasks. Evaluating seven instruction-tuned LLMs across five datasets, we find that hallucinations significantly improve predictive accuracy for some models. Notably, Falcon3-Mamba-7B outperforms all baselines when hallucinated text is included, while hallucinations generated by GPT-4o consistently yield the greatest gains across models. We further identify and categorize over…
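The two-step setup described in the abstract (generate a description from a SMILES string, then include it in the classification input) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the prompt wording, the function names, and the `generate` callable (standing in for any LLM text-generation call) are all hypothetical assumptions.

```python
from typing import Callable, Optional


def description_prompt(smiles: str) -> str:
    # Hypothetical wording; the paper's actual prompt is not shown here.
    return (
        "Describe the molecule with the following SMILES string "
        "in natural language.\n"
        f"SMILES: {smiles}\n"
        "Description:"
    )


def classification_prompt(
    smiles: str, task: str, description: Optional[str] = None
) -> str:
    # The generated (possibly hallucinated) description is optional,
    # so the same function also builds the SMILES-only baseline prompt.
    parts = [f"SMILES: {smiles}"]
    if description:
        parts.append(f"Description: {description}")
    parts.append(f"Question: {task} Answer Yes or No.")
    return "\n".join(parts)


def predict(
    smiles: str,
    task: str,
    generate: Callable[[str], str],
    use_description: bool = True,
) -> bool:
    # Step 1: ask the model for a natural language description.
    description = generate(description_prompt(smiles)) if use_description else None
    # Step 2: classify, with or without the description in the input.
    answer = generate(classification_prompt(smiles, task, description))
    return answer.strip().lower().startswith("yes")
```

Calling `predict` once with `use_description=True` and once with `use_description=False` on the same molecules gives the kind of with/without comparison the abstract refers to, assuming `generate` wraps the model under evaluation.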
Taxonomy
Topics: Computational Drug Discovery Methods
