Can Hallucinations Help? Boosting LLMs for Drug Discovery

Shuzhou Yuan; Zhan Qu; Ashish Yashwanth Kangen; Michael Färber

arXiv:2501.13824·cs.CL·August 25, 2025

Open Access

TL;DR

This paper explores how hallucinations in large language models can be harnessed to improve molecule property prediction in drug discovery, challenging the view that hallucinations are solely problematic.

Contribution

It demonstrates that hallucinations can enhance predictive accuracy in drug discovery tasks and identifies the types of hallucinations that are most beneficial.

Findings

01. Hallucinations significantly improve model accuracy in some cases.

02. Larger models benefit more from hallucinations.

03. Structural misdescriptions are the most impactful hallucination type.

Abstract

Hallucinations in large language models (LLMs), plausible but factually inaccurate text, are often viewed as undesirable. However, recent work suggests that such outputs may hold creative potential. In this paper, we investigate whether hallucinations can improve LLMs on molecule property prediction, a key task in early-stage drug discovery. We prompt LLMs to generate natural language descriptions from molecular SMILES strings and incorporate these often-hallucinated descriptions into downstream classification tasks. Evaluating seven instruction-tuned LLMs across five datasets, we find that hallucinations significantly improve predictive accuracy for some models. Notably, Falcon3-Mamba-7B outperforms all baselines when hallucinated text is included, while hallucinations generated by GPT-4o consistently yield the greatest gains across models. We further identify and categorize over…
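The two-step setup described in the abstract (generate a description from a SMILES string, then include it in a classification prompt) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the prompt templates, the function names, and the `fake_llm` stand-in are all hypothetical, and any real model call would replace the stub.

```python
from typing import Callable, Optional

# Hypothetical prompt templates; the paper's actual wording is not given here.
DESCRIBE_TEMPLATE = "Describe the molecule with this SMILES string: {smiles}"
CLASSIFY_TEMPLATE = (
    "SMILES: {smiles}\n"
    "{description_block}"
    "Question: {question}\n"
    "Answer with Yes or No."
)


def build_classification_prompt(
    smiles: str, question: str, description: Optional[str] = None
) -> str:
    """Build the downstream prompt, optionally including a generated description."""
    block = f"Description: {description}\n" if description else ""
    return CLASSIFY_TEMPLATE.format(
        smiles=smiles, description_block=block, question=question
    )


def predict(
    llm: Callable[[str], str],
    smiles: str,
    question: str,
    use_description: bool = True,
) -> bool:
    """Step 1: ask the model for a (possibly hallucinated) description.
    Step 2: ask the model to classify, with or without that description."""
    description = (
        llm(DESCRIBE_TEMPLATE.format(smiles=smiles)) if use_description else None
    )
    answer = llm(build_classification_prompt(smiles, question, description))
    return answer.strip().lower().startswith("yes")


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model (assumption): returns canned text so the
    sketch runs offline. Its answers carry no chemical meaning."""
    if prompt.startswith("Describe"):
        return "A small aromatic compound."
    return "Yes" if "Description:" in prompt else "No"


if __name__ == "__main__":
    question = "Is this molecule toxic?"
    print(predict(fake_llm, "c1ccccc1", question, use_description=True))
    print(predict(fake_llm, "c1ccccc1", question, use_description=False))
```

Running the same molecules through `predict` with `use_description=True` and `use_description=False` gives the kind of with-and-without comparison the abstract describes; the stub's outputs only demonstrate the control flow.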

Taxonomy

Topics: Computational Drug Discovery Methods