How Well Do LLMs Understand Drug Mechanisms? A Knowledge + Reasoning Evaluation Dataset
Sunil Mohan, Theofanis Karaletsos

TL;DR
This paper introduces a dataset to evaluate large language models' understanding of drug mechanisms, testing their factual knowledge and reasoning abilities in novel, counterfactual scenarios relevant to drug development and personalized medicine.
Contribution
The authors present a new dataset for assessing LLMs' knowledge and reasoning about drug mechanisms, and benchmark several models, revealing insights into their strengths and limitations in scientific reasoning.
Findings
o4-mini outperforms several other OpenAI models
Qwen3-4B-thinking matches or exceeds o4-mini on some tasks
Reasoning with counterfactuals affecting internal links is more challenging
Abstract
Two scientific fields showing increasing interest in pre-trained large language models (LLMs) are drug development / repurposing, and personalized medicine. For both, LLMs have to demonstrate factual knowledge as well as a deep understanding of drug mechanisms, so they can recall and reason about relevant knowledge in novel situations. Drug mechanisms of action are described as a series of interactions between biomedical entities, which interlink into one or more chains directed from the drug to the targeted disease. Composing the effects of the interactions in a candidate chain leads to an inference about whether the drug might be useful or not for that disease. We introduce a dataset that evaluates LLMs on both factual knowledge of known mechanisms, and their ability to reason about them under novel situations, presented as counterfactuals that the models are unlikely to have seen…
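To make the chain idea in the abstract concrete, the following is a minimal sketch, not the authors' method or data format. It assumes each interaction can be reduced to a sign (+1 for "increases", -1 for "decreases") and that the net effect of a chain is the product of its signs. The names Interaction, compose_chain and with_counterfactual, and the entities DrugX, ProteinA, ProcessB and DiseaseZ, are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interaction:
    """One directed link in a mechanism chain (hypothetical representation)."""
    source: str
    target: str
    sign: int  # assumption: +1 = increases/activates, -1 = decreases/inhibits


def compose_chain(chain):
    """Return the net sign of a chain by multiplying the signs of its links."""
    # Each link must start where the previous one ended.
    for prev, nxt in zip(chain, chain[1:]):
        if prev.target != nxt.source:
            raise ValueError(f"broken chain: {prev.target} != {nxt.source}")
    net = 1
    for link in chain:
        net *= link.sign
    return net


def with_counterfactual(chain, index, new_sign):
    """Return a copy of the chain with the sign of one link replaced."""
    return [
        Interaction(link.source, link.target, new_sign) if i == index else link
        for i, link in enumerate(chain)
    ]


# Placeholder chain from a drug to a disease; not taken from the dataset.
chain = [
    Interaction("DrugX", "ProteinA", -1),     # drug inhibits a protein
    Interaction("ProteinA", "ProcessB", +1),  # protein drives a process
    Interaction("ProcessB", "DiseaseZ", +1),  # process drives the disease
]

baseline = compose_chain(chain)
# Counterfactual on an internal link: suppose ProteinA suppressed ProcessB.
altered = compose_chain(with_counterfactual(chain, 1, -1))

print("baseline net effect on disease:", baseline)
print("counterfactual net effect on disease:", altered)
```

Under these assumptions, a net sign of -1 reads as "the drug reduces the disease" and +1 as "the drug worsens it". Flipping a single internal link reverses the conclusion, which is the kind of inference a counterfactual question asks a model to carry through the whole chain.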
Taxonomy
Topics: Topic Modeling · Machine Learning in Healthcare · Biomedical Text Mining and Ontologies
