SMDD-Bench: Can LLMs Solve Real-World Small Molecule Drug Design Tasks?

Kevin Han; Renfei Zhang; Kathy Wei; Hamed Mahdavi; Niloofar Mireshghallah; Amir Farimani

arXiv:2605.21740·cs.AI·May 22, 2026

TL;DR

SMDD-Bench is a new, challenging benchmark for evaluating large language models on real-world small molecule drug design tasks. It highlights current model limitations and aims to encourage progress in autonomous computational drug discovery.

Contribution

The paper introduces SMDD-Bench, a multi-turn, long-horizon agentic benchmark of 502 guaranteed-solvable task instances spanning 5 task types and 102 unique protein targets, for assessing LLM agents in small molecule drug design.

Findings

01. Most LLMs solve fewer than 50% of tasks
02. Current models lack sufficient reasoning and planning skills
03. SMDD-Bench enables standardized evaluation of LLMs in drug discovery
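
The solve rate in the first finding can be read as the fraction of task instances an agent completes. The following is a minimal sketch of that tally, assuming a hypothetical record format of (task type, solved) pairs. The function name `solve_rates`, the record format, and the sample outcomes are illustrative assumptions and do not come from the paper or its repository; only the five task type names are taken from the abstract.

```python
from collections import defaultdict

# The five task types named in the abstract.
TASK_TYPES = [
    "2D Pharmacophore Identification",
    "Interaction Point Discovery",
    "Scaffold Hopping",
    "Lead Optimization",
    "Fragment Assembly",
]


def solve_rates(results):
    """Tally solve rates from (task_type, solved) pairs.

    The input format is hypothetical; the benchmark's real output
    format may differ.
    """
    totals = defaultdict(int)
    solved = defaultdict(int)
    for task_type, ok in results:
        totals[task_type] += 1
        solved[task_type] += int(ok)
    # Fraction solved within each task type that appears in the input.
    per_type = {t: solved[t] / totals[t] for t in totals}
    # Fraction solved over all task instances (0.0 for empty input).
    overall = sum(solved.values()) / sum(totals.values()) if totals else 0.0
    return per_type, overall


# Made-up outcomes for illustration only, not reported results.
example = [
    ("Scaffold Hopping", True),
    ("Scaffold Hopping", False),
    ("Lead Optimization", False),
    ("Fragment Assembly", True),
]
per_type, overall = solve_rates(example)
```

Reporting the per-type rates next to the overall rate shows whether a model's failures are concentrated in one kind of task.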

Abstract

LLM agents have incredible potential for scientific discovery applications. However, the performance of LLM agents on real-world, small molecule drug design (SMDD) tasks across diverse chemistries and targets is unclear. Current evaluation methods are either ad hoc, too simple for real-world discovery, limited in scale, or restricted to single-turn question answering. In an effort to standardize the evaluation of LLM agents on small molecule design, we introduce SMDD-Bench, a challenging, multi-turn, long-horizon agentic benchmark consisting of 502 guaranteed-solvable task instances spanning 5 task types: 2D Pharmacophore Identification, Interaction Point Discovery, Scaffold Hopping, Lead Optimization, and Fragment Assembly. SMDD-Bench tasks span a wide region of chemical space and involve 102 unique protein targets. Completely solving the benchmark would require having strong chemical and…
