TL;DR
SMDD-Bench is a new, challenging benchmark for evaluating large language models on real-world small molecule drug design tasks. It highlights current limitations and aims to encourage progress in autonomous computational drug discovery.
Contribution
The paper introduces SMDD-Bench, a comprehensive, multi-turn benchmark with 502 tasks across diverse chemistries and targets for assessing LLMs in drug design.
Findings
Most LLMs solve fewer than 50% of the tasks.
Current models lack sufficient reasoning and planning skills.
SMDD-Bench enables standardized evaluation of LLMs in drug discovery.
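As a rough illustration of what a "fraction of tasks solved" figure means, the sketch below computes an overall and a per-task-type solve rate from a list of pass/fail outcomes. It assumes each task instance is scored as simply solved or not solved, which this summary does not confirm. The function name `solve_rates` and the input format are hypothetical and are not part of SMDD-Bench. Only the five task-type names come from the abstract.

```python
from collections import defaultdict

# The five task types named in the abstract.
TASK_TYPES = (
    "2D Pharmacophore Identification",
    "Interaction Point Discovery",
    "Scaffold Hopping",
    "Lead Optimization",
    "Fragment Assembly",
)


def solve_rates(results):
    """Return (overall_rate, per_type_rates) for an iterable of
    (task_type, solved) pairs.

    Assumption: each task has a binary solved/unsolved outcome.
    This is a hypothetical helper, not the benchmark's own scorer.
    """
    totals = defaultdict(int)
    solved_counts = defaultdict(int)
    for task_type, solved in results:
        if task_type not in TASK_TYPES:
            raise ValueError(f"unknown task type: {task_type!r}")
        totals[task_type] += 1
        solved_counts[task_type] += int(bool(solved))

    n_tasks = sum(totals.values())
    if n_tasks == 0:
        raise ValueError("no results given")

    overall = sum(solved_counts.values()) / n_tasks
    per_type = {t: solved_counts[t] / totals[t] for t in totals}
    return overall, per_type


# Toy data for illustration only; these are not results from the paper.
toy_results = [
    ("Scaffold Hopping", True),
    ("Scaffold Hopping", False),
    ("Lead Optimization", False),
    ("Fragment Assembly", True),
]
overall, per_type = solve_rates(toy_results)
```

Reporting the per-type breakdown next to the overall rate shows whether a low overall score comes from one hard task type or from weak performance across all five.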
Abstract
LLM agents have incredible potential for scientific discovery applications. However, the performance of LLM agents on real-world, small molecule drug design (SMDD) tasks across diverse chemistries and targets is unclear. Current evaluation methods are either ad hoc, too simple for real-world discovery, limited in scale, or restricted to single-turn question answering. In an effort to standardize the evaluation of LLM agents on small molecule design, we introduce SMDD-Bench, a challenging, multi-turn, long-horizon agentic benchmark consisting of 502 guaranteed-solvable task instances spanning 5 task types: 2D Pharmacophore Identification, Interaction Point Discovery, Scaffold Hopping, Lead Optimization, and Fragment Assembly. SMDD-Bench tasks span a wide region of chemical space and involve 102 unique protein targets. Completely solving the benchmark would require having strong chemical and…
