Assay2Mol: large language model-based drug design using BioAssay context

Yifan Deng; Spencer S. Ericksen; Anthony Gitter

arXiv:2507.12574·cs.LG·November 18, 2025

TL;DR

Assay2Mol uses large language models to draw on unstructured biochemical assay data for early-stage drug discovery, generating candidate molecules that are more relevant and more synthesizable than those from recent machine learning approaches.

Contribution

This work introduces an LLM-based workflow that retrieves assay records for targets similar to a new target and uses that assay context, through in-context learning, to generate candidate molecules, making unstructured biochemical data usable in early-stage drug discovery.

Findings

01 Outperforms recent ML approaches in candidate molecule relevance
02 Promotes more synthesizable molecule generation
03 Effectively utilizes unstructured assay data for drug design

Abstract

Scientific databases aggregate vast amounts of quantitative data alongside descriptive text. In biochemistry, molecule screening assays evaluate candidate molecules' functional responses against disease targets. Unstructured text that describes the biological mechanisms through which these targets operate, experimental screening protocols, and other attributes of assays offers rich information for drug discovery campaigns but has gone untapped because of its unstructured format. We present Assay2Mol, a large language model-based workflow that can capitalize on the vast body of existing biochemical screening assays for early-stage drug discovery. Assay2Mol retrieves existing assay records involving targets similar to the new target and generates candidate molecules using in-context learning with the retrieved assay screening data. Assay2Mol outperforms recent machine learning approaches that…
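
The retrieve-then-prompt idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the bag-of-words cosine similarity is a toy stand-in for whatever retrieval method Assay2Mol uses, and the assay records, the target name "kinase ABC", the placeholder SMILES strings, and the function names (`retrieve_assays`, `build_prompt`) are all hypothetical. The sketch stops at building the prompt text; sending it to a language model is left out.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercased token counts; a toy stand-in for a learned text embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_assays(query, assays, k=2):
    """Return the k assay records whose descriptions best match the query."""
    q = bag_of_words(query)
    ranked = sorted(
        assays,
        key=lambda a: cosine(q, bag_of_words(a["description"])),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query, retrieved):
    """Assemble an in-context prompt from retrieved assay text and outcomes."""
    lines = [f"Target description: {query}", "", "Related assay records:"]
    for assay in retrieved:
        lines.append(f"- Assay: {assay['description']}")
        for smiles, outcome in assay["results"]:
            lines.append(f"    {smiles}: {outcome}")
    lines += ["", "Propose new candidate molecules as SMILES strings."]
    return "\n".join(lines)


# Hypothetical toy records; the SMILES and outcomes are placeholders, not real data.
ASSAYS = [
    {"description": "Inhibition of human kinase ABC in a biochemical assay",
     "results": [("CCO", "active"), ("CCN", "inactive")]},
    {"description": "Cytotoxicity against bacterial cell culture",
     "results": [("CCC", "inactive")]},
    {"description": "Binding to human kinase ABC measured by fluorescence",
     "results": [("c1ccccc1", "active")]},
]

query = "Find inhibitors of human kinase ABC"
retrieved = retrieve_assays(query, ASSAYS, k=2)
prompt = build_prompt(query, retrieved)
print(prompt)
```

In this toy setup the two kinase-related records are retrieved and the unrelated cytotoxicity record is left out of the prompt, which mirrors the role the abstract gives to retrieval: only assays on similar targets supply the in-context examples.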

Code & Models

Repositories

gitter-lab/Assay2Mol
PyTorch · Official
