TL;DR
Assay2Mol uses large language models to exploit unstructured biochemical assay data for early-stage drug discovery, generating candidate molecules that outperform existing methods in relevance and synthesizability.
Contribution
This work introduces an LLM-based workflow that retrieves existing assay records for targets similar to a new target and uses their unstructured descriptions and screening data as in-context examples to generate candidate molecules.
Findings
Outperforms recent ML approaches in candidate molecule relevance
Promotes more synthesizable molecule generation
Effectively utilizes unstructured assay data for drug design
Abstract
Scientific databases aggregate vast amounts of quantitative data alongside descriptive text. In biochemistry, molecule screening assays evaluate candidate molecules' functional responses against disease targets. Unstructured text that describes the biological mechanisms through which these targets operate, experimental screening protocols, and other attributes of assays offers rich information for drug discovery campaigns but has remained untapped because of its unstructured format. We present Assay2Mol, a large language model-based workflow that can capitalize on the vast body of existing biochemical screening assays for early-stage drug discovery. Assay2Mol retrieves existing assay records involving targets similar to the new target and generates candidate molecules using in-context learning with the retrieved assay screening data. Assay2Mol outperforms recent machine learning approaches that…
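The retrieve-then-prompt idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the bag-of-words cosine similarity stands in for whatever retrieval method the paper uses, the prompt layout is invented, and the names `retrieve_assays`, `build_prompt`, and the toy `ASSAYS` records (including the placeholder target "kinase ABC") are all hypothetical. The sketch stops at building the prompt and makes no LLM call.

```python
import re
from collections import Counter
from math import sqrt


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Bag-of-words cosine similarity between two strings (assumed stand-in)."""
    ca, cb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = sqrt(sum(v * v for v in ca.values())) * sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def retrieve_assays(query, assays, k=2):
    """Return the k assay records whose descriptions best match the query."""
    ranked = sorted(assays, key=lambda a: cosine(query, a["description"]), reverse=True)
    return ranked[:k]


def build_prompt(query, retrieved):
    """Assemble an in-context prompt from retrieved assays and their results."""
    lines = [f"Target description: {query}", "Relevant assays and screening results:"]
    for assay in retrieved:
        lines.append(f"Assay: {assay['description']}")
        for smiles, outcome in assay["results"]:
            lines.append(f"  {smiles} -> {outcome}")
    lines.append("Propose new candidate molecules as SMILES strings.")
    return "\n".join(lines)


# Hypothetical toy records; real assay databases are far larger and richer.
ASSAYS = [
    {"description": "Inhibition of human kinase ABC in a fluorescence assay",
     "results": [("CCO", "active"), ("CCN", "inactive")]},
    {"description": "Bacterial growth inhibition in broth culture",
     "results": [("CCC", "inactive")]},
    {"description": "Binding to human kinase ABC measured by radioligand displacement",
     "results": [("c1ccccc1O", "active")]},
]

query = "inhibitor of human kinase ABC"
retrieved = retrieve_assays(query, ASSAYS, k=2)
prompt = build_prompt(query, retrieved)
print(prompt)
```

In a full pipeline the resulting prompt would be sent to a language model and the returned SMILES strings validated; both steps are outside this sketch.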
