Conditional Chemical Language Models are Versatile Tools in Drug Discovery
Lu Zhu, Emmanuel Noutahi

TL;DR
SAFE-T is a conditional chemical language model that uses biological context, such as protein targets or mechanisms of action, for molecular design, scoring, and interpretability in drug discovery.
Contribution
The paper presents SAFE-T, a framework that conditions on biological context to design and score molecules without relying on structural information or engineered scoring functions.
Findings
Achieves competitive or superior performance on multiple benchmarks.
Supports goal-directed molecule generation aligned with biological objectives.
Provides interpretable insights into structure-activity relationships.
Abstract
Generative chemical language models (CLMs) have demonstrated strong capabilities in molecular design, yet their impact in drug discovery remains limited by the absence of reliable reward signals and the lack of interpretability in their outputs. We present SAFE-T, a generalist chemical modeling framework that conditions on biological context -- such as protein targets or mechanisms of action -- to prioritize and design molecules without relying on structural information or engineered scoring functions. SAFE-T models the conditional likelihood of fragment-based molecular sequences given a biological prompt, enabling principled scoring of molecules across tasks such as virtual screening, drug-target interaction prediction, and activity cliff detection. Moreover, it supports goal-directed generation by sampling from this learned distribution, aligning molecular design with biological…
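To make the scoring idea in the abstract concrete, the sketch below shows how a conditional likelihood can rank candidate molecules for a given biological prompt. It is a toy illustration only, not the SAFE-T implementation: the table `TOY_MODEL`, the prompt string `target:A`, the fragment names, and both function names are hypothetical, and the toy conditions only on the previous fragment, whereas a real language model would condition on the full prefix.

```python
import math

# Hypothetical toy "model": next-fragment probabilities given
# (biological prompt, previous fragment). All values are made up.
TOY_MODEL = {
    ("target:A", "<s>"): {"frag1": 0.7, "frag2": 0.3},
    ("target:A", "frag1"): {"frag2": 0.6, "</s>": 0.4},
    ("target:A", "frag2"): {"frag1": 0.2, "</s>": 0.8},
}


def conditional_log_likelihood(prompt, fragments, model=TOY_MODEL, floor=1e-9):
    """Sum log p(x_t | prompt, x_{t-1}) over the sequence, end token included."""
    tokens = ["<s>"] + list(fragments) + ["</s>"]
    total = 0.0
    for prev, nxt in zip(tokens, tokens[1:]):
        probs = model.get((prompt, prev), {})
        # Unseen transitions get a small floor probability instead of log(0).
        total += math.log(probs.get(nxt, floor))
    return total


def rank_by_likelihood(prompt, candidates, model=TOY_MODEL):
    """Order candidate fragment sequences from most to least likely."""
    return sorted(
        candidates,
        key=lambda c: conditional_log_likelihood(prompt, c, model),
        reverse=True,
    )


candidates = [["frag2", "frag1"], ["frag1", "frag2"], ["frag2"]]
ranking = rank_by_likelihood("target:A", candidates)
```

Here the score of a molecule is just the log-probability the model assigns to its fragment sequence under the prompt, so ranking needs no separately engineered scoring function. Whether and how scores are normalized for sequence length is a design choice this sketch leaves out.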
Taxonomy
Topics: Computational Drug Discovery Methods · Biomedical Text Mining and Ontologies
