ChatMol: A Versatile Molecule Designer Based on the Numerically Enhanced Large Language Model
Chuanliu Fan, Ziqiang Cao, Zicheng Ma, Nan Yu, Yimin Peng, Jun Zhang, Yiqin Gao, Guohong Fu

TL;DR
ChatMol leverages large language models with numerical encoding enhancements to generate molecules with specific properties and constraints, outperforming traditional methods in drug discovery tasks.
Contribution
The paper introduces ChatMol, a novel LLM-based molecule design framework that effectively incorporates numerical and substructure constraints, surpassing existing methods.
Findings
Outperforms state-of-the-art baselines in constrained molecule generation
Achieves a KD value of 0.25 in binding affinity maximization for ESR1
Increases Pearson correlation coefficient by up to 0.49 with numerical enhancement
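For readers unfamiliar with the metric in the last finding, the following is a minimal sketch of how a Pearson correlation coefficient is computed. It assumes the coefficient compares requested property values with the values measured on generated molecules; this page does not state which quantities are correlated, and the function name and sample numbers below are hypothetical illustrations, not data from the paper.

```python
import math


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Unnormalized covariance and variances (the 1/n factors cancel in the ratio).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical example: requested property values vs. values of generated molecules.
    requested = [1.0, 2.0, 3.0, 4.0]
    achieved = [1.1, 1.9, 3.2, 3.8]
    print(f"Pearson r = {pearson_r(requested, achieved):.3f}")
```

A coefficient near 1 would mean the generated molecules track the requested values closely, so under this reading an increase of up to 0.49 indicates substantially tighter numerical control.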
Abstract
Goal-oriented de novo molecule design, namely generating molecules with specific property or substructure constraints, is a crucial yet challenging task in drug discovery. Existing methods, such as Bayesian optimization and reinforcement learning, often require training multiple property predictors and struggle to incorporate substructure constraints. Inspired by the success of Large Language Models (LLMs) in text generation, we propose ChatMol, a novel approach that leverages LLMs for molecule design across diverse constraint settings. Initially, we crafted a molecule representation compatible with LLMs and validated its efficacy across multiple online LLMs. Afterwards, we developed specific prompts geared towards diverse constrained molecule generation tasks to further fine-tune current LLMs while integrating feedback learning derived from property prediction. Finally, to address the…
Taxonomy
Topics: Computational Drug Discovery Methods · Machine Learning in Materials Science · Protein Structure and Dynamics
