Multi-modal Molecule Structure-text Model for Text-based Retrieval and Editing
Shengchao Liu, Weili Nie, Chengpeng Wang, Jiarui Lu, Zhuoran Qiao, Ling Liu, Jian Tang, Chaowei Xiao, Anima Anandkumar

TL;DR
MoleculeSTM is a multi-modal model that jointly learns chemical structures and textual descriptions, enabling advanced text-based retrieval and editing in drug discovery with state-of-the-art generalization.
Contribution
The paper introduces MoleculeSTM, a multi-modal model trained on PubChemSTM, a dataset of over 280,000 chemical structure-text pairs, which enables zero-shot text-based molecule retrieval and editing.
Findings
Achieves state-of-the-art zero-shot performance on biochemical benchmarks.
Demonstrates effective structure-text retrieval and molecule editing.
Supports open vocabulary and compositionality in molecular understanding.
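Structure-text retrieval in a contrastively trained dual-encoder model typically reduces to ranking candidates by similarity in a shared embedding space. The sketch below assumes exactly that: a query embedding (for example, from a text instruction) and a bank of candidate embeddings (for example, molecules) have already been produced by the two encoders, which are not shown. The function names `l2_normalize` and `retrieve` and the toy vectors are hypothetical illustrations, not MoleculeSTM's code.

```python
import numpy as np


def l2_normalize(x: np.ndarray, axis: int = -1, eps: float = 1e-12) -> np.ndarray:
    """Scale vectors to unit length so dot products equal cosine similarity."""
    return x / np.maximum(np.linalg.norm(x, axis=axis, keepdims=True), eps)


def retrieve(query_emb: np.ndarray, candidate_embs: np.ndarray, top_k: int = 3) -> np.ndarray:
    """Return indices of the top_k candidates most similar to the query.

    query_emb: shape (d,), e.g. a text embedding (hypothetical input).
    candidate_embs: shape (n, d), e.g. molecule embeddings (hypothetical input).
    """
    q = l2_normalize(query_emb)
    c = l2_normalize(candidate_embs)
    scores = c @ q                      # cosine similarity of each candidate to the query
    return np.argsort(-scores)[:top_k]  # indices sorted from most to least similar


# Toy 3-dimensional embeddings standing in for encoder outputs.
text_query = np.array([1.0, 0.0, 0.0])
molecules = np.array([
    [0.0, 1.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.5, 0.5, 0.0],
    [-1.0, 0.0, 0.0],
])
print(retrieve(text_query, molecules, top_k=2))  # [1 2]
```

Because both sides are normalized, the same function works in either direction (text to molecules or molecule to texts) by swapping which embeddings play the query and candidate roles.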
Abstract
There is increasing adoption of artificial intelligence in drug discovery. However, existing studies mainly apply machine learning to the chemical structures of molecules and ignore the vast textual knowledge available in chemistry. Incorporating textual knowledge enables us to realize new drug design objectives, adapt to text-based instructions and predict complex biological activities. Here we present a multi-modal molecule structure-text model, MoleculeSTM, which jointly learns molecules' chemical structures and textual descriptions via a contrastive learning strategy. To train MoleculeSTM, we construct a large multi-modal dataset, namely, PubChemSTM, with over 280,000 chemical structure-text pairs. To demonstrate the effectiveness and utility of MoleculeSTM, we design two challenging zero-shot tasks based on text instructions, including structure-text retrieval and molecule…
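The abstract says structure and text are learned jointly "via a contrastive learning strategy" but does not give the objective. As an illustration only, the sketch below implements one widely used contrastive objective, the symmetric InfoNCE loss popularized by CLIP-style models: row i of each batch is a matched structure-text pair, and all other rows act as negatives. Whether MoleculeSTM uses this exact form, and with what temperature, is an assumption here; the `temperature=0.1` default and the function names are hypothetical.

```python
import numpy as np


def log_softmax(logits: np.ndarray, axis: int) -> np.ndarray:
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def symmetric_info_nce(struct_emb: np.ndarray, text_emb: np.ndarray,
                       temperature: float = 0.1) -> float:
    """Symmetric InfoNCE loss for B paired embeddings of shape (B, d).

    Row i of struct_emb and row i of text_emb are assumed to describe the same
    molecule (positive pair); every other row in the batch is a negative.
    The temperature value is a hypothetical illustrative choice.
    """
    s = struct_emb / np.linalg.norm(struct_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = (s @ t.T) / temperature  # (B, B) scaled cosine-similarity matrix
    idx = np.arange(logits.shape[0])
    # Structure -> text: each row should assign high probability to its diagonal entry.
    loss_s2t = -log_softmax(logits, axis=1)[idx, idx].mean()
    # Text -> structure: each column should assign high probability to its diagonal entry.
    loss_t2s = -log_softmax(logits, axis=0)[idx, idx].mean()
    return float(0.5 * (loss_s2t + loss_t2s))


# Toy check: orthogonal "structure" embeddings paired with matching texts
# versus texts shifted by one position (every pair mismatched).
structs = np.eye(4)
aligned_loss = symmetric_info_nce(structs, np.eye(4))
shuffled_loss = symmetric_info_nce(structs, np.eye(4)[[1, 2, 3, 0]])
print(aligned_loss < 0.01, shuffled_loss > 9)  # True True
```

Minimizing this loss pulls each molecule's structure embedding toward its own description and pushes it away from the other descriptions in the batch, which is what makes the similarity-based zero-shot retrieval described in the abstract possible.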
Taxonomy
Topics: Computational Drug Discovery Methods · Chemical Synthesis and Analysis · Machine Learning in Materials Science
Methods: Contrastive Learning
