Text-Guided Molecule Generation with Diffusion Language Model
Haisong Gong, Qiang Liu, Shu Wu, Liang Wang

TL;DR
This paper introduces TGM-DLM, a diffusion-based model for text-guided molecule generation that outperforms the autoregressive MolT5-Base model at generating valid molecules that match their textual descriptions, without using additional data resources.
Contribution
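The paper proposes a diffusion language model that generates SMILES strings from text by updating all token embeddings collectively and iteratively, rather than token by token. This addresses limitations of autoregressive methods and improves both the validity of the generated molecules and how well they match the input description.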
Findings
TGM-DLM outperforms MolT5-Base in molecule generation quality.
A second diffusion phase corrects invalid SMILES strings into valid molecular representations.
The authors suggest the approach has strong potential for drug discovery applications.
Abstract
Text-guided molecule generation is a task where molecules are generated to match specific textual descriptions. Most existing SMILES-based molecule generation methods rely on an autoregressive architecture. In this work, we propose the Text-Guided Molecule Generation with Diffusion Language Model (TGM-DLM), a novel approach that leverages diffusion models to address the limitations of autoregressive methods. TGM-DLM updates token embeddings within the SMILES string collectively and iteratively, using a two-phase diffusion generation process. The first phase optimizes embeddings from random noise, guided by the text description, while the second phase corrects invalid SMILES strings to form valid molecular representations. We demonstrate that TGM-DLM outperforms MolT5-Base, an autoregressive model, without the need for additional data resources. Our findings underscore the…
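The key contrast with autoregressive decoding is that every position of the SMILES sequence is refined at every denoising step. Below is a minimal sketch of a first-phase-style sampler under loudly stated assumptions: a toy 7-token vocabulary with fixed 2-D embeddings, a cosine noise schedule with T = 50 steps, a deterministic DDIM-style update (eta = 0), and nearest-embedding rounding back to tokens. The function `toy_denoiser` is hypothetical; it stands in for a learned text-conditioned network and simply returns a fixed answer. This is an illustration of continuous diffusion over token embeddings in general, not the authors' implementation, and the paper's schedule, sampler, and architecture may differ.

```python
import math
import random

# Hypothetical toy setup: 7 SMILES tokens with fixed 2-D embeddings.
# A real model would learn embeddings jointly with the denoiser.
VOCAB = ["C", "O", "N", "(", ")", "=", "<pad>"]
EMB = {tok: [math.cos(i), math.sin(i)] for i, tok in enumerate(VOCAB)}
SEQ_LEN = 6
T = 50  # number of diffusion steps (assumed)
# Cosine-style cumulative signal schedule: ALPHA_BAR[0] = 1, ALPHA_BAR[T] ~ 0.
ALPHA_BAR = [math.cos((t / T) * math.pi / 2) ** 2 for t in range(T + 1)]


def q_sample(x0, t, rng):
    """Forward (noising) process: x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * noise."""
    a = ALPHA_BAR[t]
    return [[math.sqrt(a) * v + math.sqrt(1 - a) * rng.gauss(0, 1) for v in vec]
            for vec in x0]


def embed(tokens):
    return [list(EMB[tok]) for tok in tokens]


def round_to_tokens(x):
    """Map each position to the token whose embedding is nearest in L2 distance."""
    def dist2(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))
    return [min(VOCAB, key=lambda tok: dist2(vec, EMB[tok])) for vec in x]


def toy_denoiser(x_t, t, text):
    """Hypothetical stand-in for a learned network predicting x_0 from (x_t, t, text).
    It ignores x_t and t and returns the embedding of a fixed SMILES for the text."""
    lookup = {"an alcohol with two carbons": ["C", "C", "O", "<pad>", "<pad>", "<pad>"]}
    return embed(lookup[text])


def sample(text, denoiser, rng):
    """DDIM-style deterministic reverse process (eta = 0). All positions are
    updated together at every step instead of being generated left to right."""
    x = [[rng.gauss(0, 1) for _ in range(2)] for _ in range(SEQ_LEN)]  # x_T ~ N(0, I)
    for t in range(T, 0, -1):
        x0_hat = denoiser(x, t, text)
        a_t, a_prev = ALPHA_BAR[t], ALPHA_BAR[t - 1]
        new_x = []
        for xt_vec, x0_vec in zip(x, x0_hat):
            # Noise implied by the current estimate of x_0.
            eps = [(xt - math.sqrt(a_t) * x0) / math.sqrt(1 - a_t)
                   for xt, x0 in zip(xt_vec, x0_vec)]
            # Move to step t-1 using the predicted x_0 and the implied noise.
            new_x.append([math.sqrt(a_prev) * x0 + math.sqrt(1 - a_prev) * e
                          for x0, e in zip(x0_vec, eps)])
        x = new_x
    return round_to_tokens(x)


rng = random.Random(0)
tokens = sample("an alcohol with two carbons", toy_denoiser, rng)
print("".join(tok for tok in tokens if tok != "<pad>"))  # CCO
```

Because the final step uses ALPHA_BAR[0] = 1, the last state equals the denoiser's x_0 estimate, so rounding recovers the tokens that estimate encodes. A second, correction phase could in principle reuse the same loop with a separately trained correction model; it is omitted here because the summary does not specify how that model works.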
Taxonomy
Topics: Chemical Synthesis and Analysis
Methods: Diffusion
