LDMol: A Text-to-Molecule Diffusion Model with Structurally Informative Latent Space Surpasses AR Models

Jinho Chang; Jong Chul Ye

arXiv:2405.17829 · cs.LG · June 5, 2025 · 3 cites

TL;DR

LDMol is a novel latent diffusion model that effectively generates molecules from text, outperforming autoregressive models and enabling downstream tasks like retrieval and editing through a structurally informative latent space.

Contribution

The paper introduces LDMol, a latent diffusion model with a contrastive learning-based latent space that surpasses autoregressive models in text-to-molecule generation.

Findings

01. LDMol outperforms autoregressive baselines on text-to-molecule benchmarks.

02. LDMol enables effective molecule-to-text retrieval.

03. LDMol supports text-guided molecule editing.

Abstract

With the emergence of diffusion models as frontline generative models, many researchers have proposed molecule generation techniques based on conditional diffusion models. However, the unavoidable discreteness of a molecule makes it difficult for a diffusion model to connect raw data with highly complex conditions like natural language. To address this, we present a novel latent diffusion model, dubbed LDMol, for text-conditioned molecule generation. Recognizing that a suitable latent space design is key to diffusion model performance, we employ a contrastive learning strategy to extract a novel feature space from text data that embeds the unique characteristics of the molecule structure. Experiments show that LDMol outperforms the existing autoregressive baselines on the text-to-molecule generation benchmark, being one of the first diffusion models that outperforms…
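
The abstract names a contrastive learning strategy but this excerpt does not give the objective. As a rough illustration only, the sketch below implements a generic InfoNCE-style contrastive loss in plain Python. The function name `info_nce`, the temperature value, and the choice of what counts as a positive pair are all assumptions made for illustration and are not taken from the paper or its repository.

```python
import math


def info_nce(anchors, positives, temperature=0.1):
    """Generic InfoNCE loss (illustrative, not the LDMol objective).

    Row i of `anchors` is treated as matching row i of `positives`;
    every other row of `positives` acts as a negative for anchor i.
    """

    def normalize(vec):
        # Scale to unit length so dot products are cosine similarities.
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        return [x / norm for x in vec]

    a = [normalize(v) for v in anchors]
    p = [normalize(v) for v in positives]

    total = 0.0
    for i, a_i in enumerate(a):
        # Similarity of anchor i to every candidate, scaled by temperature.
        logits = [sum(x * y for x, y in zip(a_i, p_j)) / temperature for p_j in p]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        # Cross-entropy with the matching index i as the target.
        total += log_denom - logits[i]
    return total / len(a)


if __name__ == "__main__":
    feats = [[1.0, 0.0], [0.0, 1.0]]
    print("aligned pairs:", info_nce(feats, feats))
    print("swapped pairs:", info_nce(feats, feats[::-1]))
```

With matching rows the loss is close to zero, and it grows when the pairs are swapped. This is the pressure that pulls embeddings of paired inputs together and pushes unpaired ones apart.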

Code & Models

Repositories

jinhojsk515/ldmol (PyTorch, official)

Models

jinhojsk515/LDMol (Hugging Face model)

Taxonomy

Topics: Machine Learning in Materials Science · Computational Drug Discovery Methods

Methods: Attention Is All You Need · Attentive Walk-Aggregating Graph Neural Network · Linear Layer · Byte Pair Encoding · Label Smoothing · Adam · Residual Connection · Position-Wise Feed-Forward Layer · Multi-Head Attention · Dropout