Learning Flexible Forward Trajectories for Masked Molecular Diffusion

Hyunjin Seo; Taewon Kim; Sihyun Yu; SungSoo Ahn

arXiv:2505.16790·cs.LG·September 29, 2025

3 Reviews

TL;DR

This paper introduces MELD, a novel diffusion method that improves molecular generation by avoiding state-clashing through element-wise noise scheduling, significantly boosting validity and property alignment.

Contribution

The paper proposes MELD, a new diffusion approach with element-wise noise scheduling to address state-clashing in molecular diffusion models, enhancing generation quality.

Findings

01

Increased chemical validity from 15% to 93% on ZINC250K.

02

Achieved state-of-the-art property alignment in conditional generation.

03

Demonstrated the effectiveness of element-wise noise scheduling over element-agnostic methods.

Abstract

Masked diffusion models (MDMs) have achieved notable progress in modeling discrete data, while their potential in molecular generation remains underexplored. In this work, we explore their potential and introduce the surprising result that naively applying standard MDMs severely degrades performance. We identify the critical cause of this issue as a state-clashing problem, where the forward diffusion of distinct molecules collapses into a common state, resulting in a mixture of reconstruction targets that cannot be learned using a typical reverse diffusion process with unimodal predictions. To mitigate this, we propose Masked Element-wise Learnable Diffusion (MELD), which orchestrates per-element corruption trajectories to avoid collision between distinct molecular graphs. This is achieved through a parameterized noise scheduling network that assigns distinct corruption rates to…
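The state-clashing idea in the abstract can be illustrated with a toy sketch. It is not the authors' implementation: molecules are reduced to short tuples of atom symbols (real inputs are graphs with atoms and bonds), the per-element keep probabilities are hard-coded rather than produced by a learned noise scheduling network, and the names `masked_states` and `overlap` are hypothetical. The sketch measures how much probability mass the forward marginals of two distinct molecules share at one noise level, first with a single element-agnostic masking rate and then with element-wise rates that have the same average.

```python
from itertools import product

MASK = "[M]"  # placeholder symbol for a masked element


def masked_states(tokens, keep_probs):
    """Enumerate every masked version of `tokens` with its probability.

    keep_probs[i] is the probability that element i stays unmasked;
    elements are masked independently of each other.
    """
    states = {}
    for keep in product([True, False], repeat=len(tokens)):
        prob = 1.0
        for kept, q in zip(keep, keep_probs):
            prob *= q if kept else (1.0 - q)
        state = tuple(t if kept else MASK for t, kept in zip(tokens, keep))
        states[state] = states.get(state, 0.0) + prob
    return states


def overlap(tokens_a, tokens_b, keep_a, keep_b):
    """Shared probability mass of the two forward marginals (0 = never clash)."""
    pa = masked_states(tokens_a, keep_a)
    pb = masked_states(tokens_b, keep_b)
    return sum(min(p, pb.get(state, 0.0)) for state, p in pa.items())


# Two toy "molecules" that differ only in their last atom.
mol_a = ("C", "C", "O")
mol_b = ("C", "C", "N")

# Element-agnostic schedule: every element is kept with probability 0.5.
uniform = (0.5, 0.5, 0.5)
# Element-wise schedule (assumed values) with the same average keep rate,
# but the distinguishing last atom is masked late.
elementwise = (0.2, 0.4, 0.9)

clash_uniform = overlap(mol_a, mol_b, uniform, uniform)
clash_elementwise = overlap(mol_a, mol_b, elementwise, elementwise)

print(round(clash_uniform, 3))      # 0.5
print(round(clash_elementwise, 3))  # 0.1
```

In this toy setting the two molecules become indistinguishable whenever the last atom is masked, so the clash probability equals that element's masking rate. Keeping the average corruption level fixed while protecting the distinguishing element lowers the overlap from 0.5 to 0.1, which is the intuition behind assigning distinct corruption rates per element.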

Peer Reviews

Decision·ICLR 2026 Poster

Reviewer 01·Rating 6·Confidence 3

Strengths

1. This paper explores the performance of the currently popular MDM in the field of molecular generation, which is a research topic worth pursuing.
2. The authors make improvements based on the MDM by introducing element-wise embedding to adapt it to molecular generation tasks.
3. The authors also validate the effectiveness of the method on large-scale datasets such as Guacamol.

Weaknesses

1. Some overclaims in the paper need clarification, such as the statement that in previous work the transition probabilities between elements in the forward process are all uniformly distributed.
2. When proposing the state-clashing problem, the authors lack demonstrations on large-scale datasets. This makes it difficult to convince readers whether such a problem truly exists.
3. The cases in Figure 2 are not easy to understand and require clearer explanation.
4. Some parts of the method that ar

Reviewer 02·Rating 4·Confidence 5

Strengths

1. Originality and Significance. The paper makes a significant and original contribution by identifying the "state-clashing problem" as an obstacle to applying standard MDMs to structured data like molecular graphs. The core idea of learning an element-wise forward process to orchestrate distinct corruption trajectories is an elegant and insightful solution.
2. Quality. The technical quality of the work is high. The hypothesis about state-clashing is well-motivated and convincingly demonstrated

Weaknesses

1. The paper's central claim of superiority is undermined by an incomplete set of baseline comparisons. While MELD is shown to be effective against standard MDMs and some diffusion models, it omits a direct comparison to some relevant works. Methods presented in "Conditional Diffusion Based on Discrete Graph Structures for Molecular Graph Generation" and "Learning Joint 2-D and 3-D Graph Diffusion Models for Complete Molecule Generation" have demonstrated exceptional performance on ZINC250K benc

Reviewer 03·Rating 6·Confidence 3

Strengths

1. An intuitive explanation and formal analysis of the "state-clashing" phenomenon are given, pointing out that fixed, element-independent forward occlusion makes it easy for different graphs to fall into intermediate states with poor distinguishability, resulting in a highly multimodal posterior and a model approximating a "unimodal, decompositional" distribution, which in turn produces solutions with high entropy and distribution shift. Formulas (3) and (4) are relatively clear with textual exp

Weaknesses

1. The element-level kernel renders the forward process non-equivariant, meaning the intermediate state distribution is affected by vertex permutations. For molecular graphs, this contradicts the fundamental principle that isomorphism should not alter the generative distribution. Current methods merely introduce a learnable embedding H for each graph element and "randomly permutate columns" to "distinguish graph states with the same number of nodes/edges," but this does not restore the guarantee

Taxonomy

Methods·Diffusion