TL;DR
This paper introduces MELD, a masked diffusion method for molecular generation that avoids state-clashing through element-wise noise scheduling, raising chemical validity on ZINC250K from 15% to 93% and improving property alignment.
Contribution
The paper proposes MELD, a new diffusion approach with element-wise noise scheduling to address state-clashing in molecular diffusion models, enhancing generation quality.
Findings
Increased chemical validity from 15% to 93% on ZINC250K.
Achieved state-of-the-art property alignment in conditional generation.
Demonstrated the effectiveness of element-wise noise scheduling over element-agnostic methods.
Abstract
Masked diffusion models (MDMs) have achieved notable progress in modeling discrete data, while their potential in molecular generation remains underexplored. In this work, we explore their potential and introduce the surprising result that naively applying standard MDMs severely degrades the performance. We identify the critical cause of this issue as a state-clashing problem, where the forward diffusion of distinct molecules collapses into a common state, resulting in a mixture of reconstruction targets that cannot be learned using a typical reverse diffusion process with unimodal predictions. To mitigate this, we propose Masked Element-wise Learnable Diffusion (MELD) that orchestrates per-element corruption trajectories to avoid collision between distinct molecular graphs. This is achieved through a parameterized noise scheduling network that assigns distinct corruption rates to…
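To make the state-clashing idea concrete, here is a minimal Python sketch under toy assumptions. It is not the paper's implementation: molecules are treated as short token tuples rather than graphs, the function names `masked_state_distribution` and `clash_probability` are hypothetical, and the per-element rates are hand-picked for illustration rather than produced by MELD's learned noise scheduling network.

```python
from itertools import product

MASK = "[M]"  # placeholder token for a masked element (toy assumption)


def masked_state_distribution(tokens, mask_probs):
    """Exact distribution over partially masked states when position i
    is masked independently with probability mask_probs[i]."""
    dist = {}
    for pattern in product([False, True], repeat=len(tokens)):
        prob = 1.0
        for is_masked, p in zip(pattern, mask_probs):
            prob *= p if is_masked else 1.0 - p
        state = tuple(MASK if m else tok for tok, m in zip(tokens, pattern))
        dist[state] = dist.get(state, 0.0) + prob
    return dist


def clash_probability(mol_a, mol_b, mask_probs):
    """Probability that masking mol_a yields a state mol_b can also reach."""
    dist_a = masked_state_distribution(mol_a, mask_probs)
    dist_b = masked_state_distribution(mol_b, mask_probs)
    return sum(p for state, p in dist_a.items() if dist_b.get(state, 0.0) > 0.0)


# Two toy "molecules" that differ only at the middle position.
mol_a = ("C", "C", "O")
mol_b = ("C", "N", "O")

# Element-agnostic schedule: every position shares one masking rate.
uniform_rates = [0.5, 0.5, 0.5]
# Element-wise schedule (hand-picked, same average rate): the
# distinguishing position is corrupted far less often than the others.
element_wise_rates = [0.8, 0.1, 0.6]

uniform_clash = clash_probability(mol_a, mol_b, uniform_rates)
element_wise_clash = clash_probability(mol_a, mol_b, element_wise_rates)

print(f"shared rate:        {uniform_clash:.3f}")
print(f"element-wise rates: {element_wise_clash:.3f}")
```

In this toy, the two molecules collide exactly when the middle position is masked, so the clash probability equals the masking rate at that position: about 0.5 under the shared rate and about 0.1 under the hand-picked element-wise rates, even though both schedules mask the same fraction of elements on average.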
Peer Reviews
Decision·ICLR 2026 Poster
1. This paper explores the performance of the currently popular MDM in the field of molecular generation, which is a research topic worth pursuing.
2. The authors make improvements based on the MDM by introducing element-wise embedding to adapt it to molecular generation tasks.
3. The authors also validate the effectiveness of the method on large-scale datasets such as Guacamol.
1. Some overclaims in the paper need clarification, such as the statement that in previous work the transition probabilities between elements in the forward process are all uniformly distributed.
2. When proposing the state-clashing problem, the authors lack demonstrations on large-scale datasets. This makes it difficult to convince readers that such a problem truly exists.
3. The cases in Figure 2 are not easy to understand and require clearer explanation.
4. Some parts of the method that ar
1. Originality and Significance. The paper makes a significant and original contribution by identifying the "state-clashing problem" as an obstacle to applying standard MDMs to structured data like molecular graphs. The core idea of learning an element-wise forward process to orchestrate distinct corruption trajectories is an elegant and insightful solution.
2. Quality. The technical quality of the work is high. The hypothesis about state-clashing is well-motivated and convincingly demonstrated
1. The paper's central claim of superiority is undermined by an incomplete set of baseline comparisons. While MELD is shown to be effective against standard MDMs and some diffusion models, it omits a direct comparison to some relevant works. Methods presented in "Conditional Diffusion Based on Discrete Graph Structures for Molecular Graph Generation" and "Learning Joint 2-D and 3-D Graph Diffusion Models for Complete Molecule Generation" have demonstrated exceptional performance on ZINC250K benc
1. The paper gives an intuitive explanation and a formal analysis of the "state-clashing" phenomenon. It points out that a fixed, element-independent forward masking process easily drives different graphs into intermediate states that are hard to distinguish. The resulting posterior is highly multimodal, while the model approximates it with a "unimodal, decompositional" distribution, which in turn produces solutions with high entropy and distribution shift. Formulas (3) and (4) are relatively clear with textual exp
1. The element-level kernel renders the forward process non-equivariant, meaning the intermediate state distribution is affected by vertex permutations. For molecular graphs, this contradicts the fundamental principle that isomorphism should not alter the generative distribution. Current methods merely introduce a learnable embedding H for each graph element and "randomly permutate columns" to "distinguish graph states with the same number of nodes/edges," but this does not restore the guarantee
Taxonomy
Methods: Diffusion
