RetroDiff: Retrosynthesis as Multi-stage Distribution Interpolation

Yiming Wang; Yuxuan Song; Yiqun Wang; Minkai Xu; Rui Wang; Hao Zhou; Wei-Ying Ma

arXiv:2311.14077·cs.LG·July 22, 2025·1 cites

Open Access 3 Reviews

TL;DR

RetroDiff introduces a multi-stage diffusion approach for retrosynthesis, effectively modeling the process as a graph-to-graph generative task that outperforms existing methods in accuracy and validity.

Contribution

The paper presents RetroDiff, a novel diffusion-based framework that decomposes retrosynthesis into multi-stage graph generation, integrating chemical reaction templates with diffusion models.

Findings

01. RetroDiff surpasses semi-template methods in accuracy.

02. RetroDiff outperforms template-based methods in large-scale scenarios.

03. RetroDiff achieves higher molecular validity compared to template-free methods.

Abstract

Retrosynthesis poses a key challenge in biopharmaceuticals, aiding chemists in finding appropriate reactant molecules for given product molecules. With reactants and products represented as 2D graphs, retrosynthesis constitutes a conditional graph-to-graph (G2G) generative task. Inspired by advancements in discrete diffusion models for graph generation, we aim to design a diffusion-based method to address this problem. However, integrating a diffusion-based G2G framework while retaining essential chemical reaction template information presents a notable challenge. Our key innovation involves a multi-stage diffusion process. We decompose the retrosynthesis procedure to first sample external groups from the dummy distribution given products, then generate external bonds to connect products and generated groups. Interestingly, this generation process mirrors the reverse of the widely…
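The two-stage decomposition described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `Graph`, `toy_denoise`, `sample_external_group`, `sample_external_bonds`, and `generate_reactants` are hypothetical, the atom and bond vocabularies are toy assumptions, and the learned denoising network is replaced by uniform resampling. The sketch also assumes the product subgraph is carried through unchanged, which the abstract does not state. It only shows the control flow: start each stage from a fixed "dummy" or "no bond" state, iterate a reverse step, then merge the results.

```python
import random
from dataclasses import dataclass, field

ATOM_TYPES = ["C", "N", "O", "Cl", "Br"]  # toy atom vocabulary (assumption)
DUMMY = "*"                               # placeholder for "no atom yet"
BOND_TYPES = [0, 1, 2, 3]                 # 0 = no bond; 1-3 = bond orders


@dataclass
class Graph:
    atoms: list
    bonds: dict = field(default_factory=dict)  # (i, j) -> bond order, i < j


def toy_denoise(state, vocabulary, rng):
    """Hypothetical stand-in for one learned reverse-diffusion step.

    A real model would predict a categorical distribution per entry,
    conditioned on the product graph; here each entry is resampled uniformly.
    """
    return [rng.choice(vocabulary) for _ in state]


def sample_external_group(max_atoms, steps, rng):
    """Stage 1: start from all-dummy atoms and iteratively denoise atom types."""
    state = [DUMMY] * max_atoms
    for _ in range(steps):
        state = toy_denoise(state, ATOM_TYPES + [DUMMY], rng)
    return [a for a in state if a != DUMMY]  # drop slots that ended as dummy


def sample_external_bonds(n_product, n_group, steps, rng):
    """Stage 2: start from 'no bond' everywhere and denoise product-group bonds."""
    pairs = [(i, n_product + j) for i in range(n_product) for j in range(n_group)]
    state = [0] * len(pairs)
    for _ in range(steps):
        state = toy_denoise(state, BOND_TYPES, rng)
    return {pair: order for pair, order in zip(pairs, state) if order != 0}


def generate_reactants(product, max_group_atoms=3, steps=5, seed=0):
    rng = random.Random(seed)
    group = sample_external_group(max_group_atoms, steps, rng)
    external = sample_external_bonds(len(product.atoms), len(group), steps, rng)
    # Assumption: the product subgraph is copied through unchanged.
    return Graph(product.atoms + group, {**product.bonds, **external})


# Toy product: a three-atom chain C-C=O.
product = Graph(["C", "C", "O"], {(0, 1): 1, (1, 2): 2})
reactants = generate_reactants(product)
```

Because the denoiser here is random, the output is generally not a chemically valid molecule; the point is only the ordering of the stages (external group first, external bonds second), each conditioned on the product.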

Peer Reviews

Decision·Submitted to ICLR 2024

Reviewer 01 · Rating 5 (marginally below the acceptance threshold) · Confidence 5

Strengths

- Diffusion has not been applied much to retrosynthesis.
- Results are decent but not impressive.

Weaknesses

- Several baselines are missing.
- The design of the model seems very complex.
- Inference times are not reported.
- A citation of graph2edit, a very strong semi-template method, is missing: https://www.nature.com/articles/s41467-023-38851-5

Reviewer 02 · Rating 3 (reject, not good enough) · Confidence 4

Strengths

(S1): The idea to apply discrete diffusion to reaction prediction is, to my knowledge, novel (or at least doesn't appear in any established work). It also seems potentially promising given recent advances in diffusion models overall.

(S2): The paper includes some useful qualitative visualizations (Figures 5-6), which help get intuition about how RetroDiff works.

Weaknesses

(W1): The empirical results are not too impressive. In the paper, the results appear more promising than they actually are, as several baselines are missing, and the presentation is also somewhat biased.

- (a) Strong baselines are missing: for example RetroKNN [1] on the template-based side, and RootAligned SMILES [2] on the template-free side. Both of these models outperform RetroDiff by a large margin (a few %) across top-k values. Note that recent work [3] corrected some of the previously r

Reviewer 03 · Rating 3 (reject, not good enough) · Confidence 3

Strengths

- The main idea of predicting in two stages is good and seems to connect to how reactions work. The first stage can be seen as gathering the required extra atoms, and the second stage is motivated by finding the reaction center. This part could have been strengthened with more chemical analysis and motivation.
- The performance is competitive, but perhaps a bit inconclusive.

Weaknesses

- The paper is poorly written and is not up to the standard of ICLR publications. The math suffers from an ad hoc presentation with respect to densities, distributions, and mappings. I'm not convinced the method is presented in a mathematically correct way.
- The method is incremental: the diffusion models are taken off the shelf, and the overall approach of group/bond generation also seems to be already known. It's difficult to see why this method works well, and I suspect it's mostly transformer tuning. This is not elaborat

Taxonomy

Topics: Machine Learning in Materials Science · Gene expression and cancer classification · Asymmetric Hydrogenation and Catalysis

Methods: Diffusion