Procedural Synthesis of Synthesizable Molecules
Michael Sun, Alston Lo, Minghao Guo, Jie Chen, Connor Coley, Wojciech Matusik

TL;DR
This paper introduces a novel framework for designing synthesizable molecules and their analogs by applying program synthesis techniques, enabling explicit control over synthesis complexity and resource allocation.
Contribution
It proposes a bilevel synthesis framework that decouples molecular structure from semantics, utilizing syntax-guided methods and evolutionary algorithms for improved molecule design.
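To make the idea of searching structure and semantics together more concrete, here is a minimal sketch of an evolutionary loop over a joint genome. It is not the authors' implementation: the encoding (an integer skeleton index plus a bit-vector descriptor), the names `mutate`, `crossover`, `evolve`, and the scoring function `toy_oracle` are all hypothetical stand-ins chosen for illustration.

```python
import random
from typing import Callable, Tuple

# Hypothetical joint genome: (skeleton index, descriptor bits).
# The skeleton index stands in for a syntactic template; the bits stand in
# for a molecular descriptor. Neither encoding comes from the paper.
Individual = Tuple[int, Tuple[int, ...]]


def mutate(ind: Individual, n_skeletons: int, rng: random.Random,
           p_skel: float = 0.2, p_bit: float = 0.1) -> Individual:
    """Resample the skeleton with prob. p_skel; flip each bit with prob. p_bit."""
    skel, bits = ind
    if rng.random() < p_skel:
        skel = rng.randrange(n_skeletons)
    bits = tuple(b ^ 1 if rng.random() < p_bit else b for b in bits)
    return skel, bits


def crossover(a: Individual, b: Individual, rng: random.Random) -> Individual:
    """One-point crossover on the bits; the skeleton is inherited from one parent."""
    cut = rng.randrange(1, len(a[1]))
    return rng.choice((a[0], b[0])), a[1][:cut] + b[1][cut:]


def evolve(oracle: Callable[[Individual], float], n_skeletons: int = 4,
           n_bits: int = 16, pop_size: int = 30, generations: int = 40,
           seed: int = 0) -> Individual:
    rng = random.Random(seed)
    pop = [(rng.randrange(n_skeletons),
            tuple(rng.randint(0, 1) for _ in range(n_bits)))
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=oracle, reverse=True)
        parents = pop[:pop_size // 2]  # elitist: the top half always survives
        children = [mutate(crossover(rng.choice(parents), rng.choice(parents), rng),
                           n_skeletons, rng)
                    for _ in range(pop_size - len(parents))]
        pop = parents + children
    return max(pop, key=oracle)


def toy_oracle(ind: Individual) -> float:
    """Hypothetical black-box score: rewards set bits, penalizes higher skeleton indices."""
    skel, bits = ind
    return sum(bits) - 2 * skel


best = evolve(toy_oracle)
```

Because the top half of each generation is carried over unchanged, the best score never decreases from one generation to the next. In a real setting the oracle would score a molecule decoded from the skeleton and descriptor, which this sketch does not attempt.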
Findings
Demonstrates improved analog generation performance
Provides explicit control over synthesis resources
Biases design towards simpler, more feasible solutions
Abstract
Designing synthetically accessible molecules and recommending analogs to unsynthesizable molecules are important problems for accelerating molecular discovery. We reconceptualize both problems using ideas from program synthesis. Drawing inspiration from syntax-guided synthesis approaches, we decouple the syntactic skeleton from the semantics of a synthetic tree to create a bilevel framework for reasoning about the combinatorial space of synthesis pathways. Given a molecule we aim to generate analogs for, we iteratively refine its skeletal characteristics via Markov Chain Monte Carlo simulations over the space of syntactic skeletons. Given a black-box oracle to optimize, we formulate a joint design space over syntactic templates and molecular descriptors and introduce evolutionary algorithms that optimize both syntactic and semantic dimensions synergistically. Our key insight is that…
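The iterative refinement over skeletons described in the abstract can be illustrated with a small Metropolis-style random walk. This is a sketch under stated assumptions, not the paper's procedure: a skeleton is encoded here as a hypothetical pair of step counts, and `toy_score` is an invented stand-in for how well the best molecule under a given skeleton matches the target.

```python
import math
import random
from typing import Callable, Tuple

# Hypothetical toy encoding of a skeleton:
# (number of unimolecular steps, number of bimolecular steps).
Skeleton = Tuple[int, int]


def propose(s: Skeleton, rng: random.Random, max_steps: int = 6) -> Skeleton:
    """Symmetric proposal: change one count by +/-1; invalid moves keep s."""
    i = rng.randrange(2)
    cand = list(s)
    cand[i] += rng.choice((-1, 1))
    if cand[i] < 0 or sum(cand) > max_steps:
        return s
    return (cand[0], cand[1])


def toy_score(s: Skeleton, target: Skeleton = (1, 2),
              complexity_weight: float = 0.1) -> float:
    """Invented score: closeness to a target skeleton minus a complexity penalty."""
    mismatch = abs(s[0] - target[0]) + abs(s[1] - target[1])
    return -mismatch - complexity_weight * sum(s)


def metropolis(score: Callable[[Skeleton], float], start: Skeleton,
               steps: int = 2000, temperature: float = 0.5,
               seed: int = 0) -> Tuple[Skeleton, float]:
    rng = random.Random(seed)
    current, current_score = start, score(start)
    best, best_score = current, current_score
    for _ in range(steps):
        cand = propose(current, rng)
        cand_score = score(cand)
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if (cand_score >= current_score or
                rng.random() < math.exp((cand_score - current_score) / temperature)):
            current, current_score = cand, cand_score
            if current_score > best_score:
                best, best_score = current, current_score
    return best, best_score


best_skeleton, best_value = metropolis(toy_score, start=(0, 0))
```

The complexity penalty in `toy_score` is one simple way to bias the walk toward shorter routes; the weight used here is arbitrary.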
Peer Reviews
Decision: ICLR 2025 Poster
1. The problem statement is well-defined, and the methods for synthesizing analogs and generating molecules are clearly explained, including the program's semantics. 2. The method achieves state-of-the-art performance in molecule generation on benchmark datasets and demonstrates significantly greater efficiency than the SynNet method when tested with various oracles, such as GSK, JNK, and DRD2. 3. Experimental analysis was conducted using various evaluation metrics, including bioactivity predict
1. The paper does not mention the source code or an anonymous repository, and the figure number is missing in Appendix E.6 (Attention Visualization). 2. Results are compared against a 2022 paper; the authors have not compared against any more recent publications. 3. The paper does not address the computational cost or effectiveness of the algorithms. How long does it take to train the inner loop on ~136k synthetic trees, or to perform molecule generation or analog creation?
1. Frames molecular design and synthesizable analog recommendation as conditional program synthesis tasks, offering a fresh perspective in this field. 2. Demonstrates robust performance across key metrics, underscoring the effectiveness of the proposed methods. 3. Provides thorough experiments that validate the approach and its contributions to molecular design and synthesis.
1. The current approach uses a limited number of templates, and it is unclear how this framework could be expanded to include a broader range of templates, which could limit its flexibility. 2. Although the authors claim efficiency, the paper lacks direct comparisons to demonstrate this advantage against other methods. 3. The comparison between tasks in Section 3.1 could be enhanced with mathematical notation alongside chemistry examples. While the method draws on program synthesis concepts, the
- The approach is interesting from the soft-computing point of view. The authors apply the four different approaches in the right places. The separation between structure and content in the synthesis approach is particularly interesting. The use of MCMC and GA is more standard but is well suited to generating new candidate trees and to searching for the best structure and content of the candidate molecule.
- I think the possible weakness is the dependency on the tree and the grammar components. On the one hand, a very large number of templates will increase the computational complexity of the model (it is not clear how, for instance, the MCMC algorithm would handle this); on the other hand, a smaller, more efficient set will not allow all desired solutions to be generated.
Taxonomy
Topics: Chemical Synthesis and Analysis · Computational Drug Discovery Methods · Synthetic Organic Chemistry Methods
