# Generating diversity and securing completeness in algorithmic retrosynthesis

**Authors:** Florian Mrugalla, Christopher Franz, Yannic Alber, Georg Mogk, Martín Villalba, Thomas Mrziglod, Kevin Schewior

PMC · DOI: 10.1186/s13321-025-00981-x · Journal of Cheminformatics · 2025-05-13

## TL;DR

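This paper adapts Depth-First Proof-Number Search (DFPN) to chemical synthesis planning so that it returns a diverse set of synthesis plans, and reports that the resulting algorithm outperforms Monte-Carlo Tree Search in both diversity and time efficiency.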

## Contribution

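The paper introduces a novel chemical diversity score (CDS) and adapts Depth-First Proof-Number Search (DFPN) and its variants to produce a diverse set of solutions in retrosynthesis planning. It also shows that DFPN becomes complete when reinforced with a threshold-controlling routine from the literature.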

## Key findings

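- The proposed algorithm outperforms Monte-Carlo Tree Search in diversity, as measured by the chemical diversity score (CDS), and in time efficiency.
- DFPN is shown to be complete, i.e., able to find a solution whenever one exists, when reinforced with a threshold-controlling routine from the literature.
- DFPN without that routine is known to be incomplete; the paper provides a much cleaner example of this.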

## Abstract

Chemical synthesis planning has considerably benefited from advances in the field of machine learning. Neural networks can reliably and accurately predict reactions leading to a given, possibly complex, molecule. In this work we focus on algorithms for assembling such predictions into a full synthesis plan that, starting from simple building blocks, produces a given target molecule, a procedure known as retrosynthesis. Objective functions for this task are hard to define and context-specific. In order to generate a diverse set of synthesis plans for chemists to select from, we capture the concept of diversity in a novel chemical diversity score (CDS). Our experiments show that our algorithm outperforms the algorithm predominantly employed in this domain, Monte-Carlo Tree Search, with respect to diversity in terms of our score as well as time efficiency.

We adapt Depth-First Proof-Number Search (DFPN) and its variants, which have been applied to retrosynthesis before, to produce a set of solutions, with an explicit focus on diversity. We also make progress on understanding DFPN in terms of completeness, i.e., the ability to find a solution whenever there exists one. DFPN is known to be incomplete, for which we provide a much cleaner example, but we also show that it is complete when reinforced with a threshold-controlling routine from the literature.

The accompanying source code is available at https://github.com/Bayer-Group/bayer-retrosynthesis-search.
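For readers unfamiliar with proof-number search, the sketch below illustrates the standard proof-number and disproof-number bookkeeping on an AND/OR tree, which is the quantity DFPN-style searches use to decide where to expand next. It is a minimal illustration, not the authors' implementation: the `Node` class, its fields, and the `proof_numbers` function are hypothetical names, and the mapping of molecules to OR nodes (any one reaction suffices) and reactions to AND nodes (all precursors are needed) is an assumption about how the tree would typically be modelled. DFPN's depth-first thresholds, the threshold-controlling routine, and the diversity score are not shown.

```python
from dataclasses import dataclass, field
from math import inf
from typing import List, Tuple


@dataclass
class Node:
    """Hypothetical AND/OR tree node (not taken from the paper's code)."""
    kind: str                                   # "OR" = molecule, "AND" = reaction
    children: List["Node"] = field(default_factory=list)
    solved: bool = False                        # e.g. a purchasable building block
    dead: bool = False                          # e.g. no reaction is applicable


def proof_numbers(node: Node) -> Tuple[float, float]:
    """Return (proof number, disproof number) for a node.

    Proof number: minimum count of unexpanded leaves that still have to be
    proven to prove this node. Disproof number: the analogous count for
    disproving it.
    """
    if node.solved:
        return (0, inf)
    if node.dead:
        return (inf, 0)
    if not node.children:
        return (1, 1)                           # unexpanded leaf
    numbers = [proof_numbers(child) for child in node.children]
    pns = [pn for pn, _ in numbers]
    dns = [dn for _, dn in numbers]
    if node.kind == "OR":
        # One proven child proves an OR node; all children must fail to disprove it.
        return (min(pns), sum(dns))
    # All children must be proven for an AND node; one failure disproves it.
    return (sum(pns), min(dns))


# Toy target with two candidate reactions.
r1 = Node("AND", [Node("OR", solved=True), Node("OR")])  # one precursor already available
r2 = Node("AND", [Node("OR"), Node("OR")])               # both precursors still open
target = Node("OR", [r1, r2])

print(proof_numbers(target))  # (1, 2)
```

In this toy tree the first reaction has the smaller proof number, so a proof-number-guided search would expand its open precursor first.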

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12076909/full.md

## Figures

14 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12076909/full.md

## References

3 references — full list in the complete paper: https://tomesphere.com/paper/PMC12076909/full.md

---
Source: https://tomesphere.com/paper/PMC12076909