DirectMultiStep: Direct Route Generation for Multistep Retrosynthesis
Yu Shee, Anton Morgunov, Haote Li, Victor S. Batista

TL;DR
This paper introduces transformer-based models for direct multistep retrosynthesis route generation, significantly improving accuracy and efficiency over traditional iterative methods, and demonstrating strong generalization to unseen drugs.
Contribution
Proposes a novel transformer-based approach that directly generates multistep synthetic routes, outperforming state-of-the-art methods and enabling incorporation of additional constraints for better predictions.
Findings
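Outperforms state-of-the-art methods on the PaRoutes dataset, with 1.9x and 3.1x improvements in Top-1 accuracy on its two test sets.
Models generalize well to FDA-approved drugs that are not in the training data.
Supplying extra constraints, such as the desired number of steps and the starting materials, increases accuracy and allows a smaller model.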
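For readers unfamiliar with the metric, the sketch below shows one common way to compute Top-k accuracy: a target counts as solved when its reference route appears among the model's first k ranked predictions. The function name, the toy route labels, and the exact-match criterion are assumptions made for illustration; the paper's own evaluation code may compare routes differently.

```python
def top_k_accuracy(predictions, references, k=1):
    """Fraction of targets whose reference route is among the first k predictions.

    predictions: list of ranked candidate lists, one per target (hypothetical format)
    references:  list of reference routes, one per target
    """
    if not references or len(predictions) != len(references):
        raise ValueError("predictions and references must be non-empty and aligned")
    # A hit is an exact match between the reference and any of the top-k candidates.
    hits = sum(1 for preds, ref in zip(predictions, references) if ref in preds[:k])
    return hits / len(references)


# Toy data: route labels are placeholders, not real routes.
predictions = [["A", "B"], ["C", "D"], ["E", "F"], ["G", "H"]]
references = ["A", "D", "X", "G"]

top1 = top_k_accuracy(predictions, references, k=1)
top2 = top_k_accuracy(predictions, references, k=2)
print(top1, top2)
```

Under this definition, a 1.9x improvement in Top-1 accuracy means the top-ranked prediction matches the reference route for 1.9 times as many targets as the baseline.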
Abstract
Traditional computer-aided synthesis planning (CASP) methods rely on iterative single-step predictions, leading to exponential search space growth that limits efficiency and scalability. We introduce a series of transformer-based models that leverage a mixture-of-experts approach to directly generate multistep synthetic routes as a single string, conditionally predicting each transformation based on all preceding ones. Our DMS Explorer XL model, which requires only target compounds as input, outperforms state-of-the-art methods on the PaRoutes dataset with 1.9x and 3.1x improvements in Top-1 accuracy on the n1 and n5 test sets, respectively. Providing additional information, such as the desired number of steps and starting materials, enables both a reduction in model size and an increase in accuracy, highlighting the benefits of incorporating more constraints into the prediction…
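To make the idea of a route "as a single string" concrete, here is a minimal sketch of how a synthesis tree could be flattened into one string and recovered again. The JSON layout, the `smiles` and `children` keys, and all function names are assumptions chosen for illustration and are not confirmed to match the format used by the authors. The toy route (paracetamol from 4-aminophenol and acetic anhydride, with 4-aminophenol from 4-nitrophenol) is a textbook example, not a model output.

```python
import json


def route_to_string(node):
    """Serialize a route tree into one compact string (hypothetical format)."""
    return json.dumps(node, separators=(",", ":"), sort_keys=True)


def string_to_route(text):
    """Parse the string produced by route_to_string back into a tree."""
    return json.loads(text)


def count_steps(node):
    """Length of the longest reaction chain from the target down to a leaf."""
    children = node.get("children", [])
    if not children:
        return 0
    return 1 + max(count_steps(child) for child in children)


def starting_materials(node):
    """Collect the SMILES of all leaves, i.e. molecules with no precursors."""
    children = node.get("children", [])
    if not children:
        return [node["smiles"]]
    found = []
    for child in children:
        found.extend(starting_materials(child))
    return found


# Toy two-step route; the root is the target compound.
route = {
    "smiles": "CC(=O)Nc1ccc(O)cc1",
    "children": [
        {
            "smiles": "Nc1ccc(O)cc1",
            "children": [{"smiles": "O=[N+]([O-])c1ccc(O)cc1"}],
        },
        {"smiles": "CC(=O)OC(C)=O"},
    ],
}

encoded = route_to_string(route)
print(encoded)
print(count_steps(string_to_route(encoded)), starting_materials(route))
```

A sequence model trained on such strings emits the whole route token by token, so each transformation is predicted with every earlier one already in context. The step count and the starting materials computed here are the kind of extra constraints that the abstract describes supplying as additional input.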
Taxonomy
Topics: Cancer Genomics and Diagnostics · Genomics and Phylogenetic Studies
Methods: Sparse Evolutionary Training
