RetroOOD: Understanding Out-of-Distribution Generalization in Retrosynthesis Prediction
Yemin Yu, Luotian Yuan, Ying Wei, Hanyu Gao, Xinhai Ye, Zhihua Wang, Fei Wu

TL;DR
This paper investigates the challenges of out-of-distribution generalization in retrosynthesis prediction models, introduces new benchmarks for distribution shifts, and proposes techniques to enhance model robustness in real-world applications.
Contribution
The paper systematically evaluates existing retrosynthesis prediction models under distribution shifts, constructs two groups of benchmark datasets covering two types of shift, and proposes model-agnostic methods to improve OOD performance.
Findings
Existing models perform poorly under distribution shifts.
New benchmarks reveal limitations of current evaluation methods.
Proposed techniques improve OOD performance by an average of 4.6%.
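As a rough illustration of why in-distribution evaluation can overstate performance, the sketch below builds a held-out-group split (no group shared between train and test) and computes top-k accuracy. It is a minimal sketch under stated assumptions, not the paper's benchmark construction: the `template` field, the record layout, and both function names are hypothetical, chosen only for this example.

```python
import random
from collections import defaultdict


def group_holdout_split(records, group_key, test_fraction=0.2, seed=0):
    """Split records so that no group value appears in both train and test.

    `group_key` is a hypothetical field (e.g. a reaction-template or
    scaffold label); the source does not specify how its shifts are defined.
    """
    groups = defaultdict(list)
    for rec in records:
        groups[rec[group_key]].append(rec)

    # Sort before shuffling so the split is reproducible for a given seed.
    group_ids = sorted(groups)
    random.Random(seed).shuffle(group_ids)

    n_test = max(1, round(len(group_ids) * test_fraction))
    test_ids, train_ids = group_ids[:n_test], group_ids[n_test:]

    train = [r for g in train_ids for r in groups[g]]
    test = [r for g in test_ids for r in groups[g]]
    return train, test


def top_k_accuracy(predictions, targets, k=1):
    """Fraction of targets that appear among the first k ranked predictions."""
    if not targets:
        return 0.0
    hits = sum(t in p[:k] for p, t in zip(predictions, targets))
    return hits / len(targets)


# Toy data: placeholder strings stand in for product and reactant SMILES.
records = [
    {"product": "P1", "reactants": "R1", "template": "T1"},
    {"product": "P2", "reactants": "R2", "template": "T1"},
    {"product": "P3", "reactants": "R3", "template": "T2"},
    {"product": "P4", "reactants": "R4", "template": "T3"},
    {"product": "P5", "reactants": "R5", "template": "T3"},
]
train, test = group_holdout_split(records, "template", test_fraction=0.34)
```

Evaluating the same model on a random split and on a split like this one gives a simple way to see the gap between in-distribution and OOD accuracy; the choice of grouping field determines which kind of shift is being probed.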
Abstract
Machine learning-assisted retrosynthesis prediction models have been gaining widespread adoption, though their performances oftentimes degrade significantly when deployed in real-world applications embracing out-of-distribution (OOD) molecules or reactions. Despite steady progress on standard benchmarks, our understanding of existing retrosynthesis prediction models under the premise of distribution shifts remains stagnant. To this end, we first formally sort out two types of distribution shifts in retrosynthesis prediction and construct two groups of benchmark datasets. Next, through comprehensive experiments, we systematically compare state-of-the-art retrosynthesis prediction models on the two groups of benchmarks, revealing the limitations of previous in-distribution evaluation and re-examining the advantages of each model. More remarkably, we are motivated by the above empirical…
Taxonomy
Topics: Machine Learning in Materials Science · Computational Drug Discovery Methods · Electrocatalysts for Energy Conversion
