TL;DR
This paper evaluates how current deep learning reaction prediction models behave in out-of-distribution settings, shows that they struggle to generalize to novel chemistry, and proposes more rigorous evaluation methods.
Contribution
The paper introduces a series of more challenging out-of-distribution evaluations, applied to a prototypical SMILES-based reaction prediction model, that expose the model's limitations and are intended to guide future development toward reaction discovery.
Findings
Performance on randomly sampled datasets is overly optimistic compared with performance when generalizing to new patents or authors.
Time-split evaluations show decreased accuracy on reactions published after the training data cutoff.
Extrapolation across reaction classes exposes significant generalization gaps.
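The difference between these evaluation settings comes down to how the data is partitioned. The following is a minimal sketch of the three kinds of split, not the paper's actual pipeline: the record fields (`rxn`, `patent`, `year`), the function names, and the toy data are all hypothetical and chosen only for illustration.

```python
import random
from collections import defaultdict


def random_split(records, test_frac=0.2, seed=0):
    """Shuffle individual reactions; one patent can end up on both sides."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_frac))
    return shuffled[n_test:], shuffled[:n_test]


def group_split(records, key, test_frac=0.2, seed=0):
    """Hold out whole groups (e.g. patents or authors) so no group spans both sets."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[key]].append(rec)
    group_ids = sorted(groups)
    rng = random.Random(seed)
    rng.shuffle(group_ids)
    n_test = max(1, int(round(len(group_ids) * test_frac)))
    test_ids = set(group_ids[:n_test])
    train = [r for g in group_ids if g not in test_ids for r in groups[g]]
    test = [r for g in group_ids if g in test_ids for r in groups[g]]
    return train, test


def time_split(records, cutoff_year, key="year"):
    """Train on reactions up to the cutoff year; test on later ones."""
    train = [r for r in records if r[key] <= cutoff_year]
    test = [r for r in records if r[key] > cutoff_year]
    return train, test


# Toy data (assumed): 30 reactions, 3 per patent, one patent per year.
records = [
    {"rxn": f"rxn{i}", "patent": f"P{i // 3}", "year": 2000 + i // 3}
    for i in range(30)
]

rand_train, rand_test = random_split(records)
pat_train, pat_test = group_split(records, key="patent")
time_train, time_test = time_split(records, cutoff_year=2006)
```

Under the same assumptions, holding out whole reaction classes could reuse `group_split` with a hypothetical class-label field as the key. The design point is that only the random split lets closely related reactions from one source appear in both training and test data, which is why it tends to give the most optimistic numbers.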
Abstract
Deep learning models for anticipating the products of organic reactions have found many use cases, including validating retrosynthetic pathways and constraining synthesis-based molecular design tools. Despite compelling performance on popular benchmark tasks, strange and erroneous predictions sometimes ensue when using these models in practice. The core issue is that common benchmarks test models in an in-distribution setting, whereas many real-world uses for these models are in out-of-distribution settings and require a greater degree of extrapolation. To better understand how current reaction predictors work in out-of-distribution domains, we report a series of more challenging evaluations of a prototypical SMILES-based deep learning model. First, we illustrate how performance on randomly sampled datasets is overly optimistic compared to performance when generalizing to new patents or…
