Transferable enantioselectivity models from sparse data
Simone Gallarati, Erin M. Bucci, Abigail G. Doyle, Matthew S. Sigman

TL;DR
This paper introduces a machine learning approach to predict the enantioselectivity of chemical reactions using limited data, helping optimize catalysts for new reactions.
Contribution
A novel descriptor generation strategy that enables modeling of diverse ligand and substrate types with sparse data.
Findings
Models trained on sparse data can help optimize poorly performing reactions within a substrate scope.
The approach is applicable to unseen ligands and reaction partners.
The method captures mechanistic complexity through transition state and intermediate features.
Abstract
Identifying a catalyst class to optimize the enantioselectivity of a new reaction, either involving a different combination of known substrate types or an entirely unfamiliar class of compounds, is a formidable challenge. Statistical models trained on a reported set of reactions can help predict out-of-sample transformations [1–5] but often face two challenges: (1) only sparse data that offer limited information on catalyst–substrate interactions are available; and (2) simple stereoelectronic parameters may fail to describe mechanistically complex transformations [6,7]. Here we report a descriptor generation strategy that accounts for changes in the enantiodetermining step with catalyst or substrate identity, allowing us to model reactions involving distinct ligand and substrate types. As validating case studies, we collected data on enantioselective nickel-catalysed C(sp³) couplings [8] and…
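The truncated abstract does not say which quantity the statistical models regress. As an assumption for illustration only, the sketch below uses a convention common in enantioselectivity modelling: converting a measured enantiomeric excess (ee) into a free-energy difference, ΔΔG‡ = RT ln(er), where er = (1 + ee)/(1 − ee). The function names and the default temperature are hypothetical and are not taken from the paper.

```python
import math

# Gas constant in kcal/(mol*K).
R_KCAL = 1.987204e-3


def ee_to_ddg(ee_percent, temperature_k=298.15):
    """Convert enantiomeric excess (in %) to ddG‡ in kcal/mol.

    Assumption: selectivity is under kinetic control, so the
    enantiomeric ratio er maps to ddG‡ = R * T * ln(er).
    """
    ee = ee_percent / 100.0
    if not -1.0 < ee < 1.0:
        # ee of exactly +/-100% would give an infinite ddG‡.
        raise ValueError("ee must be strictly between -100% and 100%")
    er = (1.0 + ee) / (1.0 - ee)
    return R_KCAL * temperature_k * math.log(er)


def ddg_to_ee(ddg_kcal, temperature_k=298.15):
    """Inverse mapping: ddG‡ in kcal/mol back to ee in %."""
    er = math.exp(ddg_kcal / (R_KCAL * temperature_k))
    return 100.0 * (er - 1.0) / (er + 1.0)
```

Working on the ΔΔG‡ scale is a design choice, not a requirement: it makes the target roughly linear in energy-like descriptors, whereas ee saturates near ±100%. A prediction made on that scale can be mapped back to ee with `ddg_to_ee` for comparison with experiment.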
Taxonomy
Topics: Machine Learning in Materials Science · Asymmetric Hydrogenation and Catalysis · Computational Drug Discovery Methods
