# Transferable enantioselectivity models from sparse data

**Authors:** Simone Gallarati, Erin M. Bucci, Abigail G. Doyle, Matthew S. Sigman

PMC · DOI: 10.1038/s41586-026-10239-7 · 2026-02-11

## TL;DR

This paper introduces a machine-learning approach that predicts the enantioselectivity of asymmetric catalytic reactions from sparse training data, helping to select and optimize catalysts for new reactions.

## Contribution

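A descriptor generation strategy that accounts for changes in the enantiodetermining step with catalyst or substrate identity, enabling models that span distinct ligand and substrate types even when training data are sparse.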
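The core idea can be pictured with a minimal sketch. It assumes that each ligand/substrate combination has descriptors computed for several candidate transition states or intermediates, and that a mechanistic hypothesis labels which species is enantiodetermining for that combination. The class `Reaction`, the function `descriptor_vector`, and the feature names are hypothetical illustrations, not the authors' code or descriptor set.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical feature names; the paper's actual descriptors are not listed in this summary.
FEATURE_NAMES = ["buried_volume", "nbo_charge_ni", "distance_ni_c"]


@dataclass
class Reaction:
    """One ligand/substrate combination with features for each proposed species."""
    ligand: str
    substrate: str
    # Species label -> {feature name -> value}, e.g. from computed structures
    # of each transition state or intermediate that might control selectivity.
    features_by_species: dict[str, dict[str, float]]
    # Label of the species assumed to be enantiodetermining for this combination.
    enantiodetermining_species: str


def descriptor_vector(rxn: Reaction) -> list[float]:
    """Read the same named features from whichever species sets the selectivity.

    Every reaction yields features in the same order, so one model can be
    trained across combinations whose enantiodetermining step differs.
    """
    feats = rxn.features_by_species[rxn.enantiodetermining_species]
    return [feats[name] for name in FEATURE_NAMES]
```

The design choice illustrated here is that the mechanistic hypothesis, not a fixed species, decides where each row of the feature matrix comes from.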

## Key findings

- Models trained on sparse data can guide the optimization of poorly performing examples reported in a substrate scope.
- The approach is applicable to unseen ligands and reaction partners.
- The method captures mechanistic complexity by using features extracted from the transition states and intermediates proposed to be involved in asymmetric induction.
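Sparse-data models of this kind are often multivariate linear regressions whose predictive ability is checked by leave-one-out cross-validation. The sketch below is a NumPy illustration of that generic technique, assuming a feature matrix `X` (rows are reactions, columns are descriptors) and a target vector `y`. The function names are hypothetical, and this summary does not state which regression method or validation scheme the paper uses.

```python
import numpy as np


def fit_linear(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Ordinary least squares with an intercept; returns [intercept, *slopes]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict(coef: np.ndarray, X: np.ndarray) -> np.ndarray:
    """Apply a fitted intercept-plus-slopes model to new rows."""
    return coef[0] + X @ coef[1:]


def loo_predictions(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Leave-one-out: refit without each reaction, then predict that reaction."""
    n = len(y)
    preds = np.empty(n)
    for i in range(n):
        mask = np.arange(n) != i
        coef = fit_linear(X[mask], y[mask])
        preds[i] = predict(coef, X[i : i + 1])[0]
    return preds


def q_squared(y: np.ndarray, preds: np.ndarray) -> float:
    """Cross-validated Q^2 = 1 - PRESS / total sum of squares."""
    ss_res = np.sum((y - preds) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)
```

Leave-one-out is a natural fit for sparse data because every reaction is used for both training and held-out evaluation without discarding a separate test set.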

## Abstract

Identifying a catalyst class to optimize the enantioselectivity of a new reaction, either involving a different combination of known substrate types or an entirely unfamiliar class of compounds, is a formidable challenge. Statistical models trained on a reported set of reactions can help predict out-of-sample transformations [1–5] but often face two challenges: (1) only sparse data that offer limited information on catalyst–substrate interactions are available; and (2) simple stereoelectronic parameters may fail to describe mechanistically complex transformations [6,7]. Here we report a descriptor generation strategy that accounts for changes in the enantiodetermining step with catalyst or substrate identity, allowing us to model reactions involving distinct ligand and substrate types. As validating case studies, we collected data on enantioselective nickel-catalysed C(sp³) couplings [8] and trained statistical models with features extracted from the transition states and intermediates proposed to be involved in asymmetric induction. These models allow the optimization of poorly performing examples reported in a substrate scope and are applicable to unseen ligands and reaction partners. This approach offers the opportunity to streamline catalyst and reaction development, quantitatively transferring knowledge learned on sparse data to chemical spaces.
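Enantioselectivity models are commonly trained on the free-energy difference between the competing diastereomeric transition states rather than on ee directly, using ΔΔG‡ = RT ln(er) with er = (100 + ee)/(100 - ee). Whether the paper uses exactly this target is an assumption here; the helper names `ddg_from_ee` and `ee_from_ddg` and the default temperature of 298.15 K are illustrative.

```python
import math

# Gas constant in kcal/(mol*K).
R_KCAL = 1.987204259e-3


def ddg_from_ee(ee_percent: float, temp_k: float = 298.15) -> float:
    """Convert ee (%) to ddG-double-dagger (kcal/mol) via RT ln(er)."""
    if abs(ee_percent) >= 100.0:
        raise ValueError("ee must be strictly between -100 and 100")
    er = (100.0 + ee_percent) / (100.0 - ee_percent)  # major:minor ratio
    return R_KCAL * temp_k * math.log(er)


def ee_from_ddg(ddg_kcal: float, temp_k: float = 298.15) -> float:
    """Invert the conversion: ddG-double-dagger (kcal/mol) back to ee (%)."""
    er = math.exp(ddg_kcal / (R_KCAL * temp_k))
    return 100.0 * (er - 1.0) / (er + 1.0)
```

For example, 90% ee at 298.15 K corresponds to er = 19 and ΔΔG‡ ≈ 1.74 kcal/mol, so small energy differences translate into large changes in selectivity.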

A machine-learning workflow has been developed to predict the enantioselectivity of asymmetric catalytic reactions using only sparse data for training.

## Full-text entities

- **Chemicals:** nickel (MESH:D009532), carbon (MESH:D002244)

## Figures

The complete paper contains six figures with captions: https://tomesphere.com/paper/PMC12999503/full.md

---
Source: https://tomesphere.com/paper/PMC12999503