# Deductive machine learning models for product identification

**Authors:** Tianfan Jin, Qiyuan Zhao, Andrew B. Schofield, Brett M. Savoie

PMC · DOI: 10.1039/d3sc04909d · Chemical Science · 2024-07-01

## TL;DR

This paper introduces machine learning models that perform deductive reasoning to identify chemical products from a mixture of spectral sources. The authors point to spectral interpretation and lab automation as applications.

## Contribution

The novel contribution is a general strategy for combining inductive models into a deductive network for chemical reasoning tasks.
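The strategy of combining inductive models into a deductive network can be illustrated with a minimal sketch. It is not the authors' architecture, which this summary does not describe. The sketch makes three hypothetical assumptions. First, each inductive sub-model scores how consistent one candidate structure is with one spectral source. Second, the deductive step combines these scores as a product of probabilities, implemented as a sum of log-probabilities. Third, a missing spectrum is skipped rather than treated as evidence. The names `deduce` and `InductiveModel`, the source labels, and the lookup-table "models" are illustrative stand-ins only.

```python
import math
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical interface: an inductive sub-model maps (candidate, spectrum)
# to a probability in (0, 1) that the candidate is consistent with that spectrum.
InductiveModel = Callable[[str, object], float]


def deduce(candidates: List[str],
           spectra: Dict[str, Optional[object]],
           models: Dict[str, InductiveModel],
           eps: float = 1e-9) -> List[Tuple[str, float, int]]:
    """Rank candidates by summed log-probability over the available spectral sources.

    Returns (candidate, score, n_sources_used) tuples, best candidate first.
    """
    ranked = []
    for cand in candidates:
        score, used = 0.0, 0
        for source, spectrum in spectra.items():
            if spectrum is None or source not in models:
                continue  # missing information: skip this source instead of failing
            # Clamp so log() is always defined, even for a 0.0 or 1.0 output.
            p = min(max(models[source](cand, spectrum), eps), 1.0 - eps)
            score += math.log(p)
            used += 1
        ranked.append((cand, score, used))
    # Higher (less negative) total log-probability = more consistent overall.
    ranked.sort(key=lambda t: t[1], reverse=True)
    return ranked


# Toy stand-ins for trained models (made-up values, for illustration only).
ir_table = {"intended_product": 0.9, "starting_material": 0.6, "isomer": 0.8}
ms_table = {"intended_product": 0.85, "starting_material": 0.1, "isomer": 0.85}
models = {
    "IR": lambda cand, spec: ir_table[cand],
    "MS": lambda cand, spec: ms_table[cand],
    "NMR": lambda cand, spec: 0.5,
}
spectra = {"IR": "ir-spectrum", "MS": "ms-spectrum", "NMR": None}  # NMR unavailable

result = deduce(list(ir_table), spectra, models)
for cand, score, used in result:
    print(f"{cand}: log-score={score:.3f} from {used} sources")
```

In this toy run the IR scores alone only weakly separate the candidates. The MS scores strongly disfavor the starting material, so the combined ranking places the intended product first, the isomer second as a plausible impurity, and the starting material last. The absent NMR spectrum simply contributes nothing. A product-of-probabilities rule is one simple way to let agreeing sources outweigh a conflicting one. A learned combination network, as the paper describes, could weight sources differently.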

## Key findings

- Deductive ML models can distinguish intended and unintended reaction outcomes from spectral data.
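- The models generalize to tasks they were not directly trained on, such as structural inference from real rather than simulated spectra and identifying reagents and isomers as plausible impurities.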
- A new dataset of 1,124,043 simulated spectra was created to train these models.

## Abstract

Deductive solution strategies are required in prediction scenarios that are underdetermined, when contradictory information is available, or more generally wherever one-to-many non-functional mappings occur. In contrast, most contemporary machine learning (ML) in the chemical sciences is inductive learning from example, with a fixed set of features. Chemical workflows are replete with situations requiring deduction, including many aspects of lab automation and spectral interpretation. Here, a general strategy is described for designing and training machine learning models capable of deduction that consists of combining individual inductive models into a larger deductive network. The training and testing of these models are demonstrated on the task of deducing reaction products from a mixture of spectral sources. The resulting models can distinguish between intended and unintended reaction outcomes and identify starting material based on a mixture of spectral sources. The models also perform well on tasks that they were not directly trained on, like performing structural inference using real rather than simulated spectral inputs, predicting minor products from named organic chemistry reactions, identifying reagents and isomers as plausible impurities, and handling missing or conflicting information. A new dataset of 1,124,043 simulated spectra that were generated to train these models is also distributed with this work. These findings demonstrate that deductive bottlenecks for chemical problems are not fundamentally insuperable for ML models.

Machine learning models are developed that emulate the deductive chemical reasoning processes associated with product identification from analytical spectra.

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC11290435/full.md

## Figures

2 figures with captions in the complete paper: https://tomesphere.com/paper/PMC11290435/full.md

## References

33 references — full list in the complete paper: https://tomesphere.com/paper/PMC11290435/full.md

---
Source: https://tomesphere.com/paper/PMC11290435