Linking the Neural Machine Translation and the Prediction of Organic Chemistry Reactions

Juno Nam; Jurae Kim

arXiv:1612.09529 · cs.LG · January 2, 2017 · 122 cites

Open Access

TL;DR

This paper applies neural machine translation models to predict organic chemistry reaction products, enabling automated predictions without manual rule encoding by learning from reaction datasets.

Contribution

It introduces a sequence-to-sequence neural model for reaction prediction that learns directly from reaction data, bypassing manual rule-based methods.

Findings

01

A GRU-based sequence-to-sequence model predicts reaction products from tokenized reaction SMILES strings.

02

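Training sets combine reactions from patent databases with reactions generated manually from the elementary reactions in Wade's organic chemistry textbook; the trained models were tested on examples and problems from that textbook.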

03

The approach reduces the need for manual rule encoding in reaction prediction.

Abstract

Finding the main product of a chemical reaction is an important problem in organic chemistry. This paper describes a method for applying a neural machine translation model to the prediction of organic chemical reactions. To translate 'reactants and reagents' into 'products', a gated recurrent unit (GRU) based sequence-to-sequence model was built, together with a parser that generates input tokens for the model from reaction SMILES strings. The training sets consist of reactions from patent databases and reactions generated manually by applying the elementary reactions in Wade's organic chemistry textbook. The trained models were tested on examples and problems from the textbook. The prediction process does not require manual encoding of rules (e.g., SMARTS transformations); it needs only sufficient training reactions to learn new types of reactions.
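
The parsing step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' parser: the token pattern, the function names tokenize_smiles and split_reaction, and the choice to keep '>' as a separator token between reactants and reagents are all assumptions made for illustration. It shows only how a reaction SMILES string of the form 'reactants>reagents>products' might be split into a source token sequence and a target token sequence for a sequence-to-sequence model.

```python
import re

# Hypothetical token pattern (an assumption, not the paper's parser):
# bracket atoms, the two-letter halogens Br and Cl, two-digit ring
# closures such as %12, and otherwise any single character.
TOKEN_PATTERN = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|.")


def tokenize_smiles(smiles):
    """Split one SMILES string into atom, bond, branch, and ring tokens."""
    return TOKEN_PATTERN.findall(smiles)


def split_reaction(reaction_smiles):
    """Turn 'reactants>reagents>products' into (source, target) token lists."""
    reactants, reagents, products = reaction_smiles.split(">")
    source = tokenize_smiles(reactants)
    if reagents:
        # Keeping '>' as a separator token is an illustrative choice.
        source += [">"] + tokenize_smiles(reagents)
    return source, tokenize_smiles(products)


# Acid-catalysed esterification of acetic acid with ethanol (main product only).
source, target = split_reaction("CC(=O)O.OCC>[H+]>CC(=O)OCC")
print(source)
print(target)
```

In this sketch the source sequence would be fed to the encoder and the target sequence used as the decoder's training output; a production tokenizer would also need to validate the SMILES and handle malformed reaction strings.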

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Machine Learning in Materials Science