Linking the Neural Machine Translation and the Prediction of Organic Chemistry Reactions
Juno Nam, Jurae Kim

TL;DR
This paper applies neural machine translation models to predict organic chemistry reaction products, enabling automated predictions without manual rule encoding by learning from reaction datasets.
Contribution
It introduces a sequence-to-sequence neural model for reaction prediction that learns directly from reaction data, bypassing manual rule-based methods.
Findings
The model predicts reaction products directly from reaction SMILES strings.
Training sets combine reactions from patent databases with reactions generated manually from the elementary reactions in Wade's organic chemistry textbook.
The approach reduces the need for manual rule encoding in reaction prediction.
Abstract
Finding the main product of a chemical reaction is one of the important problems in organic chemistry. This paper describes a method of applying a neural machine translation model to the prediction of organic chemical reactions. To translate 'reactants and reagents' into 'products', a gated recurrent unit (GRU) based sequence-to-sequence model was built, together with a parser that generates input tokens for the model from reaction SMILES strings. The training sets are composed of reactions from patent databases and of reactions generated manually by applying the elementary reactions in Wade's organic chemistry textbook. The trained models were tested on examples and problems from the textbook. The prediction process does not require manual encoding of rules (e.g., SMARTS transformations) to predict products; it needs only sufficient training reaction sets to learn new types of reactions.
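The abstract mentions a parser that turns reaction SMILES strings into input tokens but does not describe it here. The sketch below is only an illustration of what such a step could look like, not the authors' parser. It assumes the common `reactants>reagents>products` reaction SMILES layout, a widely used regex-style atom-level tokenization, and a hypothetical `>` separator token between reactants and reagents on the source side. The names `SMILES_TOKEN`, `tokenize_smiles`, and `split_reaction` are made up for this example.

```python
import re

# Assumed tokenization pattern (not taken from the paper): bracket atoms,
# two-letter halogens, two-digit ring closures, organic-subset atoms,
# aromatic atoms, bond/branch/separator symbols, single ring-closure digits.
SMILES_TOKEN = re.compile(
    r"\[[^\]]+\]|Br|Cl|%\d{2}|[BCNOPSFI]|[bcnops]|[=#\-+/\\:~@*$().]|\d"
)


def tokenize_smiles(smiles):
    """Split a SMILES string into tokens; fail if any character is skipped."""
    tokens = SMILES_TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"unrecognized characters in SMILES: {smiles!r}")
    return tokens


def split_reaction(reaction_smiles):
    """Turn 'reactants>reagents>products' into (source, target) token lists.

    The source side is reactants plus reagents, joined by a hypothetical
    '>' separator token; the target side is the products.
    """
    parts = reaction_smiles.split(">")
    if len(parts) != 3:
        raise ValueError("expected the form reactants>reagents>products")
    reactants, reagents, products = parts
    source = tokenize_smiles(reactants) + [">"] + tokenize_smiles(reagents)
    target = tokenize_smiles(products)
    return source, target


if __name__ == "__main__":
    # Acid-catalyzed esterification of acetic acid with ethanol.
    src, tgt = split_reaction("CC(=O)O.OCC>[H+]>CC(=O)OCC")
    print(" ".join(src))
    print(" ".join(tgt))
```

In a translation setup of this kind, the source tokens would feed the encoder and the target tokens would be what the decoder learns to emit. The round-trip check in `tokenize_smiles` is a design choice for the sketch: it rejects input the pattern does not fully cover instead of silently dropping characters.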
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Machine Learning in Materials Science
