GEN: Highly Efficient SMILES Explorer Using Autodidactic Generative Examination Networks
Ruud van Deursen, Peter Ertl, Igor V. Tetko, Guillaume Godin

TL;DR
This paper introduces GEN, a new bidirectional RNN-based architecture for efficient and high-quality molecular SMILES generation, achieving high validity and novelty with rapid training and an online quality control mechanism.
Contribution
GEN is a robust, efficient architecture that learns target chemical space quickly and ensures high-quality molecule generation using an innovative online examination mechanism.
Findings
Achieves 95-98% valid SMILES generation.
Generates 85-90% novel molecules.
Maintains 95-99% property space conservation.
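The validity and novelty percentages above can be computed along the following lines. This is a minimal sketch under stated assumptions, not the paper's evaluation code: the function name `validity_and_novelty`, the `is_valid` callback, and the exact definitions (validity over all generated strings, novelty over unique valid strings absent from the training set) are illustrative choices, and the SMILES are assumed to be canonicalized the same way in both sets. In practice `is_valid` would wrap a cheminformatics parser.

```python
from typing import Callable, Iterable, Set, Tuple


def validity_and_novelty(
    generated: Iterable[str],
    training_set: Set[str],
    is_valid: Callable[[str], bool],
) -> Tuple[float, float]:
    """Return (percent valid, percent novel) for a batch of generated SMILES.

    Assumed definitions (hypothetical, not taken from the paper):
      validity = valid strings / all generated strings
      novelty  = unique valid strings not in training_set / unique valid strings
    """
    generated = list(generated)
    if not generated:
        return 0.0, 0.0

    # Keep every string the caller-supplied checker accepts.
    valid = [s for s in generated if is_valid(s)]
    validity = 100.0 * len(valid) / len(generated)

    # Deduplicate before measuring novelty so repeats are not double counted.
    unique_valid = set(valid)
    if not unique_valid:
        return validity, 0.0
    novel = unique_valid - training_set
    novelty = 100.0 * len(novel) / len(unique_valid)
    return validity, novelty
```

Passing the checker in as a parameter keeps the sketch independent of any particular chemistry toolkit; property space conservation is left out because its definition is not given in this summary.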
Abstract
Recurrent neural networks have been widely used to generate millions of de novo molecules in a known chemical space. These deep generative models are typically set up with LSTM or GRU units and trained with canonical SMILES. In this study, we introduce a new robust architecture, Generative Examination Networks (GEN), based on bidirectional RNNs with concatenated sub-models to learn and generate molecular SMILES within a trained target space. GENs autonomously learn the target space in a few epochs while being subjected to an independent online examination mechanism that measures the quality of the generated set. Here we have used online statistical quality control (SQC) on the percentage of valid molecular SMILES as an examination measure to select the earliest available stable model weights. Very high levels of valid SMILES (95-98%) can be generated using multiple parallel encoding layers in…
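The idea of selecting the earliest stable model weights from an online validity check can be sketched as follows. This is an illustration under assumptions, not the authors' SQC procedure: the function name `earliest_stable_epoch`, the sliding-window rule, and the thresholds (`window`, `min_mean`, `max_std`) are all hypothetical. It assumes one validity percentage has been recorded per epoch from a freshly generated sample.

```python
from statistics import mean, pstdev
from typing import Optional, Sequence


def earliest_stable_epoch(
    validity_pct: Sequence[float],
    window: int = 4,
    min_mean: float = 95.0,
    max_std: float = 1.0,
) -> Optional[int]:
    """Return the 0-based epoch that closes the first stable window, or None.

    A window counts as stable (assumed rule, not from the paper) when the mean
    validity over the last `window` epochs is at least `min_mean` and the
    population standard deviation is at most `max_std`.
    """
    if window < 1:
        raise ValueError("window must be at least 1")

    # Slide the window forward one epoch at a time and stop at the first hit,
    # so the earliest qualifying checkpoint is the one selected.
    for end in range(window, len(validity_pct) + 1):
        recent = validity_pct[end - window:end]
        if mean(recent) >= min_mean and pstdev(recent) <= max_std:
            return end - 1
    return None
```

A training loop would call this after each epoch with the validity history so far and keep the weights saved at the returned epoch; a `None` result means training should continue.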
Taxonomy
Topics: Machine Learning in Materials Science · Computational Drug Discovery Methods
Methods: Sigmoid Activation · Tanh Activation · Gated Recurrent Unit · Long Short-Term Memory
