Synergy Effect between Convolutional Neural Networks and the Multiplicity of SMILES for Improvement of Molecular Prediction
Talia B. Kimber, Sebastian Engelke, Igor V. Tetko, Eric Bruno, Guillaume Godin

TL;DR
This paper introduces the Convolutional Neural Fingerprint (CNF) model, which combines convolutional neural networks with multiple SMILES representations of each molecule. The model improves molecular property prediction, especially on small datasets, and the paper shows that SMILES augmentation acts as a regularizer.
Contribution
The study presents what the authors describe as the first methodology integrating SMILES multiplicity with CNNs for molecular prediction. The resulting model generally outperforms fine-tuned traditional descriptors on small datasets.
Findings
CNF achieves accuracy comparable to traditional descriptors.
SMILES augmentation acts as a regularizer, reducing overfitting.
SMILES multiplicity enhances model performance and generalization.
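A convolutional model reads a SMILES string as a sequence of characters, and its output depends on the order in which atoms are written. The sketch below is a minimal, hypothetical illustration of that idea. It is not the authors' CNF architecture. It assumes a tiny character vocabulary (`VOCAB`), one-hot encoding, a 1D convolution with ReLU, and global max pooling. The names `one_hot`, `conv_fingerprint`, and `co_filter` are invented for this example, and the choice of max pooling is an assumption.

```python
from typing import List

# Hypothetical character vocabulary; real SMILES need multi-character tokens
# such as "Cl" and "Br", which this sketch ignores.
VOCAB = ["C", "O", "N", "(", ")", "="]


def one_hot(smiles: str, vocab: List[str] = VOCAB) -> List[List[float]]:
    """Encode a SMILES string as one row per character, one column per symbol."""
    index = {ch: i for i, ch in enumerate(vocab)}
    rows = []
    for ch in smiles:
        row = [0.0] * len(vocab)
        row[index[ch]] = 1.0  # raises KeyError for characters outside the vocabulary
        rows.append(row)
    return rows


def conv_fingerprint(smiles: str, filters: List[List[List[float]]],
                     vocab: List[str] = VOCAB) -> List[float]:
    """Slide each filter along the encoded string, apply ReLU, then max-pool.

    Each filter is a list of `width` rows of len(vocab) weights, so one filter
    scores one window of `width` consecutive characters. The output has one
    entry per filter, whatever the length of the SMILES string.
    """
    x = one_hot(smiles, vocab)
    fingerprint = []
    for kernel in filters:
        width = len(kernel)
        activations = []
        for start in range(len(x) - width + 1):
            score = sum(kernel[k][j] * x[start + k][j]
                        for k in range(width) for j in range(len(vocab)))
            activations.append(max(0.0, score))  # ReLU
        # Global max pooling; strings shorter than the filter give 0.0.
        fingerprint.append(max(activations) if activations else 0.0)
    return fingerprint


# Hand-made filter that fires on the two-character window "CO".
co_filter = [[1.0, 0.0, 0.0, 0.0, 0.0, 0.0],   # position 0 looks for "C"
             [0.0, 1.0, 0.0, 0.0, 0.0, 0.0]]   # position 1 looks for "O"

print(conv_fingerprint("CCO", [co_filter]))  # [2.0]
print(conv_fingerprint("OCC", [co_filter]))  # [1.0]
```

"CCO" and "OCC" both encode ethanol, yet the same filter produces different values for them. This is one way to see why a model trained on raw SMILES could benefit from seeing several SMILES strings per molecule.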
Abstract
In our study, we demonstrate the synergy effect between convolutional neural networks and the multiplicity of SMILES. The model we propose, the so-called Convolutional Neural Fingerprint (CNF) model, reaches the accuracy of traditional descriptors such as Dragon (Mauri et al. [22]), RDKit (Landrum [18]), CDK2 (Willighagen et al. [43]) and PyDescriptor (Masand and Rastija [20]). Moreover, the CNF model generally performs better than highly fine-tuned traditional descriptors, especially on small data sets. This is of great interest in chemistry, where data sets are typically small because of experimental costs, the limited availability of molecules, or restricted access to private databases. We evaluate the CNF model along with SMILES augmentation during both training and testing. To the best of our knowledge, this is the first time that such a methodology is presented. We show that using the…
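SMILES augmentation during training and testing can be sketched as two small steps. At training time, every SMILES variant of a molecule becomes a separate example carrying that molecule's label. At test time, the model's predictions over all variants of one molecule are aggregated. The sketch below assumes that aggregation is a plain mean, which may differ from the paper's exact procedure. The function names, the toy labels, and the length-based stand-in model are all hypothetical. In practice the variants could be generated with a cheminformatics toolkit, for example RDKit's `Chem.MolToSmiles(mol, doRandom=True)`. Here they are written out by hand.

```python
from statistics import mean
from typing import Callable, Dict, List, Tuple


def expand_training_set(
        molecules: Dict[str, Tuple[List[str], float]]) -> List[Tuple[str, float]]:
    """Train-time augmentation: every SMILES variant becomes its own example,
    labelled with the property value of the molecule it encodes."""
    examples = []
    for variants, label in molecules.values():
        for smiles in variants:
            examples.append((smiles, label))
    return examples


def predict_with_tta(predict: Callable[[str], float], variants: List[str]) -> float:
    """Test-time augmentation (assumed here to be a simple mean): average one
    model's predictions over all SMILES variants of a single molecule."""
    return mean(predict(smiles) for smiles in variants)


# Toy data: valid SMILES variants for ethanol and methanol, with made-up labels.
data = {
    "ethanol": (["CCO", "OCC", "C(O)C"], 1.0),
    "methanol": (["CO", "OC"], 0.5),
}
train = expand_training_set(data)
print(len(train))  # 5


def toy_predict(smiles: str) -> float:
    """Hypothetical stand-in model that only looks at string length."""
    return float(len(smiles))


print(predict_with_tta(toy_predict, ["CCO", "OCC", "C(O)C"]))
```

Because each molecule contributes several slightly different inputs with the same label, training-time expansion behaves like data augmentation. This is consistent with the regularization effect the paper reports. Averaging at test time reduces the dependence of a single prediction on which SMILES string happened to be chosen.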
Taxonomy
Topics: Machine Learning in Materials Science · Computational Drug Discovery Methods · Metabolomics and Mass Spectrometry Studies
