SMILES2Vec: An Interpretable General-Purpose Deep Neural Network for Predicting Chemical Properties
Garrett B. Goh, Nathan O. Hodas, Charles Siegel, Abhinav Vishnu

TL;DR
SMILES2vec is an interpretable deep neural network that predicts chemical properties directly from SMILES strings, outperforming MLP models built on engineered features and producing explanations consistent with established chemical principles.
Contribution
This work introduces SMILES2vec, a general-purpose, interpretable deep RNN model that predicts multiple chemical properties directly from SMILES without manual feature engineering.
Findings
Outperforms contemporary MLP models that use engineered features
Identifies the parts of a molecule relevant to solubility with 88% accuracy
Demonstrates generalization across toxicity, activity, solubility, and solvation energy
Abstract
Chemical databases store information in text representations, and the SMILES format is a universal standard used in many cheminformatics software packages. Encoded in each SMILES string is structural information that can be used to predict complex chemical properties. In this work, we develop SMILES2vec, a deep RNN that automatically learns features from SMILES to predict chemical properties, without the need for additional explicit feature engineering. Using Bayesian optimization methods to tune the network architecture, we show that an optimized SMILES2vec model can serve as a general-purpose neural network for predicting distinct chemical properties including toxicity, activity, solubility and solvation energy, while also outperforming contemporary MLP neural networks that use engineered features. Furthermore, we demonstrate proof-of-concept of interpretability by developing an explanation…
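To make "learning directly from SMILES" concrete, the following is a minimal sketch of how SMILES strings might be turned into fixed-length integer sequences before being fed to an embedding layer and RNN. It is an illustration under stated assumptions, not the authors' preprocessing: the character-level vocabulary, the padding index `PAD`, the helper names `build_vocab` and `encode`, and the `max_len` value are all hypothetical choices made for this example.

```python
from typing import Dict, List

PAD = 0  # index reserved for padding (an assumption of this sketch)


def build_vocab(smiles_list: List[str]) -> Dict[str, int]:
    """Map each distinct character in the corpus to an index starting at 1."""
    chars = sorted({ch for s in smiles_list for ch in s})
    return {ch: i + 1 for i, ch in enumerate(chars)}


def encode(smiles: str, vocab: Dict[str, int], max_len: int) -> List[int]:
    """Convert a SMILES string to a list of indices, right-padded with PAD."""
    if len(smiles) > max_len:
        raise ValueError("SMILES string is longer than max_len")
    ids = [vocab[ch] for ch in smiles]
    return ids + [PAD] * (max_len - len(ids))


# Three small molecules: ethanol, benzene, acetic acid.
smiles_examples = ["CCO", "c1ccccc1", "CC(=O)O"]
vocab = build_vocab(smiles_examples)
encoded = [encode(s, vocab, max_len=10) for s in smiles_examples]
```

Each row of `encoded` has the same length, so the rows can be stacked into a batch. Note that a purely character-level split, as used here, breaks two-letter element symbols such as `Cl` or `Br` into separate tokens; a tokenizer that keeps them whole is a common alternative.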
Taxonomy
Topics: Computational Drug Discovery Methods · Machine Learning in Materials Science · Various Chemistry Research Topics
Methods: Interpretability
