Predicting Aqueous Solubility of Organic Molecules Using Deep Learning Models with Varied Molecular Representations
Gihan Panapitiya, Michael Girard, Aaron Hollas, Vijay Murugesan, Wei Wang, Emily Saldanha

TL;DR
This study develops deep learning models using various molecular representations to predict aqueous solubility of organic molecules, achieving high accuracy and analyzing factors influencing model performance across a large dataset.
Contribution
The paper presents a comprehensive comparison of molecular representations and neural network architectures for solubility prediction, highlighting the effectiveness of molecular descriptors and graph neural networks (GNNs).
Findings
Molecular descriptors yield the best prediction performance.
Graph neural networks perform comparably well.
Model accuracy improves with larger datasets.
Abstract
Determining the aqueous solubility of molecules is a vital step in many pharmaceutical, environmental, and energy storage applications. Despite efforts made over decades, there are still challenges associated with developing a solubility prediction model with satisfactory accuracy for many of these applications. The goal of this study is to develop a general model capable of predicting the solubility of a broad range of organic molecules. Using the largest currently available solubility dataset, we implement deep learning-based models to predict solubility from molecular structure and explore several different molecular representations including molecular descriptors, simplified molecular-input line-entry system (SMILES) strings, molecular graphs, and three-dimensional (3D) atomic coordinates using four different neural network architectures - fully connected neural networks (FCNNs),…
Taxonomy
Topics: Computational Drug Discovery Methods · Machine Learning in Materials Science · Crystallization and Solubility Studies
Methods: Shifted Softplus · Schrödinger Network
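
The listing names Shifted Softplus as a method but does not define it. As commonly defined for SchNet-style networks, it is the softplus activation shifted down by ln 2 so that it passes through the origin: ssp(x) = ln(0.5 e^x + 0.5). The following is a minimal sketch of that general definition, not code from the paper; the function name `shifted_softplus` is chosen here for illustration, and the listing does not say how the authors' models apply it.

```python
import math


def shifted_softplus(x: float) -> float:
    """Shifted softplus: ln(0.5 * e^x + 0.5) = softplus(x) - ln(2).

    Illustrative helper (name assumed, not taken from the paper).
    """
    # softplus(x) = ln(1 + e^x), written as max(x, 0) + log1p(e^{-|x|})
    # so that large |x| does not overflow math.exp.
    softplus = max(x, 0.0) + math.log1p(math.exp(-abs(x)))
    # Subtracting ln(2) shifts the curve so that the value at x = 0 is zero.
    return softplus - math.log(2.0)


if __name__ == "__main__":
    for value in (-5.0, 0.0, 5.0):
        print(value, shifted_softplus(value))
```

For large positive inputs the function approaches x - ln 2, and for large negative inputs it approaches -ln 2, so it behaves like a smooth, zero-centered variant of ReLU.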
