A unified approach to inferring chemical compounds with the desired aqueous solubility
Muniba Batool, Naveed Ahmed Azam, Jianshen Zhu, Kazuya Haraguchi, Liang Zhao, Tatsuya Akutsu

TL;DR
This paper introduces a unified method to predict aqueous solubility and to design chemical compounds with a target solubility, using graph-theoretic descriptors, multiple linear regression, and mixed integer linear programming.
Contribution
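The approach combines simple deterministic graph-theoretic descriptors, multiple linear regression (MLR), and mixed integer linear programming (MILP) to infer compounds with a desired aqueous solubility, without relying on more complex learning models.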
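As a rough illustration of what a simple deterministic graph-theoretic descriptor can look like, the sketch below counts atoms, bonds, and degrees in a hydrogen-suppressed molecular graph. The function name `graph_descriptors`, the input encoding, and the particular descriptors chosen are assumptions made for this example; the summary does not list the descriptors actually used in the paper.

```python
from collections import Counter


def graph_descriptors(atoms, bonds):
    """Compute a few simple descriptors of a hydrogen-suppressed molecular graph.

    atoms: list of element symbols, one per non-hydrogen atom (hypothetical encoding).
    bonds: list of (i, j, order) tuples, where i and j index into `atoms`.
    """
    # Degree of each atom = number of bonds incident to it.
    degree = Counter()
    for i, j, _order in bonds:
        degree[i] += 1
        degree[j] += 1

    elements = Counter(atoms)
    degree_hist = Counter(degree[i] for i in range(len(atoms)))

    # Illustrative descriptor set only; not the paper's actual feature list.
    return {
        "n_atoms": len(atoms),
        "n_bonds": len(bonds),
        "n_double_bonds": sum(1 for _, _, order in bonds if order == 2),
        "n_C": elements["C"],
        "n_O": elements["O"],
        "n_deg1": degree_hist[1],
        "n_deg2": degree_hist[2],
        "n_deg3": degree_hist[3],
    }


# Acetic acid, CH3-C(=O)-OH, with hydrogens suppressed.
acetic_acid = graph_descriptors(
    ["C", "C", "O", "O"],
    [(0, 1, 1), (1, 2, 2), (1, 3, 1)],
)
```

Every value here is an integer that depends linearly on the graph's structure, which is the kind of property that makes such descriptors expressible as constraints in an integer program.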
Findings
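The MLR model achieved prediction accuracy in the range [0.7191, 0.9377] across 29 diverse datasets, using simple descriptors selected by a forward stepwise procedure.
MILP inferred mathematically exact and optimal compounds with the desired solubility, prescribed structures, and up to 50 non-hydrogen atoms in 6 to 1166 seconds.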
Simple graph-theoretic descriptors correlate strongly with aqueous solubility, offering a computationally efficient alternative to complex models.
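The descriptor selection behind these findings can be sketched as a greedy forward stepwise loop around an ordinary least-squares fit. This is a minimal sketch under stated assumptions, not the authors' procedure: the helper names (`fit_mlr`, `r2`, `forward_stepwise`) are hypothetical, the toy data is invented, and candidates are scored by training R² for brevity, whereas a real pipeline would normally score on held-out or cross-validated data.

```python
def fit_mlr(X, y):
    """Least-squares fit with intercept via the normal equations.

    Returns [b0, b1, ...]. Assumes the columns are not collinear.
    """
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    # Augmented system (A^T A | A^T y).
    M = [
        [sum(r[a] * r[b] for r in rows) for b in range(k)]
        + [sum(r[a] * t for r, t in zip(rows, y))]
        for a in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(k):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [u - f * v for u, v in zip(M[r], M[c])]
    return [M[i][k] / M[i][i] for i in range(k)]


def r2(X, y, coef):
    """Coefficient of determination of the linear model on (X, y)."""
    pred = [coef[0] + sum(c * v for c, v in zip(coef[1:], r)) for r in X]
    mean = sum(y) / len(y)
    ss_res = sum((t - p) ** 2 for t, p in zip(y, pred))
    ss_tot = sum((t - mean) ** 2 for t in y)
    return 1 - ss_res / ss_tot


def forward_stepwise(X, y, max_features):
    """Greedily add the column that most improves training R^2 (an assumption)."""
    selected, best_score = [], float("-inf")
    while len(selected) < max_features:
        trials = []
        for j in range(len(X[0])):
            if j in selected:
                continue
            sub = [[r[c] for c in selected + [j]] for r in X]
            trials.append((r2(sub, y, fit_mlr(sub, y)), j))
        if not trials:
            break
        score, j = max(trials)
        if score <= best_score + 1e-9:  # no meaningful improvement
            break
        selected.append(j)
        best_score = score
    return selected, best_score


# Invented toy data: y depends only on columns 0 and 2 (y = 2*x0 - 3*x2 + 1).
X = [[1, 1, 2], [2, 0, 1], [3, 0, 4], [4, 1, 3], [5, 1, 6], [6, 0, 5]]
y = [2 * a - 3 * c + 1 for a, _, c in X]
selected, score = forward_stepwise(X, y, max_features=2)
```

On this toy data the loop recovers the two informative columns and ignores the irrelevant one, which is the behaviour the stepwise procedure is meant to provide on a larger descriptor pool.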
Abstract
Aqueous solubility (AS) is a key physicochemical property that plays a crucial role in drug discovery and material design. We report a novel unified approach to predict and infer chemical compounds with the desired AS based on simple deterministic graph-theoretic descriptors, multiple linear regression (MLR), and mixed integer linear programming (MILP). Descriptors selected by a forward stepwise procedure enabled the simplest regression model, MLR, to achieve good prediction accuracy relative to existing approaches, in the range [0.7191, 0.9377] for 29 diverse datasets. By simulating these descriptors and learning models as MILPs, we inferred mathematically exact and optimal compounds with the desired AS, prescribed structures, and up to 50 non-hydrogen atoms in a reasonable time, ranging from 6 to 1166 seconds. These findings indicate a strong…
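Because an MLR prediction is linear in integer-valued descriptors, asking for a compound with a desired AS becomes an integer feasibility problem. The sketch below illustrates that inverse step with plain exhaustive enumeration over a tiny integer domain, standing in for the MILP solver that the paper relies on. The function name `infer_descriptors`, the weights, the bounds, and the target range are all invented for illustration, and the sketch stops at a descriptor vector rather than reconstructing a chemical graph.

```python
from itertools import product


def infer_descriptors(weights, intercept, target_lo, target_hi,
                      upper_bounds, max_atoms):
    """Search integer descriptor vectors whose linear prediction hits the target.

    Brute-force stand-in for an MILP solver (an assumption of this sketch).
    Each descriptor is treated as an atom count, so the vector must sum to
    between 1 and max_atoms. Returns the feasible vector whose prediction is
    closest to the middle of [target_lo, target_hi], or None if none exists.
    """
    mid = (target_lo + target_hi) / 2
    best, best_gap = None, None
    for x in product(*(range(u + 1) for u in upper_bounds)):
        if not 1 <= sum(x) <= max_atoms:
            continue
        pred = intercept + sum(w * v for w, v in zip(weights, x))
        if target_lo <= pred <= target_hi:
            gap = abs(pred - mid)
            if best is None or gap < best_gap:
                best, best_gap = x, gap
    return best


# Hypothetical model over counts of (C, O, N) atoms; the numbers are made up.
weights = [-0.5, 1.0, 0.75]
intercept = 0.5

found = infer_descriptors(weights, intercept, -2.0, -1.5,
                          upper_bounds=[6, 2, 2], max_atoms=8)
unreachable = infer_descriptors(weights, intercept, 10.0, 11.0,
                                upper_bounds=[6, 2, 2], max_atoms=8)
```

Enumeration grows exponentially with the number of descriptors, which is why a formulation that a MILP solver can prune is needed to reach compounds with up to 50 non-hydrogen atoms.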
Taxonomy
Topics: Computational Drug Discovery Methods · Machine Learning in Materials Science · Protein Structure and Dynamics
