A Unified Approach to Inferring Chemical Compounds with the Desired Aqueous Solubility
Muniba Batool, Naveed Ahmed Azam, Jianshen Zhu, Kazuya Haraguchi, Liang Zhao, Tatsuya Akutsu

TL;DR
This paper introduces a unified, efficient method combining graph-theoretic descriptors, regression, and optimization to accurately predict and infer chemical compounds with specific aqueous solubility, simplifying existing complex models.
Contribution
The authors develop a novel approach that uses simple descriptors and linear models to predict and infer compounds with desired solubility, reducing computational complexity and increasing interpretability.
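The inference step can be read as an inverse problem over the trained linear model. The formulation below is only a schematic sketch: the symbols $G$, $\mathcal{G}$, $f$, $\eta$, $w$, $b$, $\underline{y}$ and $\overline{y}$ are notation assumed here for illustration and are not taken from the paper.

```latex
\begin{aligned}
\text{find}\quad & G \in \mathcal{G}
  && \text{(a chemical graph with the prescribed structure)} \\
\text{subject to}\quad & x = f(G)
  && \text{(graph-theoretic descriptor vector of } G\text{)} \\
& \eta(x) = w^{\top} x + b
  && \text{(trained linear regression model)} \\
& \underline{y} \le \eta(x) \le \overline{y}
  && \text{(desired solubility range)}
\end{aligned}
```

Because $\eta$ is linear in $x$, the last two constraints are linear as they stand. Under this reading, the remaining work is to encode $G$ and $f$ with integer and real variables and linear constraints so that the whole problem becomes an MILP.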
Findings
Achieved prediction accuracy between 0.7191 and 0.9377 with simple MLR models across 29 diverse datasets.
Inferred mathematically exact and optimal compounds with up to 50 non-hydrogen atoms in 6 to 1204 seconds.
Demonstrated strong correlation between simple descriptors and aqueous solubility, enabling deeper understanding.
Abstract
Aqueous solubility (AS) is a key physicochemical property that plays a crucial role in drug discovery and material design. We report a novel unified approach to predict and infer chemical compounds with the desired AS based on simple deterministic graph-theoretic descriptors, multiple linear regression (MLR), and mixed integer linear programming (MILP). Descriptors selected by a forward stepwise procedure enabled the simplest regression model, MLR, to achieve good prediction accuracy compared to existing approaches, with accuracy in the range [0.7191, 0.9377] for 29 diverse datasets. By simulating these descriptors and learning models as MILPs, we inferred mathematically exact and optimal compounds with the desired AS, prescribed structures, and up to 50 non-hydrogen atoms in a reasonable time range of [6, 1204] seconds. These findings indicate a strong…
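To make the forward stepwise idea concrete, here is a minimal pure-Python sketch, not the authors' implementation. It greedily adds the descriptor column that most improves the fit of an ordinary least-squares model. The helper names (`solve_linear`, `fit_mlr`, `forward_stepwise`), the use of training R² as the selection score, and the synthetic data are all assumptions made for illustration; the abstract does not state the paper's scoring rule or descriptors.

```python
import random


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x


def fit_mlr(X, y, cols):
    """Least-squares fit on the chosen columns plus an intercept.

    Returns (weights, training R^2). Assumes the chosen columns are not
    collinear, so the normal equations are solvable.
    """
    rows = [[1.0] + [x[c] for c in cols] for x in X]
    k = len(cols) + 1
    AtA = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    Aty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    w = solve_linear(AtA, Aty)
    pred = [sum(wi * ri for wi, ri in zip(w, r)) for r in rows]
    mean = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    ss_tot = sum((yi - mean) ** 2 for yi in y)
    return w, 1.0 - ss_res / ss_tot


def forward_stepwise(X, y, max_features):
    """Greedily add the column that gives the highest R^2 at each step."""
    selected, best_r2 = [], float("-inf")
    remaining = list(range(len(X[0])))
    while remaining and len(selected) < max_features:
        scored = [(fit_mlr(X, y, selected + [c])[1], c) for c in remaining]
        r2, c = max(scored)
        if r2 <= best_r2 + 1e-9:  # no meaningful improvement: stop early
            break
        selected.append(c)
        remaining.remove(c)
        best_r2 = r2
    return selected, best_r2


# Synthetic stand-in data (hypothetical): 5 descriptors, only 0 and 3 matter.
random.seed(0)
X = [[random.random() for _ in range(5)] for _ in range(200)]
y = [2.0 * x[0] - 3.0 * x[3] + 0.5 + random.gauss(0, 0.01) for x in X]

selected, r2 = forward_stepwise(X, y, max_features=2)
print(sorted(selected), round(r2, 3))
```

Training R² never decreases when a column is added, so in this sketch the `max_features` cap is what actually limits model size. A held-out or cross-validated score would be a more realistic selection criterion.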
Taxonomy
Topics: Analytical Chemistry and Chromatography
Methods: Linear Regression
