# A unified approach to inferring chemical compounds with the desired aqueous solubility

**Authors:** Muniba Batool, Naveed Ahmed Azam, Jianshen Zhu, Kazuya Haraguchi, Liang Zhao, Tatsuya Akutsu

PMC · DOI: 10.1186/s13321-025-00966-w · 2025-03-26

## TL;DR

This paper introduces a unified method to predict aqueous solubility and to design chemical compounds with a target solubility, using graph-theoretic descriptors, multiple linear regression, and mixed integer linear programming.

## Contribution

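The approach combines simple deterministic graph-theoretic descriptors, multiple linear regression (MLR), and mixed integer linear programming (MILP) to predict aqueous solubility and to infer compounds with a desired solubility, without relying on complicated chemical descriptors or complex machine learning models.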

## Key findings

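- The MLR model achieved prediction accuracy in the range [0.7191, 0.9377] across 29 diverse datasets using simple graph-theoretic descriptors selected by a forward stepwise procedure.
- MILP inferred mathematically exact, optimal compounds with the desired solubility, prescribed structures, and up to 50 non-hydrogen atoms in 6 to 1166 seconds.
- Simple graph-theoretic descriptors correlate strongly with aqueous solubility, offering a computationally cheaper alternative to complicated chemical descriptors and complex machine learning models.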
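
The inference step in the second finding can be pictured schematically as a feasibility problem. The notation below is illustrative and not taken from the paper: $G$ is a candidate molecular graph, $f$ maps a graph to its descriptor vector, $(w, b)$ are the learned MLR coefficients, $[\underline{y}, \overline{y}]$ is the desired solubility range, $\mathcal{G}_\sigma$ is the set of graphs satisfying the prescribed structure, and $n_{\max}$ is the bound on non-hydrogen atoms.

$$
\begin{aligned}
\text{find}\quad & G,\; x \\
\text{subject to}\quad & x = f(G), \\
& \underline{y} \;\le\; w^{\top} x + b \;\le\; \overline{y}, \\
& G \in \mathcal{G}_\sigma, \qquad |V(G)| \le n_{\max}.
\end{aligned}
$$

Because the prediction function is linear in $x$, the middle constraint is already in MILP form. The remaining work, as the abstract describes it, is to simulate the descriptor computation and the structural requirements with linear constraints over integer and real variables.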

## Abstract

Aqueous solubility (AS) is a key physicochemical property that plays a crucial role in drug discovery and material design. We report a novel unified approach to predict and infer chemical compounds with the desired AS based on simple deterministic graph-theoretic descriptors, multiple linear regression (MLR), and mixed integer linear programming (MILP). Descriptors selected by a forward stepwise procedure enabled the simplest regression model, MLR, to achieve good prediction accuracy compared with existing approaches, in the range [0.7191, 0.9377] for 29 diverse datasets. By simulating these descriptors and learning models as MILPs, we inferred mathematically exact and optimal compounds with the desired AS, prescribed structures, and up to 50 non-hydrogen atoms in a reasonable time, ranging from 6 to 1166 seconds. These findings indicate a strong correlation between the simple graph-theoretic descriptors and the AS of compounds, potentially leading to a deeper understanding of their AS without relying on widely used complicated chemical descriptors and complex machine learning models, which are computationally expensive and therefore difficult to use for inference. An implementation of the proposed approach is available at https://github.com/ku-dml/mol-infer/tree/master/AqSol.
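
To make the descriptor-selection idea concrete, here is a minimal sketch of forward stepwise selection wrapped around an ordinary least-squares MLR fit. It is not the authors' implementation (see the repository linked above for that). The function names, the use of training-set R² as the selection score, the `min_gain` stopping rule, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def fit_mlr(X, y):
    """Least-squares fit of y ~ b + X w; returns [b, w_1, ..., w_k]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_mlr(coef, X):
    return coef[0] + X @ coef[1:]


def r2(y, y_hat):
    """Coefficient of determination (assumed score; the summary only says 'accuracy')."""
    ss_res = float(np.sum((y - y_hat) ** 2))
    ss_tot = float(np.sum((y - np.mean(y)) ** 2))
    return 1.0 - ss_res / ss_tot


def forward_stepwise(X, y, max_features, min_gain=1e-4):
    """Greedily add the descriptor that most improves the score.

    Stops when max_features is reached or the best gain is below min_gain.
    Scoring on the training data is a simplification; a held-out or
    cross-validated score would be a more robust choice.
    """
    selected, best = [], 0.0  # an intercept-only model has training R^2 = 0
    remaining = list(range(X.shape[1]))
    while remaining and len(selected) < max_features:
        trials = []
        for j in remaining:
            cols = selected + [j]
            coef = fit_mlr(X[:, cols], y)
            trials.append((r2(y, predict_mlr(coef, X[:, cols])), j))
        score_j, j = max(trials)
        if score_j - best < min_gain:
            break
        selected.append(j)
        remaining.remove(j)
        best = score_j
    return selected, best


# Synthetic stand-in for a descriptor matrix: only columns 1 and 4 matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = 2.0 * X[:, 1] - 1.5 * X[:, 4] + 0.05 * rng.normal(size=200)

selected, score = forward_stepwise(X, y, max_features=4)
print("selected descriptor columns:", selected)
print("training R^2:", round(score, 4))
```

On this toy data the procedure should keep the two informative columns and stop before adding noise columns. With real descriptors, the stopping threshold and scoring scheme would need to be chosen with more care.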

We provide a thorough survey of prediction models designed for AS. Based on simple graph-theoretic descriptors, MLR, and MILP, we successfully predicted AS and inferred optimal compounds with the desired AS across diverse datasets.

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC11938699/full.md

---
Source: https://tomesphere.com/paper/PMC11938699