# Application of Multivariate Adaptive Regression Splines (MARSplines) for Predicting Hansen Solubility Parameters Based on 1D and 2D Molecular Descriptors Computed from SMILES String

**Authors:** Maciej Przybyłek, Tomasz Jeliński, Piotr Cysewski

arXiv: 1901.03408 · 2019-01-14

## TL;DR

This paper introduces a method that combines MARSplines with 1D and 2D PaDEL molecular descriptors computed from SMILES strings to predict all three Hansen solubility parameters, enabling solvent and liquid solute characterization without molecular geometry optimization.

## Contribution

The study develops a simple, effective MARSplines-based QSPR model that predicts Hansen solubility parameters from 1D and 2D descriptors computed from SMILES strings, so no molecular geometry optimization is needed.
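To illustrate the building block behind MARSplines, here is a minimal sketch of a regression on hinge (truncated linear) basis functions. It is not the authors' model: the single descriptor `x`, the synthetic response `y`, and the fixed knot at 4.0 are assumptions made for illustration, and a full MARS fit would select knots and terms itself through forward and backward passes rather than taking them as input.

```python
import numpy as np

def hinge(x, knot, sign=1):
    """MARS-style basis function: max(0, x - knot) for sign=+1, max(0, knot - x) for sign=-1."""
    return np.maximum(0.0, sign * (x - knot))

def fit_hinge_model(x, y, knots):
    """Least-squares fit of an intercept plus a mirrored hinge pair per (pre-chosen) knot."""
    cols = [np.ones_like(x)]
    for k in knots:
        cols.append(hinge(x, k, +1))
        cols.append(hinge(x, k, -1))
    X = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef, X

# Synthetic (hypothetical) descriptor and response with a slope change at x = 4
x = np.linspace(0.0, 10.0, 50)
y = 2.0 + 3.0 * np.maximum(0.0, x - 4.0)

coef, X = fit_hinge_model(x, y, knots=[4.0])
pred = X @ coef
```

Because the synthetic response is itself a hinge at the chosen knot, the fit reproduces it exactly; with real descriptors the knot locations would have to be searched for.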

## Key findings

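- Predicted HSP values are highly correlated with experimental data.
- The models need only information derived from SMILES strings, so solvents and liquid solutes lacking HSP data can be characterized without geometry optimization.
- In solubility classification of polymers and drug-like solid solutes, predicted HSPs give essentially the same quality as the original parameters.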

## Abstract

A new method of Hansen solubility parameters (HSPs) prediction was developed by combining the multivariate adaptive regression splines (MARSplines) methodology with a simple multivariable regression involving 1D and 2D PaDEL molecular descriptors. In order to adapt the MARSplines approach to QSPR/QSAR problems, several optimization procedures were proposed and tested. The effectiveness of the obtained models was checked via standard QSPR/QSAR internal validation procedures provided by the QSARINS software and by predicting the solubility classification of polymers and drug-like solid solutes in collections of solvents. By utilizing information derived only from SMILES strings, the obtained models allow for computing all three Hansen solubility parameters: dispersion, polarization, and hydrogen bonding. Although several descriptors are required for proper parameter estimation, the proposed procedure is simple and straightforward and does not require molecular geometry optimization. The obtained HSP values are highly correlated with experimental data, and their application to solubility problems leads to essentially the same quality as for the original parameters. Based on the provided models, it is possible to characterize any solvent and liquid solute for which HSP data are unavailable.
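The solubility classification mentioned in the abstract is conventionally done with the Hansen distance, $R_a^2 = 4(\delta_{D1}-\delta_{D2})^2 + (\delta_{P1}-\delta_{P2})^2 + (\delta_{H1}-\delta_{H2})^2$, and the relative energy difference $\mathrm{RED} = R_a / R_0$, where a solvent with RED below 1 is expected to dissolve the solute. The sketch below assumes this standard criterion; the summary does not state the exact procedure used in the paper, and all numerical values, including the interaction radius `r0`, are hypothetical placeholders rather than data from the study.

```python
import math

def hansen_distance(a, b):
    """Hansen distance Ra between two (dD, dP, dH) triples, in MPa^0.5."""
    dD1, dP1, dH1 = a
    dD2, dP2, dH2 = b
    return math.sqrt(4.0 * (dD1 - dD2) ** 2 + (dP1 - dP2) ** 2 + (dH1 - dH2) ** 2)

def red(solute, solvent, r0):
    """Relative energy difference: Ra divided by the solute's interaction radius R0."""
    return hansen_distance(solute, solvent) / r0

def is_good_solvent(solute, solvent, r0):
    """Classify a solvent as 'good' when RED < 1."""
    return red(solute, solvent, r0) < 1.0

# Hypothetical values for illustration only (not taken from the paper)
solute = (18.0, 10.0, 8.0)
r0 = 8.0
solvent_a = (17.0, 9.0, 7.0)   # close to the solute in HSP space
solvent_b = (15.0, 0.0, 0.0)   # far from the solute in HSP space

labels = {
    "solvent_a": is_good_solvent(solute, solvent_a, r0),
    "solvent_b": is_good_solvent(solute, solvent_b, r0),
}
```

With predicted HSPs in place of tabulated ones, the same functions apply unchanged, which is what makes a SMILES-only predictor useful for screening solvents.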

---
Source: https://tomesphere.com/paper/1901.03408