# Developing Gaussian process regression, Lasso regression, and Nu-support vector regression models for predicting solubility of exemestane in supercritical CO2

**Authors:** Jawza A. Almutairi, Thamir Malik

PMC · DOI: 10.1038/s41598-025-31291-9 · Scientific Reports · 2025-12-08

## TL;DR

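This paper compares three machine learning regression models (Lasso, Gaussian Process Regression, and Nu-SVR) for predicting the solubility of the anticancer drug exemestane in supercritical CO2, and finds that Gaussian Process Regression is the most accurate (R² = 0.996).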

## Contribution

The paper demonstrates that Gaussian Process Regression outperforms Lasso regression and Nu-SVR for predicting exemestane solubility in supercritical CO2 from temperature and pressure, using a dataset of 45 experimental measurements.

## Key findings

- Gaussian Process Regression achieved an R² score of 0.996 and a maximum error of 3.27, outperforming both Lasso and Nu-SVR.
- Pressure was identified as the most significant factor influencing drug solubility in supercritical CO2.
- The GPR model showed high generalization and precision, validated through residual and feature importance analysis.
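The R² score quoted above is one of several error metrics the paper reports (R², RMSE, MAE, AARD%, and maximum error). As a minimal sketch of how these are commonly computed, the function below uses the standard textbook definitions; the paper's exact formulas are not shown in this summary, so the AARD% form (mean of absolute relative deviations, times 100) is an assumption. The function name `regression_metrics` and the sample values are hypothetical and are not data from the paper.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R², RMSE, MAE, AARD% and maximum absolute error.

    Standard definitions are assumed; AARD% requires non-zero observed values.
    """
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    ss_res = sum(r * r for r in residuals)               # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total sum of squares

    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(r) for r in residuals) / n,
        # Average absolute relative deviation, expressed in percent.
        "aard_percent": 100.0 / n * sum(abs(r) / abs(t) for r, t in zip(residuals, y_true)),
        "max_error": max(abs(r) for r in residuals),
    }


# Hypothetical observed/predicted values, for illustration only.
metrics = regression_metrics([1.0, 2.0, 4.0], [1.1, 1.9, 4.2])
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

AARD% weights each error by the size of the observed value, so it is more sensitive than RMSE or MAE to misfit at low solubilities.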

## Abstract

Precise estimation of pharmaceutical solubility in supercritical carbon dioxide (scCO2) is essential for optimizing pharmaceutical applications, including particle size reduction, the development of solid dispersions, and controlled-release formulations. In this research, we present a comparative analysis of three machine learning regression models, namely Lasso Regression, Gaussian Process Regression (GPR), and Nu-Support Vector Regression (Nu-SVR), for predicting the solubility of exemestane (EXE), a poorly water-soluble anticancer drug, in scCO2 under varying temperature and pressure conditions. The dataset used in this work consists of 45 experimental measurements encompassing temperature (T in K), pressure (P in MPa), and solubility (in g/L) of EXE. The dataset was divided into training and testing subsets to facilitate reliable model validation. Model performance was evaluated using R², RMSE, MAE, and AARD%. Additionally, decision surfaces and observed-versus-predicted plots were generated to visually assess model accuracy. Among the applied models, Gaussian Process Regression demonstrated superior predictive capability, with an R² score of 0.996 and a maximum error of 3.27, significantly outperforming both the Lasso and Nu-SVR models. These results indicate that GPR effectively captures the nonlinear relationship between process variables and drug solubility, offering high generalization and precision. Feature importance analysis confirmed that pressure has the most significant influence on solubility behavior, while temperature also contributes positively to solubility trends. Residual analysis further validated the consistency and reliability of the GPR-based model. This work contributes to the growing application of machine learning techniques in pharmaceutical process modeling, particularly in supercritical fluid-based drug delivery systems. The proposed GPR model provides a reliable and efficient tool for predicting solubility, supporting the design and optimization of scCO2-assisted drug formulation methods.
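The comparison described in the abstract can be sketched with scikit-learn as shown below. This is an illustrative outline only, not the authors' implementation. The data are synthetic placeholders generated from a made-up function, and the 80/20 split, the kernel, the feature scaling, and all hyperparameters are assumptions, since this summary does not state the paper's settings.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel, WhiteKernel
from sklearn.linear_model import Lasso
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import NuSVR

rng = np.random.default_rng(0)

# Synthetic stand-in for the 45 (T, P, solubility) measurements.
# These values are NOT the paper's data; the response function is made up.
T = rng.uniform(308.0, 338.0, size=45)   # temperature in K (assumed range)
P = rng.uniform(12.0, 30.0, size=45)     # pressure in MPa (assumed range)
X = np.column_stack([T, P])
y = 0.002 * P**2 + 0.01 * (T - 308.0) + rng.normal(0.0, 0.01, size=45)

# The split ratio is an assumption; the summary does not give one.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Kernel choice and hyperparameters below are illustrative assumptions.
gpr_kernel = ConstantKernel() * RBF(length_scale=[1.0, 1.0]) + WhiteKernel()
models = {
    "Lasso": make_pipeline(StandardScaler(), Lasso(alpha=0.01)),
    "GPR": make_pipeline(
        StandardScaler(),
        GaussianProcessRegressor(kernel=gpr_kernel, normalize_y=True, random_state=0),
    ),
    "Nu-SVR": make_pipeline(StandardScaler(), NuSVR(nu=0.5, C=10.0)),
}

# Fit each model on the training subset and score it on the held-out subset.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = r2_score(y_test, model.predict(X_test))
    print(f"{name}: test R² = {scores[name]:.3f}")
```

Scaling the inputs is a common choice here because T and P have different units and ranges, which affects both the Lasso penalty and the kernel length scales. Scores on the synthetic data say nothing about the paper's reported results.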

## Linked entities

- **Chemicals:** exemestane (PubChem CID 60198)
- **Diseases:** cancer (MONDO:0004992)

## Full-text entities

- **Chemicals:** EXE (MESH:C056516), CO2 (MESH:D002245), scCO2 (-), water (MESH:D014867)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12796340/full.md

## Figures

9 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12796340/full.md

## References

14 references — full list in the complete paper: https://tomesphere.com/paper/PMC12796340/full.md

---
Source: https://tomesphere.com/paper/PMC12796340