# Accelerating supercritical pharmaceutical formulation via interpretable data-driven prediction of drug solubility

**Authors:** El-Sayed Khafagy, Amr Selim Abu Lila, Mahboubeh Pishnamazi

PMC · DOI: 10.1038/s41598-026-44161-9 · 2026-03-14

## TL;DR

This paper introduces an interpretable machine learning framework that predicts drug solubility in supercritical CO2, with the aim of speeding up pharmaceutical formulation development.

## Contribution

The contribution is an interpretable data-driven framework that predicts drug solubility in supercritical CO2 and identifies the molecular descriptors and process conditions that govern it.

## Key findings

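- Machine learning regressors, including Extreme Gradient Boosting and Support Vector Regression, are combined in an ensemble and tuned with bio-inspired metaheuristic algorithms; the authors report improved solubility prediction accuracy.
- Sensitivity-based and amplitude-based feature analyses identify the dominant molecular descriptors and process conditions affecting solubility.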
- The framework provides actionable insights for drug selection and supercritical processing design.
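The sensitivity-style analysis mentioned in the findings can be illustrated with a generic one-at-a-time sweep: vary one input across its range while the others stay at a baseline, and record how far the prediction moves. This summary does not give the paper's exact definitions of its sensitivity-based and amplitude-based measures, so the sketch below is only an assumption about the general idea. The function `toy_model`, its coefficients, the feature names, and the ranges are all hypothetical stand-ins for a trained regressor and real data.

```python
from typing import Callable, Dict, Tuple

Features = Dict[str, float]


def toy_model(x: Features) -> float:
    """Hypothetical surrogate for a trained regressor (NOT the paper's model).

    Returns a made-up solubility score that is linear in each input.
    """
    return (0.08 * x["pressure_bar"]
            - 0.02 * x["temperature_K"]
            - 0.005 * x["mol_weight"])


def sweep_amplitude(
    predict: Callable[[Features], float],
    baseline: Features,
    ranges: Dict[str, Tuple[float, float]],
    steps: int = 11,
) -> Dict[str, float]:
    """Sweep each feature over its range, others fixed at the baseline.

    The reported value is max(prediction) - min(prediction) over the sweep.
    """
    amplitudes: Dict[str, float] = {}
    for name, (lo, hi) in ranges.items():
        outputs = []
        for i in range(steps):
            point = dict(baseline)  # copy so the baseline is never mutated
            point[name] = lo + (hi - lo) * i / (steps - 1)
            outputs.append(predict(point))
        amplitudes[name] = max(outputs) - min(outputs)
    return amplitudes


# Illustrative baseline and ranges (assumed values, not taken from the paper).
baseline = {"pressure_bar": 250.0, "temperature_K": 320.0, "mol_weight": 400.0}
ranges = {
    "pressure_bar": (120.0, 400.0),
    "temperature_K": (308.0, 338.0),
    "mol_weight": (200.0, 600.0),
}

amplitudes = sweep_amplitude(toy_model, baseline, ranges)
ranking = sorted(amplitudes, key=amplitudes.get, reverse=True)
for name in ranking:
    print(f"{name}: {amplitudes[name]:.3f}")
```

A one-at-a-time sweep ignores interactions between inputs, so it is best read as a first screening step rather than a full attribution method.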

## Abstract

Drug solubility in supercritical carbon dioxide (SC-CO2) plays a pivotal role in the development of particle engineering, drug loading, and solvent-free pharmaceutical formulations. However, experimental solubility determination in supercritical systems remains costly, time-consuming, and compound-specific. In this study, an interpretable data-driven framework is proposed to support pharmaceutical formulation scientists by accurately predicting drug solubility in SC-CO2 while elucidating the governing physicochemical factors. Multiple machine learning regressors, including Extreme Gradient Boosting and Support Vector Regression, were developed and further integrated into an ensemble strategy to enhance robustness and generalizability. Model performance was systematically optimized using bio-inspired metaheuristic algorithms, enabling efficient hyperparameter selection across complex, nonlinear search spaces. Beyond predictive accuracy, model interpretability was emphasized through sensitivity-based and amplitude-based feature analyses, revealing the dominant molecular descriptors and process conditions influencing solubility behavior. The results demonstrate that the proposed framework not only improves solubility prediction accuracy but also provides mechanistic insights relevant to drug selection, formulation feasibility, and supercritical processing design. This work establishes a practical computational tool for accelerating pharmaceutical development pipelines involving supercritical fluid technologies.

The online version contains supplementary material available at 10.1038/s41598-026-44161-9.
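The abstract's idea of tuning hyperparameters with a bio-inspired metaheuristic can be sketched with particle swarm optimization (PSO). This summary does not say which algorithms the authors used, so PSO is an assumption chosen for illustration. To keep the example dependency-free, a small Gaussian kernel smoother stands in for the paper's regressors, and the data are synthetic. All function names, bounds, and swarm settings are hypothetical.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Row = Tuple[Tuple[float, float], float]


def make_data(rng: random.Random, n: int = 40) -> List[Row]:
    """Synthetic (scaled pressure, scaled temperature) -> response rows."""
    rows = []
    for _ in range(n):
        p, t = rng.uniform(0.0, 1.0), rng.uniform(0.0, 1.0)
        y = math.sin(3.0 * p) + 0.5 * t + rng.gauss(0.0, 0.05)
        rows.append(((p, t), y))
    return rows


def kernel_predict(train: List[Row], x: Sequence[float],
                   log_bw: Sequence[float]) -> float:
    """Kernel-weighted average with one Gaussian bandwidth per feature."""
    num = den = 0.0
    for xi, yi in train:
        d2 = sum(((a - b) / math.exp(h)) ** 2
                 for a, b, h in zip(x, xi, log_bw))
        w = math.exp(-0.5 * d2)
        num += w * yi
        den += w
    return num / den if den > 0.0 else 0.0


def validation_mse(train: List[Row], valid: List[Row],
                   log_bw: Sequence[float]) -> float:
    """Mean squared error on held-out rows; the quantity PSO minimizes."""
    errs = [(kernel_predict(train, x, log_bw) - y) ** 2 for x, y in valid]
    return sum(errs) / len(errs)


def pso(objective: Callable[[Sequence[float]], float],
        bounds: List[Tuple[float, float]], rng: random.Random,
        n_particles: int = 12, n_iter: int = 30,
        inertia: float = 0.7, c1: float = 1.5, c2: float = 1.5):
    """Basic global-best PSO with positions clamped to the bounds."""
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]
    best_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=best_val.__getitem__)
    g_pos, g_val = best_pos[g][:], best_val[g]
    history = [g_val]  # best score seen so far, per iteration
    for _ in range(n_iter):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (g_pos[d] - pos[i][d]))
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < best_val[i]:
                best_pos[i], best_val[i] = pos[i][:], val
                if val < g_val:
                    g_pos, g_val = pos[i][:], val
        history.append(g_val)
    return g_pos, g_val, history


rng = random.Random(0)  # fixed seed so the run is repeatable
data = make_data(rng)
train, valid = data[:30], data[30:]
bounds = [(-5.0, 1.0), (-5.0, 1.0)]  # search over log-bandwidths

best_log_bw, best_mse, history = pso(
    lambda h: validation_mse(train, valid, h), bounds, rng)
print("best log-bandwidths:", best_log_bw)
print("validation MSE:", best_mse)
```

Searching in log-bandwidth space keeps the hyperparameters positive without extra constraints. In a real study the objective would typically be a cross-validated error rather than a single hold-out split, and the kernel smoother would be replaced by the actual regressors.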

## Full-text entities

- **Diseases:** SVHL (MESH:D000079426), toxicity (MESH:D064420), asthmatic (MESH:D013224), ulcer (MESH:D014456)
- **Chemicals:** sc- (MESH:D012538), water (MESH:D014867), erlotinib (MESH:D000069347), Niflumic acid (MESH:D009544), CO2 (MESH:D002245), Rivaroxaban (MESH:D000069552), Tolfenamic acid (MESH:C009500), montelukast (MESH:C093875), clonazepam (MESH:D002998), salt (MESH:D012492), Glibenclamide (MESH:D005905), raloxifene (MESH:D020849), phenytoin (MESH:D010672), gemifloxacin (MESH:D000077735), favipiravir (MESH:C462182), ethanol (MESH:D000431), busulfan (MESH:D002066), Nystatin (MESH:D009761), famotidine (MESH:D015738), hydrogen (MESH:D006859), CAM (-)
- **Species:** Anser (geese, genus) [taxon 8842]

## Figures

15 figures with captions in the complete paper: https://tomesphere.com/paper/PMC13043671/full.md

---
Source: https://tomesphere.com/paper/PMC13043671