FibreCastML: an open web platform for predicting electrospun nanofibre diameter distributions for biomedical applications

Elisa Roldán; Kirstie Andrews; Stephen M. Richardson; Reyhaneh Fatahian; Glen Cooper; Rasool Erfani; Tasneem Sabir; Neil D. Reeves

PMC · DOI:10.3389/fbioe.2026.1713804·February 18, 2026

FibreCastML: an open web platform for predicting electrospun nanofibre diameter distributions for biomedical applications

Elisa Roldán, Kirstie Andrews, Stephen M. Richardson, Reyhaneh Fatahian, Glen Cooper, Rasool Erfani, Tasneem Sabir, Neil D. Reeves

PDF

Open Access

TL;DR

FibreCastML is a new open web platform that predicts the full range of nanofibre diameters in electrospinning, helping improve biomedical scaffold design.

Contribution

FibreCastML is the first open-access, distribution-aware machine learning framework for predicting full fibre diameter spectra in electrospinning.

Findings

01

Non-linear and local machine learning models achieved high accuracy (R² > 0.91) for predicting diameter distributions of multiple polymers.

02

Concentration was identified as the most influential parameter affecting fibre diameter distributions.

03

Predicted diameter distributions closely matched experimental results (Kolmogorov–Smirnov p > 0.13 and 84% overlap coefficient).

Abstract

Electrospinning is a scalable technique for generating fibrous scaffolds with tunable micro- and nanoscale architectures for tissue engineering, drug delivery, and wound care. Machine learning (ML) has emerged as a powerful tool to accelerate process optimisation; however, existing models typically predict only mean fibre diameters, overlooking the entire diameter distribution that governs scaffold functionality and biomimicry. This study introduces FibreCastML, the first open-access, distribution-aware ML framework that predicts full fibre diameter spectra from routinely reported processing parameters and provides interpretable insights into parameter influence. A comprehensive meta-dataset of 68,538 fibre-diameter measurements from 1,778 studies across 16 biomedical polymers was curated. Six standard input parameters (solution concentration, voltage, flow rate, tip-to-collector…

Linked entities

Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.

Genes1

SHROOM4

Proteins1

Species1

Homo sapiens(human · species)

Chemicals20

Polyurethane

gamma-PGA PVDF PET PU PVP PAN PLA PS PMMA Pd water Polymer PEEK PCL CA PVA Nylon 6 Au L-lactide

Diseases1

infection

Figures13

Click any figure to enlarge with its caption.

Outline of the followed methodology from dataset creation to open web platform.

“Predicted vs. Observed” scatter plots for PVA. (A) Linear model (lm), (B) Randon forest (ranger).

Feature importance in electrospun fibre diameter for PVDF and SVM model.

SHAP values distribution for each feature for PVDF and SVM model.

Correlation matrix of electrospinning variables for PVDF and SVM model.

3D response-surface plots with predicted fibre diameter (nm) as a function of solution concentration and (A) needle diameter, (B) rotation speed, (C) voltage, and (D) flow rate.

Screenshots of the web FibreCastML interface. (A) Status tab, (B) Prediction and metrics tab.

Predicted fibre diameter obtained from the web FibreCastML interface.

SEM images and total fibre diameter distributions created with (A,C) TL-01, (B,D) Spraybase®.

Fibre diameter distributions from real scaffolds created with Spraybase® (Real) and predicted distribution (Simulated).

Violin comparison between real values obtained with Spraybase® (Real) and predicted values (Simulated).

(A) Density and (B) violin plot for real values obtained with LT-01 (Real) and predicted values (Simulated).

Tables1

TABLE 1. Evaluation of ML models for each polymer studied to predict fibre diameter distributions (mean ± standard deviation).

Polymer	Model	RMSE	MAE	R ²	Polymer	Model	RMSE	MAE	R ²
CA	lm	175.017 ± 4.230	135.138 ± 3.561	0.588 ± 0.025	PLA	lm	105.616 ± 4.475	90.255 ± 4.409	0.573 ± 0.051
	Glmnet	174.882 ± 8.602	135.196 ± 7.362	0.587 ±0.025		Glmnet	105.605 ± 7.327	90.247 ± 6.583	0.573 ± 0.045
	Rpart	53.992 ± 4.224	42.526 ± 3.442	0.960 ± 0.006		Rpart	93.487 ± 6.894	59.481 ± 6.742	0.656 ± 0.062
	Earth	138.544 ± 4.509	112.891 ± 3.421	0.742 ± 0.015		Earth	93.791 ± 7.388	61.310 ± 7.191	0.657 ± 0.065
	svmRadial	46.031 ± 3.001	31.351 ± 1.726	0.972 ± 0.003		svmRadial	97.569 ± 8.583	54.123 ± 6.819	0.640 ± 0.082
	knn	45.100 ± 1.675	32.138 ± 1.059	0.972 ± 0.002		knn	93.556 ± 5.912	59.344 ± 5.799	0.661 ± 0.054
	Ranger	45.036 ± 1.560	32.125 ± 1.182	0.973 ± 0.003		Ranger	93.546 ± 6.501	59.324 ± 7.555	0.662 ± 0.064
Gelatin	lm	210.855 ± 17.442	158.071 ± 8.131	0.324 ± 0.033	PMMA	lm	421.675 ± 10.313	347.970 ± 5.606	0.412 ± 0.033
	Glmnet	210.919 ± 11.164	158.619 ± 4.039	0.323 ± 0.018		Glmnet	421.624 ± 8.050	347.737 ± 5.128	0.411 ± 0.026
	Rpart	174.602 ± 16.526	97.377 ± 4.569	0.538 ± 0.041		Rpart	179.724 ± 13.974	114.973 ± 7.638	0.893 ± 0.019
	Earth	185.908 ± 14.775	115.994 ± 5.950	0.475 ± 0.034		Earth	167.878 ± 3.199	107.896 ± 1.959	0.907 ± 0.006
	svmRadial	182.572 ± 29.801	81.143 ± 9.241	0.509 ± 0.086		svmRadial	55.279 ± 2.739	41.527 ± 1.845	0.990 ± 0.001
	knn	170.621 ± 16.482	83.267 ± 6.172	0.558 ± 0.043		knn	51.712 ± 1.807	33.234 ± 0.817	0.991 ± 0.001
	Ranger	169.780 ± 18.120	83.204 ± 5.575	0.562 ± 0.049		Ranger	51.508 ± 2.970	33.198 ± 1.674	0.991 ± 0.001
Nylon-6	lm	41.416 ± 1.499	32.094 ± 1.197	0.975 ± 0.001	PS	lm	402.950 ± 10.395	280.919 ± 8.958	0.817 ± 0.012
	Glmnet	41.352 ± 1.874	32.245 ± 1.457	0.975 ± 0.002		Glmnet	403.784 ± 20.076	282.010 ± 15.397	0.818 ± 0.012
	Rpart	36.642 ± 1.626	27.654 ± 1.383	0.980 ± 0.002		Rpart	315.280 ± 14.132	224.176 ± 12.015	0.887 ± 0.021
	Earth	39.579 ± 1.807	30.621 ± 1.432	0.977 ± 0.002		Earth	301.847 ± 10.771	189.908 ± 5.453	0.898 ± 0.009
	svmRadial	34.780 ± 1.449	24.989 ± 0.672	0.982 ± 0.002		svmRadial	307.081 ± 18.610	172.441 ± 10.530	0.898 ± 0.019
	knn	32.825 ± 1.453	23.652 ± 0.894	0.984 ± 0.002		knn	288.616 ± 8.266	170.740 ± 5.874	0.906 ± 0.010
	Ranger	32.828 ± 1.088	23.608 ± 0.973	0.984 ± 0.001		Ranger	287.562 ± 11.511	169.987 ± 8.905	0.906 ± 0.012
PAN	lm	42.335 ± 0.774	33.848 ± 0.651	0.495 ± 0.025	PU	lm	29.170 ± 1.611	23.390 ± 1.218	0.920 ± 0.008
	Glmnet	42.357 ± 0.883	33.884 ± 0.546	0.495 ± 0.019		Glmnet	29.209 ± 0.860	23.393 ± 0.879	0.919 ± 0.005
	Rpart	27.805 ± 3.716	20.475 ± 2.044	0.778 ± 0.064		Rpart	22.217 ± 1.640	17.610 ± 1.554	0.953 ± 0.007
	Earth	21.031 ± 0.430	14.947 ± 0.314	0.875 ± 0.008		Earth	25.484 ± 1.636	20.254 ± 1.649	0.938 ± 0.008
	svmRadial	15.156 ± 1.056	9.592 ± 0.377	0.935 ± 0.009		svmRadial	19.919 ± 1.064	15.398 ± 0.667	0.962 ± 0.004
	knn	10.135 ± 0.362	7.420 ± 0.273	0.971 ± 0.002		knn	20.077 ± 1.751	15.440 ± 1.233	0.962 ± 0.007
	Ranger	10.134 ± 0.269	7.414 ± 0.169	0.971 ± 0.002		Ranger	19.948 ± 1.068	15.370 ± 0.788	0.962 ± 0.005
PCL	lm	92.204 ± 3.472	68.002 ± 1.682	0.642 ± 0.029	PVA	lm	145.549 ± 3.264	111.057 ± 3.462	0.457 ± 0.014
	Glmnet	92.172 ± 2.730	67.945 ± 1.420	0.643 ± 0.023		Glmnet	145.486 ± 4.232	110.924 ± 2.948	0.457 ± 0.023
	Rpart	85.521 ± 4.407	62.085 ± 2.278	0.691 ± 0.037		Rpart	102.509 ± 7.247	70.095 ± 4.239	0.728 ± 0.042
	Earth	84.845 ± 2.870	60.618 ± 2.252	0.697 ± 0.038		Earth	96.495 ± 3.475	68.832 ± 2.607	0.761 ± 0.013
	svmRadial	96.327 ± 9.476	55.808 ± 4.569	0.621 ± 0.072		svmRadial	69.530 ± 6.672	33.611 ± 2.281	0.879 ± 0.020
	knn	83.941 ± 4.124	58.686 ± 3.328	0.703 ± 0.046		knn	48.814 ± 3.802	26.061 ± 1.767	0.939 ± 0.011
	Ranger	83.859 ± 4.332	58.655 ± 3.162	0.702 ± 0.032		Ranger	48.645 ± 3.518	25.990 ± 1.634	0.939 ± 0.008
PDLLA	lm	500.809 ± 29.659	424.237 ± 22.662	0.599 ± 0.037	PVDF	lm	178.836 ± 3.638	127.426 ± 2.748	0.577 ± 0.018
	Glmnet	501.425 ± 19.616	424.735 ± 19.056	0.597 ± 0.030		Glmnet	178.731 ± 3.880	127.601 ± 1.800	0.578 ± 0.026
	Rpart	202.659 ± 26.015	147.514 ± 23.967	0.932 ± 0.021		Rpart	135.819 ± 8.902	95.328 ± 6.621	0.756 ± 0.037
	Earth	467.554 ± 10.667	382.354 ± 11.061	0.649 ± 0.027		Earth	85.685 ± 3.224	55.733 ± 3.164	0.903 ± 0.007
	svmRadial	57.879 ± 1.058	55.374 ± 1.217	0.996 ± 0.001		svmRadial	74.342 ± 2.267	41.827 ± 1.383	0.927 ± 0.004
	knn	12.559 ± 0.413	9.365 ± 0.394	0.9997 ± 0.000		knn	73.161 ± 1.753	40.885 ± 0.952	0.929 ± 0.004
	Ranger	12.466 ± 0.913	9.342 ± 0.651	0.9998 ± 0.000		Ranger	73.161 ± 2.683	40.907 ± 1.596	0.929 ± 0.005
PEEK-sulfonated	lm	20.431 ± 0.564	16.474 ± 0.547	0.130 ± 0.037	PVP	lm	1,670.019 ± 45.876	1,093.773 ± 25.227	0.349 ± 0.014
	Glmnet	20.432 ± 0.868	16.474 ± 0.544	0.130 ± 0.042		Glmnet	1,670.128 ± 58.132	1,090.075 ± 24.867	0.349 ± 0.008
	Rpart	17.030 ± 1.029	12.966 ± 0.970	0.396 ± 0.068		Rpart	1,112.460 ± 59.527	572.430 ± 32.495	0.712 ± 0.043
	Earth	15.838 ± 0.631	12.363 ± 0.442	0.481 ± 0.018		Earth	1,522.597 ± 78.302	964.072 ± 28.101	0.459 ± 0.012
	svmRadial	14.399 ± 1.169	10.497 ± 0.698	0.573 ± 0.051		svmRadial	1,238.841 ± 105.595	562.267 ± 19.778	0.674 ± 0.038
	knn	13.975 ± 0.780	10.459 ± 0.533	0.594 ± 0.037		knn	1,029.798 ± 63.047	492.726 ± 26.714	0.755 ± 0.021
	Ranger	14.044 ± 0.762	10.517 ± 0.503	0.589 ± 0.037		Ranger	1,024.187 ± 68.943	491.455 ± 30.790	0.757 ± 0.031
PET	lm	63.312 ± 5.541	46.441 ± 4.977	0.869 ± 0.018	γ-PGA	lm	88.995 ± 7.094	53.577 ± 3.928	0.171 ± 0.033
	Glmnet	63.311 ± 4.166	46.441 ± 3.825	0.867 ± 0.011		Glmnet	89.040 ± 9.351	53.641 ± 4.561	0.169 ± 0.059
	Rpart	63.324 ± 2.605	46.449 ± 2.230	0.868 ± 0.008		Rpart	86.653 ± 4.156	51.524 ± 3.042	0.214 ± 0.024
	Earth	63.379 ± 4.766	46.506 ± 3.928	0.869 ± 0.013		Earth	85.276 ± 7.114	48.853 ± 4.064	0.236 ± 0.035
	svmRadial	64.561 ± 2.598	46.641 ± 1.864	0.868 ± 0.008		svmRadial	90.236 ± 7.340	43.736 ± 4.610	0.192 ± 0.046
	knn	63.274 ± 3.234	46.483 ± 2.472	0.869 ± 0.007		knn	85.060 ± 8.853	47.764 ± 5.122	0.242 ± 0.045
	Ranger	63.265 ± 2.033	46.472 ± 1.450	0.869 ± 0.007		Ranger	85.042 ± 9.607	47.702 ± 4.558	0.241 ± 0.048

Keywords

artificial intelligenceelectrospinningmachine learning modelsmeta-analysisnanofibresopen-access web applicationsustainability

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsElectrospun Nanofibers in Biomedical Applications · Advanced Sensor and Energy Harvesting Materials · Machine Learning in Materials Science

Full text

Introduction

1

Electrospinning is a versatile electrohydrodynamic process that enables the fabrication of micro- and nano-polymer fibres with architectures that emulate key features of the extracellular matrix, underpinning applications in biomedical engineering such as regenerative medicine and tissue engineering (Baek et al., 2015); wound care and infection control (Liu et al., 2019; Fatahian et al., 2020); drug, protein, gene and vaccine delivery (Cao et al., 2010; Shahriar et al., 2019); medical device integration (Hadi Salih et al., 2025); diagnostics, biosensing and wearables (Yew et al., 2018; Li J-W. et al., 2023); viral filtration (Naragund and Panda, 2022); or research tools such as organ-on-chip (Yang et al., 2018). The process initiates when the applied electric field overcomes surface tension at the pendant droplet, forming the characteristic Taylor cone; once a critical field is reached, a charged jet is ejected, stretches, and undergoes bending/whipping instabilities before solidifying on the collector. The onset and stability of this cone, jet regime, and thus the fibre population that ultimately forms, depend on a coupled set of solution (including concentration, viscosity, molecular weight, surface tension, or conductivity), process (voltage, flow rate, needle diameter, type of collector, revolutions of the mandrel, or tip-to-collector distance), and ambient parameters (such as relative humidity or temperature) (Haider et al., 2015). Together, these factors control morphology (fibre-diameter distribution, inter-fibre separation), topography (alignment, surface roughness), and emergent mechanical properties (Young’s modulus, ultimate tensile strength or strain at break) of the electrospun scaffolds (Roldán et al., 2024c). However, establishing the ideal combination of those parameters, to obtain the desired properties, is one of the biggest challenges in electrospinning (Tarus et al., 2016).

One of the properties most studied in electrospinning is the diameter of the fibre, and how the process parameters affect to it. While average fibre diameter is an important design lever, affecting surface area, porosity, mechanics, and cell–material interactions, there is mounting evidence that diameter distributions govern mesoscale behaviour, including local stiffness heterogeneity, transport percolation, and fibre–fibre contact statistics (Bagherzadeh et al., 2014; Picu, 2021; Sauter et al., 2021). This distinction is decisive for biomimicry, where healthy ligaments exhibit a robust bimodal collagen-diameter spectrum with two well-defined peaks, whereas injured ligaments collapse toward a unimodal distribution, with measurable consequences for function and mechanobiology (Beisbayeva et al., 2021; Smatov et al., 2023). Designing electrospun scaffolds to reproduce such spectra therefore requires predictive tools that map processing choices to the full conditional distribution of fibre diameters, not merely to a single expected mean value.

Conventional optimisation, one-factor-at-a-time experiments and classical designs of experiments (e.g., response-surface, Box–Behnken, central composite) improves efficiency but remains costly, time-consuming, narrow in scope, and largely mean diameter-centred (Roldán et al., 2023b). Recent machine-learning (ML) approaches have achieved accurate point predictions of diameter from solution chemistry, processing, and equipment variables (Ieracitano et al., 2021; Pervez et al., 2023; Sukpancharoen et al., 2024). However, most published models return a single number, which is insufficient when the design target is a specified shape of the diameter spectrum (e.g., unimodal vs. bimodal) that must be propagated into microstructure-aware simulations of tissue behaviour.

This article advances electrospinning from mean-centred optimisation to distribution-aware design, presenting the first framework to predict the entire distribution of electrospun fibre diameters conditioned on polymer–solvent composition and process parameters, explicitly capturing unimodal and bimodal behaviours essential for mimicking healthy and injured tissue microstructures.

Central to this contribution is a novel, comprehensive database curated from 1,778 independent studies, spanning 16 polymers and comprising 68,538 fibre-diameter observations. The database covers the following polymers: Cellulose acetate (CA), Gelatin, Polyamide 6 (Nylon 6), Polyacrylonitrile (PAN), Polycaprolactone (PCL), polyD,L-lactide (PDLLA), Polyether ether ketone (PEEK), Polyethylene terephthalate (PET), Polylactic acid (PLA), Polymethyl methacrylate (PMMA), Polystyrene (PS), Polyurethane (PU), Polyvinyl alcohol (PVA), Polyvinylidene fluoride (PVDF), Polyvinylpyrrolidone (PVP) and poly γ-glutamic acid (γ-PGA). In electrospun form, these materials have been deployed for biomedical purposes including antibacterial wound dressings (Ignatova et al., 2007; Keirouz et al., 2020; Samadian et al., 2020; Sun et al., 2020; Teixeira et al., 2020; Huang et al., 2022; Liang et al., 2023; Foroozandeh et al., 2024), tissue engineering (Baker et al., 2006; Zhang et al., 2016; Rahmati Nejad et al., 2020; Ahmadi et al., 2022; Roldán et al., 2023a; Roldán et al., 2024a; Zaszczyńska et al., 2024) and regenerative medicine (Bi et al., 2020; Ekambaram and Dharmalingam, 2020; Gryshkov et al., 2021). Collectively, these exemplars illustrate that each material has a credible electrospun route into biomedical applications.

A second key contribution is translational: FibreCastML is the first open-access web application that allows users to specify electrospinning parameters and obtain polymer-specific predictions of the full fibre-diameter distribution rather than a single average. The application (i) flags inputs outside the observed domain of the database, (ii) visualises predicted vs. observed behaviour and metrics performance of seven supervised ML algorithms for the 16 polymers, (iii) complements model benchmarking with interpretable analyses (variable importance and SHAP) and parameter correlations, (iv) offers solvent-system suggestions by proximity in parameter space, and (v) provide all the results in a Excel worksheet.

The impact of this work is a step-change in how electrospinning is done: instead of iterating blindly on mean diameter, researchers can plan and justify experiments up front against the full fibre-diameter spectrum that governs function. Concretely, the framework (i) reduces trial-and-error, solvent consumption, and material waste by letting teams screen conditions digitally; (ii) improves reproducibility and auditability through standardised diagnostics, out-of-range warnings, and an exportable Excel report; (iii) raises efficiency by pointing to the most controllable levers (e.g., concentration) and by proposing historically successful solvent systems obtained from a large-scale database of 68,538 observations; and (iv) democratises access with an open app so resource-constrained groups can reach biomimetic targets without costly optimisation campaigns. In addition, to illustrate practical usability, a validation case was performed. The result is a safer, more sustainable, and faster translation of electrospun scaffolds for tissue engineering, wound care, and drug delivery, backed by a large, curated evidence base and rigorous validation.

Methods

2

Data collection

2.1

A comprehensive dataset was created by reviewing research literature in the Scopus and Google Scholar databases. The search was conducted using the keywords “electrospinning” & popular polymers used in electrospinning for biomedical applications such as Cellulose acetate (CA), Gelatin, Polyamide 6 (Nylon 6), Polyacrylonitrile (PAN), Polycaprolactone (PCL), polyD,L-lactide (PDLLA), Polyether ether ketone (PEEK), Polyethylene terephthalate (PET), Polylactic acid (PLA), Polymethyl methacrylate (PMMA), Polystyrene (PS), Polyurethane (PU), Polyvinyl alcohol (PVA), Polyvinylidene fluoride (PVDF), Polyvinylpyrrolidone (PVP) and poly γ-glutamic acid (γ-PGA). Only full reproducible articles with single polymers and reporting fibre distributions in their experimental designs were included in the dataset. In addition to this search, the dataset was completed with two published datasets (Sarma, 2023; Mahdian et al., 2025) which, in total, yielded 1,778 studies.

The dataset consist of the following input fields: DOI, polymer, solvent1, solvent2, solvent3, solvent1_ratio (%), solvent2_ratio (%), solvent3_ratio (%), concentration (%), needle diameter, type of collector, rotation speed (rpm), voltage (kV), flow rate (mL/h), tip-to-collector distance (cm), temperature (°C), humidity (%). The output variable is the distribution of the diameter of the fibres obtained from the 1,778 studies, comprising a total of 68,538 fibre-diameter observations.

Data preprocessing

2.2

Data were imported with all fields initially read as text to prevent implicit type coercion. Numeric fields were parsed using a custom routine that removed non-numeric characters, reconciled European/US decimal conventions (comma vs. dot), and converted to numeric; non-parsable entries became missing. The entire dataset was checked for missing values, outliers, and inconsistencies. Observations with non-finite fibre diameter were discarded and rows with missing values were dropped (no imputation).

Preprocessing and normalisation were implemented using the caret library’s recipes pipeline. For each polymer-specific dataset, the recipe included (i) removal of zero-variance predictors (step_zv on all predictors) and (ii) z-normalisation (step_normalize) of all numeric predictors to zero mean and unit variance, while leaving the outcome (fibre diameter, in nm) on its original scale. To prevent information leakage, these steps were estimated independently within each resampling split: the centring and scaling parameters were computed using only the training folds in the repeated 5-fold cross-validation and then applied to the corresponding validation folds and to any new prediction inputs, following previous studies (Roldán et al., 2023b).

All preprocessing, ML modeling, and reported library functions were implemented in R 4.3.0 using RStudio 2023.03.1.

Prediction models

2.3

The supervised learning task is defined on a polymer-specific subset of the data in which six process variables (solution concentration, needle diameter (g), rotation speed, voltage (kV), flow rate (mL h^-1^), and tip-to-collector distance (cm)) serve as inputs, and the fibre diameters are the output. Models are trained with standardised predictors (zero-variance removal and z-score normalisation) using the recipes and caret ecosystem. Seven complementary learners were evaluated to span simple, interpretable models through flexible nonlinear methods. Ordinary least squares (lm) were used as a transparent linear baseline, relating each predictor to the outcome with interpretable coefficients. Elastic net (glmnet) extended this baseline by adding L1/L2 regularisation to stabilise estimates under multicollinearity and shrink or discard weak signals (Hastie and Tay, 2025). Decision trees (rpart) partitioned the data into if–then rules, providing human-readable logic but with a tendency to overfit (Quinlan, 1986). Random forests (ranger) mitigated that risk by averaging many trees grown on resampled data and random subsets of variables, improving accuracy and robustness (Breiman, 2001). A radial-basis support vector machine (kernlab’s svmRadial) modelled smooth nonlinear relationships by mapping inputs to a higher-dimensional space and maximising the margin (Cortes and Vapnik, 1995). K-nearest neighbours (knn) produced predictions based on the most similar cases in the training set, offering a simple, assumption-light local method (Zhang, 2016). Finally, multivariate adaptive regression splines (earth/MARS) captured curved effects by stitching together piecewise-linear segments with automatically placed knots, maintaining a degree of interpretability (Greenwell, 2020).

Hyperparameters were tuned on compact, deterministic grids appropriate to heterogeneous experimental datasets: elastic net over α∈{0,0.25,0.5,0.75,1} with λ on a logarithmic grid; random forest with mtry spanning 1 … p, min. node.size∈{1,5,10} and 500 trees; SVM-R with sigma initialised by sigest and C∈{0.25,0.5,1,2,4}; decision tree with cp on 10^{−4}^…10^{−1}^; kNN with k∈{3,5,7,9,11}; MARS with degree∈{1,2} and nprune∈{5,10,15,20,25}. All training used fixed random seeds to ensure exact reproducibility, and plotting and reporting were handled with ggplot2, dplyr/tidyr, readxl, and openxlsx.

Model selection and performance estimation followed a nested cross-validation design to avoid optimistic bias following previous studies (Roldán et al., 2024b; Roldán Ciudad et al., 2025). An inner loop of five-fold cross-validation with two repeats was used to tune each model’s hyperparameters (via carettrainControl), a choice that balances bias and variance of the tuning criterion while remaining computationally tractable in an interactive setting; repeating the folds stabilises the selected configuration in small-n regimes typical of polymer-specific datasets. The outer loop adopted a leave-one-study-out K-fold scheme, where K is the number of independent studies available for the polymer (Figure 1). This grouping (implemented via study-level folds) yields an honest assessment of generalisation to unseen studies, minimises leakage arising from repeated or near duplicate conditions within a study, and mirrors the app’s real-world use in which users often extrapolate to new experimental batches. It is also worth noting that, as all modelling was conducted on polymer-specific subsets, each polymer is evaluated only within its own distribution, ensuring that no imbalance data can affect training, hyperparameter selection, or interpretability. This also eliminates the need for categorical encoding: the polymer variable is not included as a predictor and each model learns only from homogeneous material behaviour.

Validation of the models.

Aggregating these out-of-fold predictions provides the empirical distribution of modelled fibre diameters used in the interface’s histograms and predicted-versus-observed plots, delivering both a calibrated point prediction and an interpretable sense of spread for decision-making.

Performance metrics were computed on the out-of-fold predictions. Coefficient of determination (R ^2^), square root of the mean squared error (RMSE) and mean absolute error (MAE) were calculated following Equations 1–3:

[eqn]

[eqn]

[eqn]

Where $[eqn]$ is the actual value, $[eqn]$ is the predicted value, $[eqn]$ is the mean of the actual values, and n is the number of samples.

To quantify model stability, all performance metrics (R ^2^, RMSE, MAE) are reported as mean ± standard deviation across outer cross-validation folds, which is widely used in machine-learning evaluation to characterise fold-to-fold variability.

Importance of the features on the diameter of the fibres

2.4

To clarify which process parameters most strongly shape the predicted fibre-diameter distribution, two complementary quantities are reported: model-specific variable importance and model-agnostic SHAP (SHapley Additive exPlanations) values. Variable importance provides a global ranking of predictors, across all observations, according to how much each contributes to the model’s predictive capability under its own learning mechanism. In practical terms, higher-importance variables are those to which the fitted model is most sensitive when explaining variability in fibre diameter; hence, they are prime candidates for process control and experimental refinement. SHAP values, in contrast, are local attributions defined for each individual prediction. They quantify how much each feature pushes a prediction above or below a baseline expectation (the model’s average output on a reference background). Positive SHAP values indicate that a feature configuration increases the predicted diameter; negative values indicate a decrease. Aggregated across specimens, the spread and central tendency of SHAP values illuminate not only which variables matter but how they matter (directionality and effect size), offering an interpretable link between parameter settings and the modelled distribution of diameters.

In the FibreCastML application, variable importance is computed with the caret framework’s native varImp routine for the selected algorithm, yielding a single overall score per feature and visualised as a ranked bar chart (top 20). SHAP values are estimated with fastshap, which approximates Shapley attributions for the trained model by repeatedly perturbing features on a background sample (up to 200 rows drawn from the polymer-specific training set; 50 Monte Carlo simulations per feature) and passing them through a prediction wrapper identical to the one used for inference. SHAP results are summarised by ordering features by mean absolute SHAP magnitude and displaying a jittered dot plot for the six most influential variables, thereby showing both the central effect and its variability across cases. Together, the global ranking (importance) and the local attributions (SHAP) provide a robust, multi-scale explanation of the factors governing the distribution of fibre diameters in silico. It is noted that importance metrics and SHAP interpretations can be attenuated by multicollinearity and by model choice; accordingly, 3D response-surface plots and a correlation heat map is presented alongside these diagnostics to contextualise overlaps among predictors and to support cautious, experiment-driven conclusions.

FibreCastML open web platform

2.5

Shiny is an R framework for building interactive web applications directly from R scripts. It uses a reactive programming model that links a declarative user interface to a server that performs statistical computation. Inputs such as selectors, numeric fields, and buttons automatically update outputs (tables, figures, and text) whenever data or parameters change. This approach supports reproducibility and auditability of scientific analyses and enables seamless local or web deployment for collaborative use.

FibreCastML is a Shiny interface that predicts electrospun fibre diameters for a user-selected polymer and reports model performance, interpretability outputs, data diagnostics, and a reproducible summary. On initialisation, the application loads a comprehensive dataset of 68,538 fibre-diameter observations created by this research group and located alongside the app, and performs the preprocessing detailed in Section 2.2. The app then detects which modelling engines are available on the host and offers only those that can run there (e.g., linear model, elastic net, random forest, radial-basis SVM, decision tree, k-nearest neighbours, and MARS), with a sidebar note indicating what is available on that server.

The left-hand panel provides the control surface. The user selects a “Polymer” from those detected in the dataset and a “Collector type”. They then enter the operative electrospinning parameters as numbers: solution concentration, needle diameter (g), rotation speed, voltage (kV), flow rate (mL/h), and tip-to-collector distance (cm). A drop-down allows the choice of learning algorithm. A brief instruction reminds the user to complete all numeric fields before pressing “Run prediction”. During computation, the interface displays “WAIT … PROCESSING”; upon completion, it switches to “RESULTS IN PREDICTION & METRICS TAB”. A download control is always visible to export a comprehensive Excel report after a run has finished.

Outputs are organised into two tabs. The “STATUS” tab provides immediate, decision-support feedback prior to reviewing the full results. It first echoes the current application state. It then offers a polymer-specific solvent recommendation derived from historical records: within the subset matching the selected polymer, the app identifies candidate experiments that are close to the user’s settings after scaling each parameter by its observed variability. Candidates are scored using a convex combination of parameter proximity and closeness of fibre diameter to the current prediction, and the most frequent solvent triplet is summarised. Where ratio columns exist, median percentages for each solvent among the top candidates are also returned, proposing both composition and approximate proportions. Finally, the tab lists any user-entered parameters that fall outside the observed range for that polymer and displays the corresponding minima and maxima; if all values are within range, this is stated explicitly.

The “PREDICTION & METRICS” tab delivers the analytical core. When “Run prediction” is clicked, the server filters the dataset to the chosen polymer and constructs a modelling table with the six process parameters as predictors and the fibre diameter as the target. Once fitted, the app computes and displays a single prediction of fibre diameter for the user’s settings. It then shows a univariate distribution of out-of-sample predicted diameters arising from cross-validated fits and overlays a dotted vertical line at the user’s predicted value with a textual annotation, situating the configuration within the model’s typical prediction range. A concise performance table reports RMSE, mean absolute error (MAE), and the cross-validated coefficient of determination (R ^2^) from the cross-validation. Where the learner provides interpretable parameters, most directly in ordinary least squares and elastic net, the app prints a coefficient table with estimated effects; when a model lacks transparent coefficients, the interface states this clearly.

Diagnostics and interpretability are extended through additional figures. A “Predicted vs. Observed” scatter plots held-out predictions against actual fibre diameters with a unity line to reveal systematic bias or dispersion. Variable importance is computed via the modelling framework’s native mechanism and rendered as a horizontal bar chart for up to the top twenty predictors, enabling rapid identification of the most influential drivers. The app also estimates SHAP values using a background sample of the feature space and presents a dot-plot summary ordered by mean absolute SHAP magnitude for the six most influential variables, providing model-agnostic explanations that connect local and global behaviour. Finally, a correlation heat map spanning all predictors and the target is displayed with numeric overlays, aiding the detection of collinearity and linear associations in the polymer-specific training subset.

All outputs are reproducibly exportable via a structured Excel workbook generated on demand. The report contains a “Summary” sheet capturing the user’s inputs, the chosen model, and the predicted diameter, as well as the solvent recommendation; an “Out_of_Range” sheet enumerating any parameter excursions relative to the empirical domain for the polymer (note that although the tool flags certain user-defined conditions as ‘out of range,’ this message simply indicates that the parameters fall outside the values commonly reported in the literature for stable Taylor cone formation, and the model still performs predictions and generates all interpretability outputs); data-rich sheets for cross-validated predictions, prediction distribution, metrics, coefficients (when available), variable importance, SHAP summaries (when available), and the correlation matrix; and embedded, publication-ready versions of the key figures (prediction distribution, predicted-versus-observed scatter, variable importance chart, SHAP summary, and correlation heat map).

The followed procedure for the whole study can be found in Figure 2.

Outline of the followed methodology from dataset creation to open web platform.

Experimental validation case

2.6

An external validation dataset was generated to assess the robustness and generalisation capability of FibreCastML. All validation data were entirely independent from the training database; no images, or samples used for model development were included in this validation study. To ensure the robustness of the validation case, two different electrospinning devices, and two separate scanning electron microscopes (SEMs) were employed.

Electrospun scaffolds were produced on different days using two different electrospinning systems:

A TL-01, NaBond (China),
A Spraybase® (Ireland), and

PVA with average molecular weight 30,000-70,000 (commonly used in electrospinning) were included in this validate case (#P8136, Sigma Aldrich, United Kingdom). All PVA-based solutions were produced by dissolving the polymer in distilled water (dH_2_O) at 12% w/w by heating to 80 °C with stirring until a homogeneous solution formed. A 10 mL syringe was loaded with 5 mL of polymer solution, and it pumped with at 1 mL/h flow rate through an 18 G needle. A voltage of 20 kV was applied between the needle tip and the collector to generate the electrostatic field. Fibres were deposited onto aluminium foil affixed to a rotating drum collector operating at 2000 rpm. The tip-to–collector distance was set to 10 cm. Electrospinning was conducted at room temperature (25 °C).

Samples were sputter-coated with Au/Pd using a SC7640 sputter coater (Quorum Technologies Ltd., Kent, United Kingdom) prior to visualisation. Coating was performed at 20 mA and 0.8 kV fo, yielding a nominal thickness of 32.6 nm according to the instrument specification. Two different SEMs were used for image acquisition: a Benchtop SEM (Hitachi TM4000Plus, Hitachi High-Tech Europe GmbH) and field emission scanning electron microscope (Zeiss Supra 40, FE-SEM, Carl Zeiss SMT Ltd., Cambridge, United Kingdom) to ensure that the model remained robust across different SEM devices. SEM micrographs were acquired at an accelerating voltage of 2 kV, a working distance of approximately 6 mm, and magnifications up to ×30,000.

Fibre diameter (Ø) was quantified in AxioVision SE64 Rel. 4.9.1 (Carl Zeiss SMT Ltd., Cambridge, United Kingdom) following a previous study (Roldán et al., 2025). For each sample, twenty fibres were measured from representative high-magnification fields. To guaranty reproducibility across electrospinner systems, three replicates were performed on different days and two samples per replicate were evaluated.

Statistical analyses were performed to validate the usability of FibreCastML. All statistical analyses were performed to compare the FibreCastML-predicted fibre diameters with the experimentally measured diameters obtained from the TL-01 and Spraybase® electrospinning systems. All analyses were conducted using a significance threshold of p < 0.05.

Prior to inferential testing, data distributions for each group were assessed for normality using the Shapiro–Wilk test, which indicated significant deviations from normality; therefore, non-parametric methods were selected. Descriptive statistics (median, standard deviation, interquartile range, minimum and maximum values) were calculated for each dataset to characterise central tendency and dispersion.

In addition to providing a single point prediction for the fibre diameter, the model explicitly quantifies predictive uncertainty through a residual-bootstrap Monte Carlo procedure tailored to the statistical behaviour of the experimental dataset. In case of PVA, the dataset contains 6,610 PVA records spanning a wide range of electrospinning configurations; thus, the dispersion observed in the data reflects both the intrinsic variability of PVA and the system’s sensitivity to diverse processing conditions. Consequently, each prediction should be interpreted not as a single deterministic value, but as a draw from the conditional distribution of fibre diameters for the hypothetical population defined by the user-selected inputs. To approximate this conditional predictive distribution, the model first computes the point prediction ŷ for a new set of conditions and then generates 100 Monte Carlo realisations of the predicted diameter using a residual bootstrap of the cross-validated errors following Equation 4,

[eqn]

where $[eqn]$ are sampled with replacement from the empirical residual distribution. This non-parametric procedure yields an empirical predictive distribution conditioned on the selected inputs, informed by the variability observed across the 6,610 experimental PVA configurations. The resulting ensemble is then used for statistical comparison with real scaffold measurements (Kolmogorov Smirnov, U Mann–Whitney, t-test included solely as a complementary assessment of mean differences, overlap coefficient, Kullback–Leibler divergence, Wasserstein distance) and for visual distributional diagnostics. Integrating residual-bootstrap uncertainty quantification directly into an interactive, process-aware electrospinning prediction tool is, to our knowledge, novel in this field and moves beyond standard practice based solely on point estimates or global error metrics, enabling a more transparent, data-driven and risk-aware interpretation of the model’s predictions.

Results and discussion

3

Machine learning model selection

3.1

Across nearly all polymers, non-linear and local learners substantially outperform linear models. Kernel SVMs, random forests, and k-nearest neighbours consistently achieve the highest coefficients of determination, while ordinary least squares and penalised linear models perform worst, aligned to previous studies (Rahbar and Vadood, 2015; Kalantary et al., 2020; Roldán et al., 2024b). For example, for CA the best models improve R ^2^ from ≈0.59 with linear regression to ≈0.97 with kernel SVM/k-NN/random forest, and for PMMA the improvement is even more pronounced (≈0.41 to ≈0.99). Nylon-6 is already well modelled by simple relations (R ^2^ ≈ 0.97) but still benefits modestly from flexible learners (≈0.984). These patterns are consistent with the strongly non-linear dependence of electrospinning outcomes on solution properties, process parameters, and environment, as well as frequent interactions among these factors as previously reported (Roldán et al., 2023b; Roldán et al., 2025). Ensembles outperform single trees because they reduce variance and capture higher-order interactions (Muqeet et al., 2023), and local methods excel when the dataset contains clusters of near-replicated conditions typical of literature compilations (Li K. et al., 2023).

Figure 3 shows the “Predicted vs. Observed” scatter plots for PVA obtained with linear regression and random forest for the following conditions: concentration of the solution 12%, needle diameter 20 G, rotational speed 2000 rpm, voltage 25 kV, flow rate 1 mL/h and tip-to-collector distance 11 cm “Predicted vs. Observed” scatter plots for the rest of the models and polymers for those conditions (112 cases) can be found in the links provided in Supplementary Material.

“Predicted vs. Observed” scatter plots for PVA. (A) Linear model (lm), (B) Randon forest (ranger).

The difficulty of the prediction task is polymer-dependent. Polymers such as CA, Nylon-6, PAN, PDLLA, PMMA, PS, PU, PVA and PVDF, PU show high predictability (best R ^2^ > 0.91), suggesting that the routinely reported features capture much of the governing physics for these systems. In contrast, PCL, PET, PLA and PVP intermediate ceilings (best R ^2^ ≈ 0.66–0.87), indicative of greater heterogeneity or missing drivers. A third group, gelatin, sulfonated PEEK and γ-PGA, remains challenging (best R ^2^ < 0.59). These are mostly aqueous, biopolymeric, or polyelectrolytic systems whose electrospinning is highly sensitive to variables that are inconsistently reported or absent from the feature set, including pH, ionic strength and counter-ion identity, protein conformation or gel strength (e.g., Bloom value), degree of hydrolysis, and subtle thermal or aging histories. When key physicochemical descriptors are unobserved, even flexible learners can only recover a fraction of the variance (Ramprasad et al., 2017; Jablonka et al., 2020).

Two additional data characteristics explain performance variation. First, the operating range differs by polymer. Some materials are explored within tight, internally consistent windows (e.g., PET), producing nearly linear responses where all models converge to similar accuracy; others span multiple regimes (beaded vs. uniform fibres; different solvent families), which reduces the effectiveness of a single global model (Veleirinho et al., 2008). Second, literature-derived datasets are prone to outliers and measurement heterogeneity (Domingos, 2012). Non-parametric methods (forests, k-NN) reduce sensitivity to these effects, which helps explain their superior performance on polymers with broad or multimodal distributions (e.g., PS and PVA) (Breiman, 2001).

Metric choice also influences interpretation. Because the scale of the response differs markedly by polymer, RMSE and MAE are not comparable across materials; R ^2^ provides the appropriate cross-polymer lens. Within a polymer, however, RMSE/MAE remain valuable for ranking models and for practical error budgeting. Table 1 shows the evaluation of ML models for each polymer studied to predict fibre diameter for a hypothetical condition of: concentration of the solution 12%, needle diameter 20 G, rotational speed 2000 rpm, voltage 25 kV, flow rate 1 mL/h and tip-to-collector distance 11 cm.

Importance of the features on the distribution of the diameter of the fibres

3.2

Each model for each polymer provides different importance of the features. As an illustrative example, this section presents the importance plot, SHAP value distribution, and correlation matrix computed for the Support Vector Machine (SVM) model for PVDF under the following conditions: concentration of the solution 12%, needle diameter 20 G, rotational speed 2000 rpm, voltage 25 kV, flow rate 1 mL/h and tip-to-collector distance 11 cm. The importance of each feature on the diameter of the fibres was studied for the 7 ML models and the 16 polymers for the same electrospinning settings (112 cases), and they can be found in Supplementary Material.

Figure 4 shows the variable-importance plot, computed for the support vector machine (SVM) model for PVDF. This plot shows a clear hierarchy at the global level. Solution concentration is by far the most influential predictor of fibre diameter. Needle diameter comes next, followed by rotation speed, with smaller contributions from flow rate and voltage. The tip-to-collector distance and the type of collector contributes very little in this ranking. For an SVM, this importance reflects how much model accuracy drops when each predictor is perturbed. It therefore captures overall influence on predictions but not directionality. The ordering is physically sensible: concentration governs viscosity and chain entanglement, needle diameter affects the emerging jet at the nozzle, and rotation speed can draw fibres after formation, whereas flow rate, voltage, and distance usually have more context-dependent effects.

Feature importance in electrospun fibre diameter for PVDF and SVM model.

The SHAP summary complements this result by describing how each variable moves individual predictions up or down. Figure 5 shows that solution concentration shows the broadest spread of SHAP values, confirming that changes in concentration consistently drive large shifts in the predicted fibre diameter across the dataset. Flow rate and tip-to-collector distance display wider SHAP dispersions than their global ranks might suggest, indicating that they matter in specific regions of the operating space even if their average effect is smaller. Voltage has a moderate, fairly uniform impact. Rotation speed and needle diameter are clustered near zero for most observations, with occasional large contributions. This pattern indicates that these two variables tend to influence the SVM decision function only under certain combinations of the other settings. The bidirectional spreads around zero further point to non-linear responses and interactions, which the SVM (especially with a non-linear kernel) captures naturally.

SHAP values distribution for each feature for PVDF and SVM model.

The correlation matrix summarizes linear associations (Figure 6). With fibre diameter, the strongest Pearson correlations are for solution concentration (≈0.48) and needle diameter (≈0.39), followed by rotation speed (≈0.25), voltage (≈0.16), and flow rate (≈0.13). Tip-to-collector distance shows a weak negative correlation (≈–0.17). The matrix also reveals notable inter-predictor correlation, such as rotation speed with voltage (≈0.74), flow rate with rotation speed (≈0.69), and tip-to-collector distance with voltage (≈0.66). Such collinearity means several predictors carry overlapping information, which can dilute a variable’s apparent importance or obscure its unique effect when viewed through simple pairwise correlations.

Correlation matrix of electrospinning variables for PVDF and SVM model.

Taken together, the three analyses converge on a consistent conclusion: solution concentration is the primary and most reliable parameter governing fibre diameter, in agreement with previous studies (Roldán et al., 2023b; Roldán et al., 2025). The dominant influence of concentration across models aligns with classical electrohydrodynamic theory, wherein concentration dictates chain entanglement and viscoelastic stress, key factors that suppress bead formation and enhance jet stability. Needle diameter also emerges as an influential parameter, ranking highly in global SVM importance, although its SHAP values are near zero in most cases, suggesting that its effect is pronounced only within specific regions of the design space. This observation is consistent with its known influence on the initial jet radius and shear stress at the nozzle, which in turn affect the onset of bending instabilities. In contrast, the tip-to-collector distance exhibits a weak overall importance and modest linear correlation but displays distinct local SHAP effects, indicating that its impact arises under particular combinations of voltage and flow rate. Voltage and rotation speed show secondary yet context-dependent effects, with the latter influencing post-jet drawing and extensional strain during flight. These findings collectively support the strong machine learning, identified importance of solution concentration and needle diameter, as well as the conditional influence of rotation speed, voltage, and flow rate.

To further understand how each processing parameter interacts with solution concentration, the 3D response-surface plots reveal clear and physically coherent trends that align with the SVM importance ranking, SHAP distributions, and correlation analysis (Figure 7).

3D response-surface plots with predicted fibre diameter (nm) as a function of solution concentration and (A) needle diameter, (B) rotation speed, (C) voltage, and (D) flow rate.

The concentration–needle diameter surface shows the sharpest increases in fibre diameter, with needle effects becoming significant only at moderate-to-high concentrations, supporting its high SVM importance and its locally activated SHAP contributions. The concentration–rotation speed surface remains mostly flat at low concentrations but shows modest thinning at higher concentrations, consistent with the parameter’s moderate global importance and its context-dependent SHAP variability. The concentration–voltage surface exhibits smooth, uniform gradients, reflecting voltage’s moderate and steady influence predicted by the SHAP summary and its relatively weak linear correlation. In contrast, the concentration–flow rate surface displays irregular, localised diameter increases at high concentrations, explaining why flow rate ranks low in global importance yet shows wider SHAP dispersion in specific operating regions. Together, these surfaces visually confirm that concentration is the dominant and most robust predictor of fibre diameter, while needle diameter, rotation speed, voltage, and flow rate exert secondary but regime-specific effects, precisely the hierarchy and interaction patterns indicated by the ML and correlation analyses.

From a practical perspective, solution concentration should be tightly controlled, while the remaining parameters should be adjusted with consideration of their interactions.

FibreCastML open web platform

3.3

To broaden access and encourage collaboration in the electrospinning community, a polymer-specific Shiny application (FibreCastML) was developed to predict nanofibre diameters from experimentally controllable process conditions. The interface is intentionally simple. Users first select the polymer and the collector type, then enter the operating parameters: solution concentration, needle diameter (g), rotation speed, voltage (kV), flow rate (mL h^-1^), and tip-to-collector distance (cm). On running a prediction, the app returns not only a point estimate but also the cross-validated distribution of predicted diameters, providing a practical sense of variability. The tool further supports decision-making by highlighting inputs that fall outside the empirical ranges observed for the chosen polymer, proposing solvent combinations (and median ratios when available) drawn from similar historical experiments, and presenting transparent diagnostics, including predicted-versus-observed scatter, metrics (R ^2^, RMSE, MAE), global variable importance, SHAP-based local explanations, and a correlation heat map. The web interface of FibreCastML is shown in Figure 8. The web FibreCastML can be accessed at https://electrospinning.shinyapps.io/electrospinning/.

Screenshots of the web FibreCastML interface. (A) Status tab, (B) Prediction and metrics tab.

All preprocessing and model training are performed within the app, and a complete Excel report, capturing inputs, predictions, metrics, interpretability outputs, and publication-ready figures, can be downloaded for audit and sharing (Excel reports can be found in Supplementary Material). In typical use, if an exact setting is unknown, users may supply a random set up and the out-of-range panel and distribution plots then contextualise the plausibility of those choices. By consolidating data, analytics, and interpretability into a single, user-friendly environment, the application lowers the barrier to applying advanced modelling in electrospinning labs, improves comparability across studies, and accelerates experimental planning while encouraging reproducible, transparent practice.

Experimental validation case

3.4

Statistical analyses were conducted to quantitatively evaluate the agreement between FibreCastML-predicted fibre diameters and the experimentally measured diameters obtained from the TL-01 and Spraybase® electrospinning systems. Prior to selecting appropriate tests, the distribution of fibre diameters for each group (Predicted, TL-01, Spraybase®) was assessed for normality using the Shapiro–Wilk test. All groups demonstrated significant deviations from a normal distribution (TL-01: p = 0.0015; Spraybase®: p = 0.00235; Predicted: p = 3.36 × 10^−9^), confirming the need for non-parametric statistical methods.

Descriptive statistics were first computed to characterise the central tendency and dispersion of each dataset. TL-01 fibres exhibited a median diameter of 148.5 ± 30.24 nm, Spraybase® fibres showed a slightly lower median of 137.75 ± 28.76 nm, and FibreCastML prediction obtained with the same process parameters was 137.651 nm. The overall ranges were comparable between electrospinning systems and the single fibre diameter prediction.

Figure 9 shows the location of the predicted fibre diameter within the distribution of predictions from cross-validation obtained from FibreCastML. Figure 10 shows examples of SEM images and total fibre diameter distributions of the scaffolds created with TL-01 and Spraybase®.

Predicted fibre diameter obtained from the web FibreCastML interface.

SEM images and total fibre diameter distributions created with (A,C) TL-01, (B,D) Spraybase®.

The comparison between the fibre diameter distribution obtained with the scaffolds produced with Spraybase® electrospinner and the predicted diameter distribution using the residual bootstrap reveals substantial agreement in overall shape, central tendency, and general dispersion of both populations. From an inferential perspective, the three statistical tests applied (Kolmogorov–Smirnov (KS), Mann–Whitney U, and independent-samples t-test) consistently failed to detect significant differences between the real and simulated distributions (KS: p = 0.13; UMW: p = 0.32; t-test: p = 0.34). These results indicate that the two samples are statistically compatible in terms of distribution, median, and mean. Complementary metrics reinforce this conclusion: the Kullback–Leibler divergence (KL) was 0.1497, the Wasserstein distance was 15.73, and the overlap coefficient (OVL) reached 84.11%, all pointing to a high degree of similarity between the two distributions. Overlaid density plots show that both distributions share a common modal region around 120–140 μm, where the highest probability density largely coincides (Figure 11). The model accurately reproduces the core structure of the process, capturing the prominent concentration of diameters in this range. Although the simulated distribution exhibits slightly heavier tails (particularly towards larger diameters) most of the probability mass remains within the experimentally observed bounds.

Fibre diameter distributions from real scaffolds created with Spraybase® (Real) and predicted distribution (Simulated).

Finally, violin plots confirm that the median, interquartile range, and skewness of the simulated distribution closely approximate those of the real distribution (Figure 12). The additional variability observed at the extremes of the simulated violin is consistent with the deliberate incorporation of real-world variability through the Residual Bootstrapping procedure, designed to preserve the inherent heterogeneity of electrospinning and avoid an artificially overoptimistic fit.

Violin comparison between real values obtained with Spraybase® (Real) and predicted values (Simulated).

The comparison between the fibre distribution obtained with the scaffolds produced with TL-01 and the predicted revealed a moderate-to-high level of global concordance between the real and simulated fibrillar diameter distributions, as indicated by an overlap coefficient (OVL) of 73.97%. This suggests that approximately three-quarters of the simulated distribution coincides with the experimental one. Shape and distance metrics further support this interpretation: the Kullback–Leibler divergence (KL) was 0.261, reflecting a low divergence, while the Wasserstein distance was 16.67, indicating a moderate separation between distributions. However, hypothesis tests consistently rejected equality between distributions: Kolmogorov–Smirnov (p = 1.4 × 10^−5^), Mann–Whitney U (p = 0.00127), and t-test (p = 0.0224), indicating significant differences in shape, median, and mean. Visual analyses corroborate these findings: density plots reveal that the simulated distribution is narrower with a sharper peak, while the real distribution exhibits heavier tails; and violin plots show greater spread in the real data and a slight bias towards smaller diameters in the simulated set (Figure 13). Overall, FiberCastML captures the central tendency and approximately 74% of the real distribution’s structure, yet notable discrepancies persist in form and dispersion, differences attributable to uncontrolled physical factors in electrospinning (e.g., humidity, temperature, microflow variations, jet instability) that the model cannot fully reconstruct. Nevertheless, the system provides realistic simulations for diameter prediction, preserving the essential statistical features of the process.

(A) Density and (B) violin plot for real values obtained with LT-01 (Real) and predicted values (Simulated).

In summary, external validation using two independent datasets (Spraybase® and TL-01, N = 124 each) demonstrates that FiberCastML effectively reproduces the statistical structure of electrospun PVA fibre diameters under diverse experimental conditions. The model consistently captures central tendencies and key distributional features, providing strong predictive validity despite the inherent stochasticity of the electrospinning process.

Limitations of the study

3.5

Recent studies reported that tree- or rule-based algorithms such as GBM and Cubist deliver the lowest prediction errors and the highest coefficients of determination, owing to their capacity to model non-linear relationships, exploit ensemble strategies, and maintain robustness against overfitting (Roldán et al., 2024b; Roldán Ciudad et al., 2025). In our study, these models achieved marginally higher R ^2^ values and lower error metrics than alternative approaches; however, they could not be deployed reliably in the server-side Shiny environment due to version and platform inconsistencies between development and production (R and package binaries). For transparency and reproducibility, they were therefore neither included in the code nor were their results reported.

Occasionally, it is necessary to reload the page to establish a connection to the Shiny back end. This behaviour reflects conditions in the hosting provider’s infrastructure that lie outside our control.

The experimental validation study has several limitations that should be acknowledged. First, although the external validation dataset encompassed multiple electrospinning systems and imaging modalities, all experiments and image acquisitions were performed by a single researcher. As a result, potential researcher-to- researcher variability in sample preparation and imaging was not assessed. Secondly, the dataset, while deliberately diverse, remains relatively limited in size compared with the range of conditions that may be encountered across different laboratories. Additional validation with a larger, multicentric dataset involving independent users would further strengthen the evidence for the FibreCastML’s generalisability. Finally, only PVA-based scaffolds were evaluated in this study; therefore, the model’s performance on other polymer systems remains to be established.

Only routinely reported and consistently available variables were retained for analysis, including solution concentration, needle diameter, rotation speed, voltage, flow rate, and tip-to-collector distance. Parameters such as viscosity, electrical conductivity, surface tension, molecular weight, humidity, material’s grade and brands, deposition time and material of the collector were excluded due to their limited reporting (currently appearing in fewer than 12% of studies included in our database) which hindered meaningful cross-study generalisation and introduced substantial data missingness. However, incorporating physicochemical and rheological parameters can provide valuable insight into the governing electrohydrodynamic mechanisms (Sukpancharoen et al., 2024). Therefore, future work will aim to include these predictors, along with polymer combinations, to enhance model robustness and improve the understanding of factors influencing fibre formation.

Conclusion

4

This study advances electrospinning optimisation from mean-focused to distribution-aware prediction, establishing a world-first open framework (FibreCastML) capable of forecasting full fibre-diameter spectra from standard experimental parameters. Trained on a uniquely curated database of 68,538 measurements across 16 polymers, the framework combines rigorous nested cross-validation and interpretability analysis to deliver reliable, transparent, and generalisable insights.

Across polymers, non-linear and local learners consistently outperform linear baselines, confirming the importance of nonlinearity in electrospinning behaviour. Solution concentration remains the dominant factor, providing a robust lever for design control. In the experimental validation case, FibreCastML accurately reproduced the measured PVA diameter distribution (Kolmogorov–Smirnov p > 0.13 and overlap coefficient of 84%), allowing to cut significantly experimental iterations and solvent consumption, demonstrating tangible laboratory and environmental impact.

Operationally, the app transforms complex ML workflows into accessible tools for practitioners by offering: (i) Polymer-specific distribution forecasts and diagnostics (R ^2^, RMSE, MAE); (ii) Interpretability outputs (variable importance, SHAP) linking parameters to outcomes; (iii) Automated error checking and solvent recommendations from historical conditions; and (iv) Downloadable, auditable reports to support reproducibility and cross-lab comparability.

These features position FibreCastML as a decision engine for data-driven, sustainable electrospinning. Future versions will integrate curated physicochemical and rheological features to further enhance generalisability and extend to polymer blends and multi-objective optimisation.

In summary, distribution-aware machine learning transforms electrospinning into a predictive, sustainable, and globally shareable process, reducing waste and accelerating the design of biomimetic scaffolds for tissue engineering, wound dressings, and drug delivery. This framework redefines experimental strategy in electrospinning, marking a paradigm shift toward reproducible and environmentally responsible nanomanufacturing.

Bibliography59

The reference list from the paper itself. Each links out to its DOI / PubMed record.

1Ahmadi S. Shafiei S. S. Sabouni F. (2022). Electrospun nanofibrous scaffolds of polycaprolactone/gelatin reinforced with layered double hydroxide nanoclay for nerve tissue engineering applications. ACS Omega 7, 28351–28360. 10.1021/acsomega.2c 02863 35990483 PMC 9386844 · doi ↗ · pubmed ↗
2Baek J. Chen X. Sovani S. Jin S. Grogan S. P. D’Lima D. D. (2015). Meniscus tissue engineering using a novel combination of electrospun scaffolds and human meniscus cells embedded within an extracellular matrix hydrogel. J. Orthop. Res. 33, 572–583. 10.1002/jor.22802 25640671 PMC 4386835 · doi ↗ · pubmed ↗
3Bagherzadeh R. Latifi M. Najar S. S. Kong L. (2014). Experimental verification of theoretical prediction of fiber to fiber contacts in electrospun multilayer nano-microfibrous assemblies: effect of fiber diameter and network porosity. J. Industrial Text. 43, 483–495. 10.1177/1528083712463400 · doi ↗
4Baker S. C. Atkin N. Gunning P. A. Granville N. Wilson K. Wilson D. (2006). Characterisation of electrospun polystyrene scaffolds for three-dimensional in vitro biological studies. Biomaterials 27, 3136–3146. 10.1016/j.biomaterials.2006.01.026 16473404 · doi ↗ · pubmed ↗
5Beisbayeva Z. Zhanbassynova A. Kulzhanova G. Mukasheva F. Erisken C. (2021). Change in collagen fibril diameter distribution of bovine anterior cruciate ligament upon injury can be mimicked in a nanostructured scaffold. Molecules 26, 1204. 10.3390/molecules 26051204 33668154 PMC 7956598 · doi ↗ · pubmed ↗
6Bi H. Feng T. Li B. Han Y. (2020). In vitro and in vivo comparison study of electrospun PLA and PLA/PVA/SA fiber membranes for wound healing. Polymers 12, 839. 10.3390/polym 12040839 32268612 PMC 7240532 · doi ↗ · pubmed ↗
7Breiman L. (2001). Random forests. Mach. Learn. 45, 5–32. 10.1023/A:1010933404324 · doi ↗
8Cao H. Jiang X. Chai C. Chew S. Y. (2010). RNA interference by nanofiber-based si RNA delivery system. J. Control. Release 144, 203–212. 10.1016/j.jconrel.2010.02.003 20138939 · doi ↗ · pubmed ↗