Automated active learning to optimize hydrogel drug release profiles
Eugene Cheong, D. Christopher Radford, Adam J. Gormley

TL;DR
This paper introduces an automated machine learning system to optimize hydrogel drug release, reducing the need for extensive experiments.
Contribution
The novel contribution is an ML-guided framework combining automation and Bayesian optimization for efficient hydrogel formulation.
Findings
A Gaussian process model predicted drug release profiles using features like time and alginate properties.
Bayesian optimization achieved near-zero-order release of BSA and chABC-SENs with minimal iterations.
The system successfully translated optimized BSA formulations to chondroitinase ABC delivery without further adjustments.
Abstract
Hydrogels are widely used in drug delivery due to their biocompatibility and tunable release properties. However, optimizing hydrogel formulations to the desired release of therapeutics remains experimentally intensive. In this study, we developed an automated, high-throughput and machine learning (ML)-guided framework to efficiently optimize alginate formulations for drug delivery. Using a liquid handling robot, we initially prepared a diverse seed library of 120 alginate hydrogel formulations loaded with bovine serum albumin (BSA) and measured their release profiles. A Gaussian process regression (GPR) ML model was trained to predict cumulative release across time, enabling implicit modeling of release curves. Feature importance analysis using Shapley additive explanations (SHAP) identified time, alginate molecular weight, and concentration as dominant factors influencing release…
Taxonomy
Topics: Hydrogels: synthesis, properties, applications · 3D Printing in Biomedical Research · Advanced Drug Delivery Systems
Introduction
Hydrogels are a class of highly hydrated polymer networks widely studied for their potential as drug delivery systems across diverse medical applications, including oncology, tissue engineering, and regenerative medicine [1,2]. Their high water content, tunable mechanical properties, and biocompatibility enable the encapsulation and delivery of a wide variety of therapeutics, ranging from small molecules to proteins and nucleic acids. Hydrogels can be engineered from natural, synthetic, or semi-synthetic polymers and are capable of being delivered through various routes, including local implantation and transdermal administration [1]. Critically, their internal mesh size, polymer–drug interactions, and degradation or swelling behaviors govern the kinetics of drug release [3]. While hydrogels offer significant promise for achieving sustained or controlled release, they often suffer from burst release, characterized by a large fraction of the drug being released rapidly upon administration [4]. Therefore, optimizing hydrogel formulations to fine-tune release profiles remains a central challenge in the development of effective hydrogel-based delivery systems [5,6].
Alginate hydrogels are widely used in drug delivery due to their biocompatibility, mild gelation conditions, and tunable physicochemical properties [7,8]. Derived from brown seaweed, alginate is a linear copolymer composed of mannuronic (M) and guluronic (G) acid residues, which form hydrogels through ionic crosslinking with divalent cations such as calcium [8,9]. These hydrogels enable controlled drug release by acting as diffusion barriers and, in some cases, through degradation or swelling-mediated mechanisms [10,11]. The release kinetics of encapsulated therapeutics are strongly influenced by the formulation parameters, including alginate concentration, molecular weight, and type as well as the concentration of crosslinking ions [11–13]. Higher alginate concentrations typically result in denser, more stable gels with slower drug diffusion, while variations in crosslinker type can modulate gel porosity and stability, thereby affecting both the rate and profile of drug release [12,14–17]. By tuning these formulation parameters, alginate hydrogels can be engineered to achieve sustained, stimuli-responsive, or even zero-order release profiles, making them versatile platforms for a broad range of drug delivery applications [18,19]. These factors make alginate hydrogels the ideal model platform for developing hydrogel optimization strategies.
The robust functionality of liquid handling robots allows us to reproducibly generate complex hydrogel libraries with little to no variability between replicates. This is paramount when analyzing the often-subtle trends in drug release kinetics. Additionally, the use of automation for hydrogel synthesis improves the overall throughput of what is an already labor-intensive process. This improved throughput facilitates the generation of large, multi-dimensional datasets crucial for downstream drug release optimizations [20,21].
The availability of large, high-dimensional datasets calls for machine learning (ML) [22] to model drug release kinetics and guide formulation optimization. Models such as Gaussian process regressors (GPRs) [23] can be trained on the release of drug at each timepoint and in turn predict the drug release kinetics of untested formulations. Moreover, active learning principles can be leveraged to ensure the efficient exploration of the parameter space and guide formulation optimizations to reach the desired release profiles, including near zero-order release [24].
Recent advancements in ML in the drug delivery space have enabled the precise modeling of complex formulation spaces and have accelerated the discovery of novel drug delivery systems. Supervised learning models such as artificial neural networks, support vector machines, and gradient boosting have been used to predict key parameters such as drug loading efficiency and release kinetics from input parameters such as material composition, synthesis conditions, and molecular descriptors. These advancements are summarized in a recent review article [25]. Collectively, they highlight the role of ML not only as a predictive tool but also as a means of extracting mechanistic insight from high-dimensional, experimentally intractable formulation spaces.
Using automation and ML, an experimental, data processing, and optimization pipeline can be developed. One use case for this strategy is to optimize protein payload release from an alginate hydrogel. An inexpensive protein such as bovine serum albumin (BSA) was used to perform initial optimizations using this data-driven approach. The findings and final hydrogel formulation can then be applied to more sensitive and precious payloads such as enzymes, thus reducing the waste and usage of valuable materials.
This experimental approach will be validated by the sustained release of active chondroitinase ABC (chABC) [26]. Derived from Proteus vulgaris, chABC is studied for its potential use in treating spinal cord injuries as it breaks down chondroitin sulfate proteoglycans (CSPGs), the main component of the glial scar [27–29]. By breaking down CSPGs, chABC helps reduce inhibitory signals at the injury site, promoting axonal regeneration and functional recovery [30,31]. Previous work on chABC aimed to protect and retain the activity of the otherwise thermally unstable enzyme [32–34]. One promising method of stabilization is the use of a short copolymer chain developed by Kosuri et al., where the polymer is non-covalently associated with chABC, thus forming a chABC-single enzyme nanoparticle (chABC-SEN) [24]. For the efficacious treatment of spinal cord injuries, chABC-SENs must be delivered directly to the site of injury at a sustained rate to ensure continual breakdown of the glial scar [35]. Alginate hydrogels are a potential delivery vehicle for chABC-SENs as they can ensure localized delivery [19,36].
In this study, we aim to optimize the release of a protein payload to exhibit a near zero-order sustained release from an alginate hydrogel. Along with the use of automated platforms to synthesize alginate hydrogels, we leverage ML models as well as active learning principles to efficiently navigate the design space. With a sufficiently trained ML model, the alginate hydrogel formulations with the highest predicted performance are tested and applied for the release of chABC-SENs. This is achieved via a design-build-test-learn pipeline where initial hydrogel formulations are tested to obtain release kinetics (Fig. 1). These data are then used to train an ML model to predict the release of untested formulations. Using active learning, new alginate formulations are then proposed and subsequently tested. This process is repeated for multiple rounds before the best performing formulations are used to release chABC-SENs.
Materials and methods
2.1. Materials
PRONOVA UltraPure alginates (MVG, LVG, and VLVG) were purchased from NovaMatrix. Pierce BCA Assay was purchased from Thermo Scientific. Calcium chloride, calcium sulfate, strontium chloride, and barium chloride were purchased from Sigma-Aldrich. BSA was purchased from Sigma-Aldrich. Phosphate buffered saline (PBS) was purchased from Thermo Scientific. Chondroitinase ABC was purchased from R&D Systems. Monomers 2-diethyl amino ethyl methacrylate, butyl methacrylate, poly(ethylene glycol) methacrylate, and [2-(methacryloyloxy)ethyl]trimethylammonium were purchased from Sigma-Aldrich.
2.2. Automated preparation of alginate hydrogels
Synthesis of alginate gels was performed using a Hamilton Microlab Vantage liquid handling system [20,21]. Alginate aliquots were made by dissolving alginate powders in PBS at different concentrations (1, 2, 3, 4 wt%) and left to dissolve overnight on a rocker. Crosslinker solutions (0.05, 0.10, 0.15, 0.20, 0.25 M) were prepared in ultrapure water on the day of gel synthesis. BSA (1% w/v) was prepared in PBS as the payload for each alginate gel. To perform automated synthesis using the liquid handling system, custom Python software (PolyCraft) was used to convert formulations represented in an Excel spreadsheet file into a series of liquid handling steps that were then implemented by the Hamilton system [20]. This software also allows for liquid class tuning, which is paramount for the consistent transfer of viscous alginate solutions. Alginate gels were created by adding 100 μL alginate, 50 μL BSA payload and 50 μL crosslinker to a polystyrene 96-well plate. The alginate solutions were pre-mixed with the BSA solution at a ratio of 2:1 and aspirated at a rate of 2 μL/min with a wide bore tip (300 μL, 1.55 mm CO-RE II tips). Crosslinker solutions were added at a rate of 50 μL/min using standard 300 μL CO-RE II tips. After crosslinker addition, the alginate gels were manually mixed to ensure a homogeneous solution during the gelation process. This was extremely important for hydrogel formulations with a high alginate molecular weight and concentration due to the viscosity of the solution. The alginate gels were then cured at 37 °C overnight before being assessed for successful gelation and used in release studies. Compression tests were conducted on select alginate gel formulations to confirm mechanical differences between formulations (Supplementary Fig. S1).
2.3. Quantifying BSA release kinetics
Cured alginate gels were washed with 100 μL of PBS followed by supernatant collection at t = 0 h. 100 μL of fresh PBS was added to the well plate before sealing and placing the gel in a 37 °C incubator. After 30 min, PBS supernatant was collected from each gel and replaced with fresh PBS. This was repeated for each subsequent timepoint (1, 2, 3, 4, 8, 24, 48, 72 h). To quantify the release of BSA from alginate hydrogels, a BCA assay was performed on the collected supernatant. In brief, 25 μL of supernatant was added to 200 μL of BCA assay reagent and incubated for 30 min at 37 °C. The well plate was analyzed by measuring the absorbance at 526 nm. The amount of BSA released was quantified by comparing the absorbance value to a BSA protein standard. The supernatant collected at t = 0 h was used to determine the total amount of BSA encapsulated. The payload release kinetics were determined by the amount of BSA released at each timepoint divided by the total BSA encapsulated and represented as a percentage of cumulative BSA released.
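The percentage calculation described above can be sketched as follows. This is a minimal illustration, not the authors' analysis script: the function name `cumulative_release_percent` and all masses are hypothetical, and the total encapsulated mass is assumed to be already known for each gel.

```python
def cumulative_release_percent(released_ug, total_encapsulated_ug):
    """Return cumulative release (%) after each sampling timepoint.

    released_ug: mass of BSA found in the supernatant at each collection.
    total_encapsulated_ug: total BSA encapsulated in the gel (assumed known).
    """
    if total_encapsulated_ug <= 0:
        raise ValueError("total_encapsulated_ug must be positive")
    cumulative = []
    running_total = 0.0
    for amount in released_ug:
        running_total += amount  # supernatant is fully replaced at each timepoint
        cumulative.append(100.0 * running_total / total_encapsulated_ug)
    return cumulative


# Hypothetical masses (ug) measured at the sampling timepoints used in the study
timepoints_h = [0.5, 1, 2, 3, 4, 8, 24, 48, 72]
released_ug = [20.0, 15.0, 15.0, 10.0, 10.0, 30.0, 50.0, 30.0, 20.0]
profile = cumulative_release_percent(released_ug, total_encapsulated_ug=400.0)
```

Because the supernatant is exchanged at every timepoint, the per-timepoint masses are summed before normalization rather than read off directly.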
2.4. Release data modeling
To optimize for near zero-order release of BSA from alginate hydrogels, the linearity of the release curve was quantified using the Korsmeyer-Peppas model [37] defined as:
$$\frac{M_t}{M_\infty} = K_m t^n$$
where $M_t/M_\infty$ is the cumulative release of BSA at each timepoint ($t$), $K_m$ represents the rate constant, and $n$ represents the diffusion constant. The rate constant $K_m$ (referred to as the k value) can be used to assess the linearity of the BSA release curve, with a value of 1 corresponding to perfectly zero-order release and increasing k values indicating greater burst-release character. In this study, the gel to supernatant ratio in the 96-well format is limited; therefore, true sink conditions are not achieved. As a result, the assumptions underlying the Korsmeyer-Peppas model are not fully satisfied. Thus, the Korsmeyer-Peppas model is applied solely for comparative kinetic analysis between formulations and does not serve as an indication of the true model for physiologic release.
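As an illustration of how a rate constant and exponent can be extracted from a release curve, the sketch below fits the power law release = K_m · t^n by ordinary least squares on log-transformed data. The fitting procedure is an assumption (the text does not state whether a log-linear or nonlinear fit was used), and the function name and synthetic curve are hypothetical.

```python
import math


def fit_korsmeyer_peppas(times_h, release):
    """Fit release = K_m * t**n via linear regression of ln(release) on ln(t)."""
    pairs = [
        (math.log(t), math.log(m))
        for t, m in zip(times_h, release)
        if t > 0 and m > 0  # logarithms require strictly positive values
    ]
    if len(pairs) < 2:
        raise ValueError("need at least two positive (time, release) points")
    count = len(pairs)
    mean_x = sum(x for x, _ in pairs) / count
    mean_y = sum(y for _, y in pairs) / count
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    if sxx == 0:
        raise ValueError("timepoints must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    n_exp = sxy / sxx                          # slope = exponent n
    k_m = math.exp(mean_y - n_exp * mean_x)    # intercept = ln(K_m)
    return k_m, n_exp


# Synthetic curve with known parameters (K_m = 8, n = 0.5), release in percent
times_h = [0.5, 1, 2, 3, 4, 8, 24, 48, 72]
synthetic = [8.0 * t ** 0.5 for t in times_h]
k_m, n_exp = fit_korsmeyer_peppas(times_h, synthetic)
```

A log-linear fit weights early timepoints more heavily than a nonlinear fit on the raw curve, so the two approaches can return different k values on noisy data.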
2.5. Machine learning model development and evaluation
2.5.1. Construction of design space and ML model
All ML methods were implemented in Python 3.10 using the scikit-learn package (v1.7.0). Prior to ML model training, the design space was featurized to better capture key relational aspects within parameters, with the goal of improving model performance. Specifically, alginate selection (MVG, LVG, and VLVG) was represented by an ordinal ranking of alginate molecular weight. The alginate molecular weights are defined as VLVG (< 75 kDa), LVG (75–200 kDa) and MVG (> 200 kDa). All alginates used for this study consist of a G/M ratio of >1.5. The choice of crosslinker (calcium chloride, calcium sulfate, barium chloride, strontium chloride) was represented by a combination of two variables (cation size and anion valence). Multiple ML models were tested to determine the most appropriate model for this dataset (Supplementary Table S1). Ultimately, a GPR model was determined to be the most appropriate. The GPR model was then trained to predict BSA release kinetics from these design features. Specifically, a composite kernel was used consisting of a constant kernel multiplied by a radial basis function (RBF) kernel and added to a white noise kernel. This kernel is represented mathematically by:
$$k(x, x') = C_0 \exp\left(-\frac{\lVert x - x' \rVert^2}{2\ell^2}\right) + \sigma^2 \delta_{x,x'}$$
where $x$ is the feature vector of the formulation, $C_0$, $\ell$, and $\sigma$ are kernel hyperparameters, and $\delta$ is a Kronecker delta function. Collectively, this allowed the GPR to model a smooth function with appropriate scaling and the ability to account for inherent experimental noise. Furthermore, each prediction of the GPR takes the form of a posterior distribution with a mathematically defined mean and variance; as such, model uncertainty can be quantified and used to guide Bayesian optimization campaigns. Prior to model training, all features and the target variable (BSA release) were independently scaled using scikit-learn's StandardScaler.
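A minimal scikit-learn sketch of a constant × RBF + white-noise kernel with standardized inputs and targets is shown below. The synthetic data, initial hyperparameter values, and variable names are assumptions for illustration only; the study's actual feature set and tuned hyperparameters are not reproduced here.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import ConstantKernel, RBF, WhiteKernel
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data: 40 samples, 3 hypothetical features
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(40, 3))
y = 50.0 * X[:, 0] + 10.0 * np.sin(3.0 * X[:, 1]) + rng.normal(0.0, 1.0, 40)

# Scale features and target independently, as described in the text
x_scaler = StandardScaler().fit(X)
y_scaler = StandardScaler().fit(y.reshape(-1, 1))
X_s = x_scaler.transform(X)
y_s = y_scaler.transform(y.reshape(-1, 1)).ravel()

# Constant * RBF + white noise; starting values here are arbitrary guesses
kernel = ConstantKernel(1.0) * RBF(length_scale=1.0) + WhiteKernel(noise_level=0.1)
gpr = GaussianProcessRegressor(kernel=kernel, random_state=0)
gpr.fit(X_s, y_s)

# Posterior mean and standard deviation, mapped back to original units
mean_s, std_s = gpr.predict(x_scaler.transform(X[:5]), return_std=True)
mean = y_scaler.inverse_transform(mean_s.reshape(-1, 1)).ravel()
std = std_s * y_scaler.scale_[0]
```

The standard deviation returned alongside the mean is the quantity that an acquisition function can later use as the prediction uncertainty.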
2.5.2. Implicit modeling
Given the limited quantity and high complexity of the data collected, an implicit modeling approach was utilized. Here, we define an explicit modeling approach as one that takes the formulation parameters as input and uses the k value (from the Korsmeyer-Peppas modeling) as the prediction output. Meanwhile, implicit modeling predicts cumulative release for each unique formulation at each timepoint. With the addition of time as an input feature, the GPR model was trained on the entire release curve as a function of formulation. By adopting this strategy, we can preserve the full resolution of the release profile over time as opposed to a single value. This allows for the retroactive calculation of k values by fitting the predicted release curves to the Korsmeyer-Peppas model. Including time as an input parameter is therefore essential to learning the dynamic behavior of each formulation. Without time as a parameter, the model cannot differentiate between early and late release stages or capture non-linear kinetics. Therefore, this implicit approach significantly improved model fidelity.
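The data layout implied by the implicit approach can be sketched as one training row per formulation and timepoint, with time as an extra input feature. The function name, formulation IDs, feature order, and release values below are hypothetical.

```python
def to_implicit_rows(formulations, timepoints_h, release_curves):
    """Expand per-formulation release curves into (time + features) -> release rows.

    formulations: dict mapping formulation ID -> list of numeric features.
    release_curves: dict mapping formulation ID -> cumulative release per timepoint.
    Returns feature rows, targets, and a formulation ID per row (for grouped CV).
    """
    rows, targets, groups = [], [], []
    for form_id, features in formulations.items():
        for t, release in zip(timepoints_h, release_curves[form_id]):
            rows.append([t] + list(features))  # time is the first input feature
            targets.append(release)
            groups.append(form_id)
    return rows, targets, groups


# Hypothetical features: [alginate wt%, molecular-weight rank, crosslinker M]
formulations = {"F001": [2.0, 1, 0.10], "F002": [4.0, 3, 0.25]}
timepoints_h = [1, 2, 4]
release_curves = {"F001": [12.0, 20.0, 31.0], "F002": [5.0, 9.0, 16.0]}
rows, targets, groups = to_implicit_rows(formulations, timepoints_h, release_curves)
```

An explicit model would instead have one row per formulation with the fitted k value as its only target, which discards the shape of the curve.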
2.5.3. Model validation
GPR hyperparameters described above were tuned using group K-fold cross-validation with five folds and an 80:20 train-test split. While splitting the dataset into folds, data was grouped by formulation ID to align the model objective with the intended task (prediction of full release kinetics for a given formulation) and prevent data leakage. Model performance was then assessed via the coefficient of determination (R^2^ value) between predicted and measured release across all formulation/timepoint pairs. Leave-one-out cross-validation was also explored as a possible model validation method (Supplementary Fig. S2), but it was later determined that group K-fold cross-validation was more compatible with the implicit modeling approach. Hyperparameters for the GPR were established by training on the seed library data and then fixed for the remainder of the active learning campaign. Model performance after completion of the campaign was assessed by an analogous approach using the full dataset collected.
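Grouping by formulation ID can be sketched with scikit-learn's `GroupKFold`. The dataset sizes and placeholder feature matrix below are assumptions chosen so that each of the five folds holds out 20% of the formulations.

```python
import numpy as np
from sklearn.model_selection import GroupKFold

# Hypothetical sizes: 10 formulations, 9 timepoints each
n_formulations, n_timepoints = 10, 9
groups = np.repeat(np.arange(n_formulations), n_timepoints)  # formulation ID per row
X = np.zeros((groups.size, 5))  # placeholder feature matrix (values unused here)

fold_sizes = []
leaks = 0
for train_idx, test_idx in GroupKFold(n_splits=5).split(X, groups=groups):
    train_ids = set(groups[train_idx])
    test_ids = set(groups[test_idx])
    leaks += len(train_ids & test_ids)  # formulations present on both sides
    fold_sizes.append((len(train_idx), len(test_idx)))
```

With a plain, ungrouped K-fold split, timepoints from the same release curve would land in both the training and test sets, which inflates the apparent R^2^.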
The input features were assessed using SHapley Additive exPlanations (SHAP) [38]. SHAP is a model-agnostic interpretability method that assigns each feature an importance value based on its marginal contribution to the prediction. Therefore, this analysis highlights the individual impacts of each feature on the final predicted release.
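To make the idea of a marginal contribution concrete, the sketch below computes exact Shapley values for a single sample by enumerating every feature subset. It is not the SHAP package used in the study: the single-baseline treatment of "absent" features, the function name, and the toy model are all assumptions for illustration.

```python
from itertools import combinations
from math import factorial


def exact_shapley(predict, x, baseline):
    """Exact Shapley values for one sample.

    Features outside a coalition are replaced by their baseline value.
    Cost grows as 2**n, so this is only practical for a handful of features.
    """
    n = len(x)

    def value(subset):
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Toy model with one additive term and one interaction term
def toy_model(z):
    return 3.0 * z[0] + 2.0 * z[1] * z[2]


phi = exact_shapley(toy_model, x=[1.0, 1.0, 1.0], baseline=[0.0, 0.0, 0.0])
```

For this toy model the attributions sum to the difference between the prediction at `x` and at the baseline, and the interaction term is shared equally between the two features involved.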
2.6. Active learning
Iterative improvement and optimization of alginate hydrogels was guided by active learning. Using the data generated from the seed library, the GPR model was used to predict the release kinetics of every possible formulation within the parameter space. This provided a predicted k value for each formulation (predictive performance) as well as an uncertainty value for each prediction (prediction uncertainty). To select the next generation of formulations to experimentally test, we employed the expected improvement (EI) function, a common strategy in Bayesian optimization. EI quantifies the probability and magnitude of improvement over the current best-performing formulation by balancing exploration (formulations with high predictive uncertainty) and exploitation (formulations with low predicted k values). For the first generation, a 70:30 explore-exploit ratio was employed, where 70% of the tested formulations had high uncertainty values and 30% had high predicted performance. By biasing the acquisition toward a more exploratory regimen, the GPR model is exposed to more data in regions of high uncertainty and produces more accurate predictions for subsequent testing. The second generation of alginate formulations was biased entirely toward exploitation, where 100% of the formulations tested had high predicted performance and low predictive uncertainty. This generation of formulations aimed to produce BSA release curves that exhibit a near zero-order release.
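The two selection ideas above can be sketched as follows: a closed-form expected improvement for a minimization target, and a simple batch split between high-uncertainty and low-predicted-k candidates. How the study combined EI with the 70:30 ratio is not specified, so the split rule, function names, and candidate values here are assumptions.

```python
import math


def expected_improvement_min(mu, sigma, best_k):
    """EI for minimizing k under a Gaussian prediction N(mu, sigma**2)."""
    if sigma <= 0.0:
        return max(best_k - mu, 0.0)  # no uncertainty: improvement is deterministic
    z = (best_k - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (best_k - mu) * cdf + sigma * pdf


def select_batch(candidates, batch_size, explore_fraction=0.7):
    """Pick a batch: highest-uncertainty candidates first, then lowest predicted k.

    candidates: list of (formulation_id, predicted_k, predicted_std) tuples.
    Returns (explore_ids, exploit_ids).
    """
    n_explore = round(batch_size * explore_fraction)
    by_uncertainty = sorted(candidates, key=lambda c: c[2], reverse=True)
    explore = by_uncertainty[:n_explore]
    explore_ids = {c[0] for c in explore}
    remaining = [c for c in candidates if c[0] not in explore_ids]
    exploit = sorted(remaining, key=lambda c: c[1])[: batch_size - n_explore]
    return [c[0] for c in explore], [c[0] for c in exploit]


# Hypothetical predictions for 20 untested formulations
candidates = [(i, 5.0 + i, 0.1 if i < 10 else 1.0 + i) for i in range(20)]
explore_ids, exploit_ids = select_batch(candidates, batch_size=10)
ei_scores = {
    fid: expected_improvement_min(mu, sd, best_k=7.0) for fid, mu, sd in candidates
}
```

Setting `explore_fraction=0.0` reproduces a purely exploitative batch of the kind described for the second generation.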
2.7. Synthesis of chABC-SEN
ChABC-SENs were synthesized following a previously described protocol [24]. In brief, a copolymer consisting of 2-diethyl amino ethyl methacrylate (23.1 mol%), butyl methacrylate (33.2 mol%), poly(ethylene glycol) methacrylate (31.7 mol%), and [2-(methacryloyloxy)ethyl]trimethylammonium (12.0 mol%) was synthesized using photo-induced electron/energy transfer reversible addition–fragmentation chain transfer (PET-RAFT) polymerization to a degree of polymerization of 75. The resultant polymer was purified via dialysis and lyophilized to achieve a dry powder. The synthesized polymer was characterized via gel permeation chromatography and was determined to have a molecular weight of approximately 38 kDa and a dispersity (Đ) of 1.3 (Supplementary Fig. S3). Immediately before use, the purified polymer was solubilized at a concentration of 31.5 mg/mL and added in equal volume to a solution of chABC at 45.3 ng/μL and mixed thoroughly in an ice bath. chABC-SEN solutions were always made fresh immediately prior to alginate hydrogel synthesis.
2.8. Release studies for chABC-SENs
Top performing alginate formulations (formulation numbers 141 and 137) were prepared and loaded with 50 μL of chABC-SENs (1.13 μg chABC per gel). After the hydrogels were cured, 100 μL of PBS supernatant was added. After 1 h, the supernatant was collected and replaced with fresh PBS. Immediately after collection, a chABC activity assay was performed on the supernatant [24]. In brief, 50 μL of supernatant was added to an equal volume of chondroitin sulfate substrate solution (4 mg/mL) in a UV-compatible 384-well plate. After brief mixing, the absorbance at 232 nm was measured at intervals of 10 s for 45 min. The initial rate of absorbance change was calculated and compared to a standard curve generated from known chABC concentrations to determine the amount of active enzyme released at each timepoint. This process was repeated for all timepoints (1, 2, 3, 4, 8, 24, 48 h).
Results
3.1. Experimental testing on initial seed library
The ML model was trained on an initial seed library of 120 unique alginate formulations (with each formulation tested in triplicate) defined by alginate concentration (%), alginate molecular weight (VLVG, LVG, MVG), crosslinker concentration (M) and crosslinker type (calcium chloride, calcium sulfate, barium chloride, strontium chloride) (Supplementary Table S3). As stated earlier, alginate molecular weights are as follows: VLVG (< 75 kDa), LVG (75–200 kDa) and MVG (> 200 kDa). All alginates used for this study consist of a G/M ratio of >1.5. The formulations of the seed library were generated using Latin hypercube sampling (LHS) to ensure a representative sampling of the entire parameter space.
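Latin hypercube sampling over a discrete formulation space can be sketched as below: each dimension is split into as many strata as there are samples, one point is drawn per stratum, and the strata are shuffled independently per dimension. The mapping from the unit interval to discrete levels, the function names, and the crosslinker labels are assumptions; the level values are those listed in the methods.

```python
import random


def latin_hypercube(n_samples, n_dims, seed=0):
    """Return n_samples points in [0, 1)^n_dims with one point per stratum per dim."""
    rng = random.Random(seed)
    columns = []
    for _ in range(n_dims):
        # One draw inside each stratum [i/n, (i+1)/n), then shuffle across samples
        col = [(i + rng.random()) / n_samples for i in range(n_samples)]
        rng.shuffle(col)
        columns.append(col)
    return [[columns[d][s] for d in range(n_dims)] for s in range(n_samples)]


def to_level(u, levels):
    """Map u in [0, 1) onto one of the equally weighted discrete levels."""
    return levels[min(int(u * len(levels)), len(levels) - 1)]


levels = {
    "alginate_wt_pct": [1, 2, 3, 4],
    "alginate_type": ["VLVG", "LVG", "MVG"],
    "crosslinker_M": [0.05, 0.10, 0.15, 0.20, 0.25],
    "crosslinker": ["CaCl2", "CaSO4", "BaCl2", "SrCl2"],
}
names = list(levels)
unit_design = latin_hypercube(120, len(names))
seed_library = [
    {name: to_level(u, levels[name]) for name, u in zip(names, point)}
    for point in unit_design
]
```

Mapping continuous samples onto discrete levels can produce repeated formulations, so a library of unique formulations would need a de-duplication and resampling step that is not shown here.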
A release study was performed on each formulation in the seed library and the data was used in ML model training. With BSA as the model payload, protein quantification via BCA assay was used to determine the amount of released protein at each timepoint. The linearity of the release curve was quantified by applying the Korsmeyer-Peppas model. In this model, the constant k denotes the linearity of the release with a value of 1 being perfectly linear and sustained release. The seed library tested showed a wide range of k values (Fig. 2A), which is indicative of a diverse array of release profiles (Fig. 2B). The magnitude of the k value for each of the listed formulations can be directly correlated to the linearity of the release profile, hence validating the approach of optimizing for a low k value when looking for a near zero-order release profile. In this study, a near zero-order release is defined as having a k value between 0 and 10. Overall, the seed library produced a diverse dataset of release profiles, all of which were fitted using Korsmeyer-Peppas to determine a representative k value used to quantitatively assess release behavior (Fig. 2C). The diverse release profiles obtained from the initial seed library provide key variability within both the design parameter space and the target objective space. Together, this provides the necessary foundational dataset for an ML model to determine how the former influences the latter.
3.2. Model performance and validation
The release data obtained from the seed library was used to train a GPR model for predicting BSA release kinetics. To better represent the design space of hydrogel formulations, the input parameters were refeaturized (Supplementary Table S2) to represent underlying properties more effectively. Model performance was evaluated using group K-fold cross-validation with five folds, each utilizing an 80:20 train-test split based on formulation ID. This approach ensures that no timepoints from a test-set formulation appear in the training set, thus offering a clearer test of the model's predictive performance on entirely novel formulations.
Following a group K-fold validation strategy, the GPR model demonstrated strong predictive performance, achieving an R^2^ value of 0.71 between measured and predicted cumulative BSA release values across the entire seed library dataset (Fig. 3A). Complementary residual analysis showed randomly distributed errors with no systematic structure (Supplementary Fig. S4), supporting the robustness and generalizability of the model. This indicates that the model successfully captured key trends and relationships for protein release from alginate hydrogels.
To interrogate the relative influence of formulation parameters on model predictions, SHAP analysis was conducted (Fig. 3B). This provides a quantitative measure of the contribution of each design feature to the model output. Time was unsurprisingly the most influential feature, as cumulative release is inherently a time-dependent process. Beyond time, alginate molecular weight and alginate concentration were the most impactful, with higher molecular weight formulations associated with a more sustained release.
Crosslinker concentration exhibited a moderate effect, while crosslinker type (refeaturized into cation size and anion valence) contributed minimally to model predictions. These insights were further quantified using normalized mean absolute SHAP values (Supplementary Fig. S5), revealing the relative strength of each feature’s contribution to model output. Collectively, the SHAP analysis offers an interpretable framework for understanding how individual formulation variables modulate release kinetics, guiding future optimization strategies.
3.3. Release optimization by active learning
Following model training and validation using the initial seed library, active learning was employed to iteratively improve hydrogel formulations for sustained BSA release. The goal was to identify formulations that minimized the k value derived from Korsmeyer–Peppas modeling. Formulation candidates were prioritized based on two criteria: the predicted k value and associated model uncertainty generated by the GPR model.
Generation 1 represented an exploratory phase, wherein 20 new formulations were selected using a 70:30 explore-to-exploit ratio. This is made possible with the use of a GPR model where an uncertainty value is generated alongside the predicted release. This allows for a more comprehensive view of the model predictions and better informs the selection of the next formulations to test. Most formulations were chosen based on high prediction uncertainty to enrich the training set and improve release predictions, while a smaller portion targeted low predicted k values to guide optimization. The data obtained from Generation 1 was added to the seed library to create a new expanded dataset for the next round of active learning. The updated training set significantly improved model confidence, as evidenced by a marked reduction in average prediction uncertainty across the entire design space (Supplementary Fig. S6). To leverage this improved confidence, Generation 2 adopted a fully exploitative strategy, selecting 10 formulations predicted to yield the lowest k values. When these formulations were synthesized and release kinetics evaluated, a clear improvement in k values was observed relative to previous generations. As such, the results of this active learning campaign demonstrate a clear downward trend in k values across successive generations (Fig. 4A).
In Generation 1, 70 % of the formulations were selected based on high model uncertainty, while the remaining 30 % were selected for low k. This balanced explore/exploit approach led to a relatively wide distribution of measured k values (min = 7.13, max = 45.5, mean = 20.7) (Fig. 4B) as most of the formulations tested were of high uncertainty. Generation 2 was designed with a purely exploitative strategy, selecting only formulations predicted to yield low measured k values. This resulted in a substantial narrowing in the k distribution (min = 5.90, max = 29.5, mean = 14.9), as shown in both the violin plot (Fig. 4A) and summary statistics (Fig. 4B). The decreasing average and maximum k values across generations demonstrate the efficacy of the active learning strategy in driving formulation improvements toward more sustained release profiles.
Overall, the active learning framework enabled rapid convergence toward optimized hydrogel formulations with significantly lower k values, reducing the need for exhaustive screening and accelerating the identification of high-performance systems for protein delivery.
3.4. Experimental validation using a therapeutic protein
To evaluate whether the optimized alginate hydrogel formulations identified using BSA could be extended to more complex therapeutic proteins, the top-performing BSA-optimized formulations were experimentally tested for chABC-SEN release. Through ML-guided active learning, we had identified a subset of high-performing formulations that supported sustained BSA release with low k values, indicative of near zero-order kinetics. Despite having a different payload, the selected formulations were able to release chABC-SENs with near zero-order kinetics, producing the desired release kinetics without the need for an extensive optimization campaign (Supplementary Fig. S7). This was made possible by the similarity between BSA and chABC in overall size in PBS (Supplementary Fig. S8). Although other properties such as charge may result in small differences in release, the ML-driven optimization pipeline was still successful in capturing the alginate gel characteristics needed for a sustained, near zero-order release.
The release of active chABC was assessed using an enzymatic activity assay at multiple timepoints. Both tested formulations achieved sustained, near-zero-order release of the therapeutic enzyme, with low k values of 7.45 and 3.90 respectively (Fig. 5).
Discussion
This work showcases the use of ML and active learning to develop a data-driven approach to drug release from hydrogels. By leveraging the high-throughput capabilities of automation, we demonstrated an efficient iterative approach to achieving a near zero-order release of BSA payload from alginate hydrogels. A key outcome of this work is the ability to generalize formulation-performance relationships identified using BSA to a more complex payload, chABC-SENs. Importantly, the strong correlation between BSA-optimized formulations and their performance with chABC-SENs validates this approach to hydrogel optimization. This supports the use of appropriate proxy systems in hydrogel development to mitigate cost and material limitations associated with biologics. However, this strategy may not always be transferable across all protein therapeutics, as differences in protein-hydrogel affinity can lead to vastly different release behaviors. Therefore, careful selection of analog proteins is critical to ensure relevance and predictive accuracy in formulation development. Additionally, the experimental pipeline developed in this study was designed with a strong emphasis on automation compatibility. While this approach enables high-throughput fabrication and screening, it also necessitated certain tradeoffs in experimental design, including the inability to maintain strict sink conditions. Future work will focus on building upon this automation-compatible framework to further optimize experimental design. These improvements include the incorporation of deep-well plates to accommodate larger supernatant volumes and the implementation of automated supernatant exchange to enable more frequent sampling and improved kinetic resolution. Moreover, the experimental setup using the 96-well alginate/PBS system may not accurately represent a true physiological drug delivery platform. This is because the use of PBS as the supernatant could lead to altered erosion behavior in alginate gels [39] and therefore will not accurately model true in vivo release. Despite these limitations, this experimental setup serves as a testbed to showcase the feasibility of a pipeline running from automated fabrication to ML-guided optimization. Future studies should utilize more physiologically relevant media such as HEPES or a simulated body fluid to better capture the in vivo environment for greater relevance to translational drug delivery.
The ML model revealed several notable trends in formulation behavior. The GPR predicted the release of BSA for formulations it had not seen during training with an R² of 0.71. SHAP analysis identified alginate molecular weight and concentration as the most influential formulation factors. Conversely, cation size and anion valence, which were used to encode crosslinker type, showed limited influence, although this could be a result of being overshadowed by alginate concentration and molecular weight within the machine learning model. These findings align with the established understanding of alginate gel mechanics, where higher molecular weight alginates form more entangled, denser polymer networks that reduce mesh size and diffusivity, resulting in slower and more sustained protein release. In contrast, low molecular weight alginates form looser, more porous gels that permit faster diffusion and often exhibit burst release profiles. The ability of the GPR to identify these well-supported relationships demonstrates that it effectively learned meaningful drivers of protein release. This not only supports the GPR's predictive validity but also highlights the potential of data-driven approaches to uncover and reinforce mechanistic insights in biomaterials design.
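As an illustration of what evaluating a GPR on formulations it was blind to can look like, the following is a minimal sketch rather than the pipeline used in this study. It assumes synthetic release data, a squared-exponential kernel with hand-picked (not fitted) hyperparameters, and only three features (time, molecular weight, concentration); the names `rbf_kernel` and `gp_fit_predict` and all numeric values are hypothetical. Whole formulations, not individual timepoints, are held out so that no point on a test curve is seen during training.

```python
import numpy as np

def rbf_kernel(A, B, length_scale, variance):
    """Squared-exponential kernel with one length scale per feature."""
    A_s, B_s = A / length_scale, B / length_scale
    sq_dist = (
        np.sum(A_s**2, axis=1)[:, None]
        + np.sum(B_s**2, axis=1)[None, :]
        - 2.0 * A_s @ B_s.T
    )
    return variance * np.exp(-0.5 * np.maximum(sq_dist, 0.0))

def gp_fit_predict(X_train, y_train, X_test, length_scale, variance, noise):
    """Posterior mean and std of a GP fitted to mean-centered targets."""
    y_mean = y_train.mean()
    K = rbf_kernel(X_train, X_train, length_scale, variance)
    K += noise * np.eye(len(X_train))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train - y_mean))
    K_s = rbf_kernel(X_train, X_test, length_scale, variance)
    mean = K_s.T @ alpha + y_mean
    v = np.linalg.solve(L, K_s)
    var = variance - np.sum(v**2, axis=0)  # latent variance, noise excluded
    return mean, np.sqrt(np.maximum(var, 0.0))

rng = np.random.default_rng(0)

# Synthetic stand-in data (NOT from the paper): features scaled to [0, 1].
n_formulations = 40
timepoints = np.linspace(0.0, 1.0, 6)
mw = rng.uniform(0.0, 1.0, n_formulations)
conc = rng.uniform(0.0, 1.0, n_formulations)

X, y, groups = [], [], []
for i in range(n_formulations):
    # Hypothetical rule: higher MW and concentration slow the release.
    rate = 4.0 * (1.0 - 0.5 * mw[i] - 0.4 * conc[i])
    for t in timepoints:
        X.append([t, mw[i], conc[i]])
        y.append(100.0 * (1.0 - np.exp(-rate * t)) + rng.normal(0.0, 2.0))
        groups.append(i)
X, y, groups = np.array(X), np.array(y), np.array(groups)

# Hold out 10 entire formulations so their curves are unseen in training.
held_out = rng.choice(n_formulations, size=10, replace=False)
test_mask = np.isin(groups, held_out)

mean, std = gp_fit_predict(
    X[~test_mask], y[~test_mask], X[test_mask],
    length_scale=np.array([0.3, 0.5, 0.5]), variance=900.0, noise=4.0,
)

# Coefficient of determination on the held-out formulations.
ss_res = np.sum((y[test_mask] - mean) ** 2)
ss_tot = np.sum((y[test_mask] - y[test_mask].mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot
print(f"held-out R^2 = {r2:.2f}")
```

Because time is an input feature, a single model of this kind returns a full predicted release curve for any formulation by sweeping the time coordinate. In practice the kernel hyperparameters would be fitted to the data rather than fixed by hand.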
The iterative application of active learning enabled rapid refinement of the formulation space, with each generation yielding progressively lower k values. This highlights the potential of ML to accelerate biomaterials discovery, particularly in systems with high experimental complexity or cost. Notably, this strategy achieved desirable release profiles without exhaustive screening of the full formulation space, reducing the resource burden typically associated with traditional optimization methods.
Through active learning, we iteratively refined formulation selection to reduce experimental burden and accelerate convergence toward optimal release profiles. Successive generations of model-informed formulations exhibited systematically lower k values, demonstrating effective control over release kinetics using a minimal number of experiments. Importantly, the optimized formulations derived from inexpensive, tractable BSA studies were directly transferable to chABC-SENs, achieving near-zero-order release with minimal additional optimization.
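The iterative loop described above can be sketched as follows. This is a hedged illustration, not the authors' implementation: it assumes a two-parameter formulation space, a Gaussian process surrogate with fixed hyperparameters, and expected improvement as the acquisition function (one common choice in Bayesian optimization; the acquisition function used in this study is not stated in this section). The function `run_experiment` is a hypothetical stand-in for fabricating a gel, measuring its release curve and fitting a k value.

```python
import math
import numpy as np

def rbf(A, B, length_scale=0.3):
    """Unit-variance squared-exponential kernel."""
    d2 = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=2)
    return np.exp(-0.5 * d2 / length_scale**2)

def gp_posterior(X, y, X_new, noise=1e-3):
    """GP posterior mean/std using standardized targets, fixed hyperparameters."""
    mu, sigma = y.mean(), y.std() + 1e-12
    z = (y - mu) / sigma
    L = np.linalg.cholesky(rbf(X, X) + noise * np.eye(len(X)))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, z))
    K_s = rbf(X, X_new)
    v = np.linalg.solve(L, K_s)
    mean = K_s.T @ alpha
    std = np.sqrt(np.maximum(1.0 - np.sum(v**2, axis=0), 1e-12))
    return mean * sigma + mu, std * sigma

def expected_improvement(mean, std, best):
    """EI for minimization: expected amount by which a candidate beats `best`."""
    gain = best - mean
    u = gain / std
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(u / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * u**2) / math.sqrt(2.0 * math.pi)
    return gain * cdf + std * pdf

def run_experiment(x):
    """Hypothetical stand-in for a wet-lab measurement of k (lower is better)."""
    return 10.0 * ((x[0] - 0.8) ** 2 + (x[1] - 0.6) ** 2) + 1.0

rng = np.random.default_rng(1)

# Candidate pool: a grid over two scaled formulation parameters.
grid = np.linspace(0.0, 1.0, 21)
pool = np.array([[a, b] for a in grid for b in grid])

# Seed library: a small random subset measured before any model exists.
tested = [int(i) for i in rng.choice(len(pool), size=8, replace=False)]
scores = [run_experiment(pool[i]) for i in tested]
best_per_round = [min(scores)]

for generation in range(5):
    X, y = pool[tested], np.array(scores)
    mean, std = gp_posterior(X, y, pool)
    ei = expected_improvement(mean, std, y.min())
    ei[tested] = -np.inf            # never re-test a measured formulation
    nxt = int(np.argmax(ei))        # most promising untested candidate
    tested.append(nxt)
    scores.append(run_experiment(pool[nxt]))
    best_per_round.append(min(scores))

print(best_per_round)
```

Expected improvement balances exploitation (low predicted k) against exploration (high predictive uncertainty), which is why such a loop can approach a good formulation without measuring the whole pool. A robotic workflow would typically select a batch of candidates per generation rather than one.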
Taken together, these findings establish a scalable and adaptable pipeline for optimizing hydrogel formulations and highlight the role of ML in driving materials innovation. By leveraging surrogate proteins and active learning, this method offers a cost-effective pathway toward controlled release systems for fragile or valuable therapeutics.
Conclusion
In this study, we demonstrated an ML-guided framework for the rational design and optimization of alginate hydrogel formulations for sustained protein release. By leveraging an initial seed library of 120 diverse formulations, we trained a GPR model capable of accurately predicting BSA release profiles based on key formulation parameters. Feature importance analysis using SHAP revealed time, alginate molecular weight, and concentration as dominant drivers of release behavior.
Future iterations of this study could integrate multi-objective optimization, in which release kinetics are optimized in conjunction with other factors such as total release of payload, physical properties of the hydrogel and in vitro toxicity, all of which play a role in determining in vivo viability. Additionally, the parameter space could be expanded to include other additives such as polyvinyl alcohol or covalent crosslinking methods.
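One common way to frame such a multi-objective problem is to keep only the non-dominated (Pareto-optimal) formulations, that is, those for which no other candidate is at least as good on every objective and strictly better on one. The sketch below is illustrative only; the candidate values and the choice of two objectives (k value and unreleased payload fraction, both minimized) are hypothetical and not taken from this study.

```python
import numpy as np

def pareto_front(objectives):
    """Return indices of non-dominated rows when every column is minimized."""
    keep = []
    for i, row in enumerate(objectives):
        dominated = any(
            j != i and np.all(other <= row) and np.any(other < row)
            for j, other in enumerate(objectives)
        )
        if not dominated:
            keep.append(i)
    return keep

# Hypothetical candidates: columns are [k value, fraction of payload NOT released].
candidates = np.array([
    [3.9, 0.40],
    [7.5, 0.20],
    [5.0, 0.45],  # dominated by row 0 (worse on both objectives)
    [9.0, 0.10],
    [8.0, 0.25],  # dominated by row 1 (worse on both objectives)
])

front = pareto_front(candidates)
print(front)  # [0, 1, 3]
```

The surviving rows represent different tradeoffs between release kinetics and total release, from which a formulation could then be chosen according to the application.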
Overall, these results showcase the utility of integrating ML and active learning for the development of advanced biomaterial-based drug delivery systems. Moreover, this approach establishes a generalizable strategy for the rapid optimization of hydrogel formulations for therapeutic proteins that are expensive, scarce, or sensitive. This framework offers a powerful platform for accelerating the discovery and translation of next-generation controlled release technologies.
Supplementary Material
SI
References
- [1] Vigata M, et al., Hydrogels as drug delivery systems: a review of current characterization and evaluation techniques, Pharmaceutics 12 (12) (2020). doi:10.3390/pharmaceutics12121188. PMCID: PMC7762425. PMID: 33297493.
- [2] Li J, Mooney DJ, Designing hydrogels for controlled drug delivery, Nat. Rev. Mater. 1 (12) (2016). doi:10.1038/natrevmats.2016.71. PMCID: PMC5898614. PMID: 29657852.
- [3] Abasalizadeh F, et al., Alginate-based hydrogels as drug delivery vehicles in cancer treatment and their applications in wound dressing and 3D bioprinting, J. Biol. Eng. 14 (1) (2020) 8. doi:10.1186/s13036-020-0227-7. PMCID: PMC7069202. PMID: 32190110.
- [4] Yasin A, Ahmad S, Imran S, Fabrication of famotidine loaded Sepiolite/pectin bionanocomposite hydrogels for controlled drug delivery, ChemistrySelect 9 (35) (2024) e202402275.
- [5] Pan L, et al., Novel hybrid system based on carboxymethyl chitosan hydrogel encapsulating drug loaded nanoparticles for prolonged release of vancomycin in the treatment of bacterial infection, J. Pharm. Sci. 114 (3) (2025) 1563–1571. doi:10.1016/j.xphs.2025.01.012. PMID: 39827915.
- [6] Alam A, Khan A, Khandelwal M, Concentration-dependent bacterial cellulose patches: a strategy for modulating the drug release beyond the modifications of the native cellulose hydrogel, Proc. Indian Natl. Sci. Acad. 91 (2) (2025) 625–636.
- [7] Lee KY, Mooney DJ, Alginate: properties and biomedical applications, Prog. Polym. Sci. 37 (1) (2012) 106–126. doi:10.1016/j.progpolymsci.2011.06.003. PMCID: PMC3223967. PMID: 22125349.
- [8] Augst AD, Kong HJ, Mooney DJ, Alginate hydrogels as biomaterials, Macromol. Biosci. 6 (8) (2006) 623–633. doi:10.1002/mabi.200600069. PMID: 16881042.
