Applications of survival analysis and learning curves methods in neurosurgical stroke data and simulations to account for provider heterogeneity
Usha S. Govindarajulu, Rivera Daniel, Reynolds Eric, Brown Cole, Zhang Jack, Cohen Daniel, Schupper Alex

TL;DR
This paper explores how survival analysis and learning curves can improve understanding of surgical outcomes for hemorrhagic stroke patients by accounting for provider differences and learning effects.
Contribution
The study introduces novel applications of Cox frailty models and random survival forests to account for provider heterogeneity in neurosurgical stroke data.
Findings
Age was a significant predictor of time from procedure to death in an adjusted Cox model.
Smoking became the main statistically significant predictor in frailty models with restricted mean survival time.
Random survival forests showed the best fit for real data compared to other methods.
Abstract
We used a unique application of Cox frailty models as well as random survival forests (RSF) to capture unexplained heterogeneity amongst providers (Int Neuro. Article 102149, 2025), along with using restricted mean survival time as an alternative survival time measure. Hemorrhagic stroke accounts for approximately 10–20% of all strokes annually and some patients with it may benefit from a surgical intervention, a placement of an external ventricular drain (EVD) catheter to manage post-procedure complications. In order to model post-surgical outcomes, we first employed frailty models or RSFs to capture heterogeneity between providers and account for unmeasured covariates (Int Neuro. Article 102149, 2025). Additionally, we had modeled learning curves among operators to guide and improve surgical learning in which we utilized our database of EVD procedures for hemorrhagic stroke…
Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.
Click any figure to enlarge with its caption.
Fig. 1- —Clinical and Translational Science Award (CTSA)
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsInsurance, Mortality, Demography, Risk Management · Frailty in Older Adults · Artificial Intelligence in Healthcare and Education
Introduction
External ventricular drainage (EVD) placement has long been one of the most frequently performed neurosurgical procedures. First described in 1744 by French surgeon Claude Nicholas Le Cat [4, 5], EVD has evolved into a critical, life-saving treatment for patients presenting with acute hydrocephalus, particularly in stroke cases involving intracerebral hemorrhage (ICH), subarachnoid hemorrhage (SAH), and intraventricular hemorrhage (IVH). The primary mechanism of EVD placement involves the drainage of cerebrospinal fluid (CSF), which alleviates pressure on surrounding brain tissue. This reduction in intracranial pressure (ICP) mitigates the risk of secondary brain injury and improves clinical outcomes. Additionally, EVD placement enables continuous ICP monitoring, providing crucial data to guide therapeutic decisions and clinical interventions [5, 6]. Despite its relative simplicity and frequent bedside performance, EVD placement carries inherent risks, including infections (e.g., ventriculitis), tract hemorrhage, and catheter obstruction necessitating readjustment or replacement[6, 7]. [8] Nevertheless, EVD remains an indispensable tool for neurosurgeons and neurocritical care physicians in managing patients with elevated ICP, particularly in stroke patients with ICH, SAH, and/or IVH.
In our previous work, we have modeled time to event and also incorporated this with the modeling learning curves [1, 3]. We wanted to extend this work to this EVD placement data, in which we had our own data collected on EVD procedures placed in adults greater than 18 years of age from 2019 to 2022 performed through the Mount Sinai Health System of New York City. The operators were physicians who were residents or faculty in the department of neurosurgery. The operators varied in experience and length of procedure, so it was of great interest to apply our learning curve methodology to this dataset. Through this retrospective chart review, we looked at time-to-event outcomes for these patients through standard survival techniques and also through our novel approach with frailty combined with restricted mean survival time (RMST) in order to account for potential heterogeneity between providers[2], through our own unique methods. We assessed both time to death as well as time to complication in separate analyses[2]. Furthermore, we also employed our learning curve analyses to better understand how case volume would affect procedural success and we employed different shapes for the learning curves, but this time we assessed these analyses in more detail through both extensive simulations as compared to the prior real data analyses. When simulating the data, we also tested how the number of surgeons and the censoring rate effected the learning curve model along with the different shape functions. We also used an array of different modeling methods along with shape functions to determine the optimal method of modeling the learning curve.
Methods
We first conducted summary statistics of the variables collected where we obtained the frequencies of number of operators/physicians, number of cases/operators, prior history of stroke and CVD, patient gender, diabetes status, Modified Rankin Score (MRS), and complications. We also assessed the means/standard deviations, median, min and max values of patient age, survival time (from procedure date to death or last follow up and survival time from time of procedure to time to complication. We also assessed the complication and death rates and obtained proportions of these rates [1].
For time to event analyses, we analyzed time to event as time of procedure to either time to first complication post procedure or time to death [1]. We also employed right censoring for those who did not have the event. We first conducted a non-parametric Kaplan-Meier (KM) analyses for both event times as well as a log-rank test to compare the survival curves to each other. We then used the standard Cox proportional hazard model after confirming the proportional hazards assumptions met to model time to event, either time to complication or time to death, and adjusted by EVD type, age, gender, anti-coagulant use, history of stroke, history of diabetes (yes or not), history of CVD, history of drug use, history of hypertension, history of smoking, and ICH or IVH or SAH diagnoses as well as clustering by physician to account for potential correlations within each physician’s cases[1].
In a novel application for this dataset, we were also interested in a frailty analysis to discern heterogeneity between operators and conducted a Cox frailty model analysis [1], which we had done previously but explain this in more mathematical detail. Also, in this development and in more mathematical detail, we also used RMST [8, 9] to estimate survival probabilities instead of a Cox model or machine learning method, adjusted by covariates, along with incorporating frailty in a new development, to handle unexplained heterogeneity. RMST has been used as an alternative metric for summarizing survival time distributions, where it is the mean survival over all subjects in the study but only up to a particular time point, t^89^, which is denoted as \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\:\tau\:$$\end{document} . If the observed data is not correlated, a natural nonparametric estimator for \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\:RMS{T}_{k}\left(\tau\:\right)$$\end{document} where k was the number of groups was given by plugging the Kaplan-Meier estimator, \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\:{\widehat{S}}_{k}\left(t\right),$$\end{document} for \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\:{S}_{k}\left(t\right),$$\end{document}
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\:{\widehat{RMST}}_{k}\left(\tau\:\right)={\int\:}_{0}^{\tau\:}{\widehat{S}}_{k}\left(u\right)du.$$\end{document}For correlated data, instead of \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\:{\widehat{S}}_{k}\left(u\right),\:$$\end{document} we used the baseline survival function estimated from a stratified Cox regression model with shared frailty with grouping by operators. Specifically, we consider the following frailty model,
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\Lambda_{ij}\left(t\mid w_j,Z_{ij},X_{ij}\right)=w_j{\Lambda\:}_{Z_{ij}}\left(t\right)\text{exp}{\beta\:}^{'\:}X_{ij},$$\end{document}where \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\:{{\Lambda\:}}_{ij}(\cdot\:)$$\end{document} is the cumulative hazard function for the \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\:i$$\end{document} ^th^ subject in the \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\:j$$\end{document} ^th^ cluster, \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\:{Z}_{ij}$$\end{document} is a variable to determine the stratum (e.g., the assigned treatment group indicator), \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\:{w}_{j}$$\end{document} is the frailty term with mean 1 from a gamma distribution and variance of θ, and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\:{X}_{ij}$$\end{document} is the covariate vector for adjustment.
We then again our learning curve (LC) methods [1, 2] to adjust the time to death analyses by infusing a particular form of a parametric LC shape (exponential, power series, logarithmic, and log-normal), and then ran the models with an ordered procedure variable per operator to model learning within the adjusted Cox model. We obtained the predicted survival from the stratified Cox model.
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$h_{ij}\left(t\right)=h_{0s}\left(t\right)exp\left(x_{ij}\beta+w_i\right)\\{\widehat p}_i={\widehat S}_i\left(t\right)={\widehat S}_0\left(t\right)exp\left(\widehat\beta x_i\right)$$\end{document}where h0s(t) denotes the stratified baseline hazard rate. The individual level estimated survival probabilities from the Cox model were denoted as pi for each observation as computed from multiplying the baseline survival rate, \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\:\widehat{{S}_{0}}\left(t\right)$$\end{document} , by the function of the covariates, \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\:\text{e}\text{x}\text{p}\left(\widehat{\beta\:}{x}_{i}\right)$$\end{document} . The p0i are the predicted probabilities derived from Eq. 1 as pi, which is the asymptotic steady state outcome event rate in the simulation model.
These survival times were adjusted or updated for the physician learning curve, p1, for each shape separately, and then new survival probabilities where calculated and plotted against procedural volume. The p1 are the predicted probabilities from the physician learning curve model for the particular shape which were used to then calculate p2i for the center learning curve and these were the final predicted probabilities. The final predicted probabilities were re-constructed as an updated survival time, tnewi = (p2i/p0i)**time_i_ by taking a ratio of the final probabilities, p_2i, by the asymptotic steady state probabilities, p0i*_. This updated survival time became the new time variable in the previous dataset to generate new success probabilities. We also employed this in a Cox frailty model and the random survival forest (RSF) [9, 10] instead of just the Cox model to estimate survival in order to model learning curves between survival probabilities and procedure volume with these other methods. In our new development, we also employed the RMST in our learning curve framework and then compared how well using this compared to the clustered or frailty Cox model and the RSF.
To evaluate the robustness of our learning curve models, we conducted extensive simulation studies which was based on our prior published methodology [1, 2, 11, 12] and adapted for this neurosurgical data using the same covariates as from the real data analysis with similar distributional assumptions for the parameters. These simulations assessed the impact of various factors on model performance, including the number of surgeon operators (5 or 10 per center) and censoring rates (10% & 70%). We also tested four different shape functions (exponential, power series, logarithmic, log-normal) to determine their effectiveness in capturing learning patterns under various conditions. Through these simulations, we aimed to identify the optimal methods and shape functions for modeling surgical learning curves across a range of scenarios.
For each method and shape we show plots where each observed point in these figures is the binned average of predicted probabilities of success for the particular shape and scenario along with the corresponding learning curve. We also calculated root mean square error (rMSE) between the predictions and the marginal success rates in the observed data using our prior methods for this [1, 2] to see if these results aligned with the graphical results. All simulations and analyses were conducted in the R software. We used the libraries, survival, frailtypack, and our own R program for the RMST frailty analyses. We used a significance level of 0.05 for all analyses.
Results: survival analyses on real data
Univariate analyses
The study population was created from 652 patient medical records between 2019 and 2022. After removing duplicate subjects and subjects who did not meet eligibility criteria or subjects that had confidential records and could not be included we had 439 subjects. Out of these 439 patients after removing patients missing key information the final study population consisted of 160 patients. Our Table 1 had laid out clear demographics amongst the study population [1]. Our study population consisted of patients over the age of 18 who had an EVD procedure done after having an SAH, ICH, or IVH event. These patients were treated with either Bactericidal, IRRAflow or Cerebroflo© EVD devices and were not missing any key variables. For analytical purposes, we combined the IRRAflow and Cerebroflo devices and called that combination the non-antibiotic EVD devices as compared to the Bactericidal antibiotic device. The median age in the study population was 59 with the minimum age being 27 years old and the maximum patient age was 93. Regarding the patient diagnosis, we observed that 53% (84) had SAH, 41% (65) had IVH and 47% (75) had ICH diagnoses. Amongst the 429 subjects not included in the final study population, EVD type (~ 40%) and History of Smoking (~ 22%) had the most missing or unknown observations. In most cases data was not missing but rather results could not be assigned to a specified category resulting in them being treated as unknown for the sake of analysis. Approximately 64% (103/160) were treated with Bactericidal devices, the other 36% (57/160) were treated with IRRAflow/Cerebroflo©. The racial composition was diverse. Approximately 26% (42) of the subjects were Black or African-American, 24% (38) were Hispanic, 21% (33) were White, 11% (17) were Asian and 19% had a races that was Unknown or missing. Meanwhile the gender distribution was nearly equal with approximately 51% (81) were male and 49% (78) were female. Approximately 28% (44) of patients received anti-coagulants during treatment. Post-procedure mortality was approximately 26% (41) of patients while 17% (27) of patients experienced a complication post procedure.
Table 1.VariableOverall(N = 160)Mean (sd) or n (%) Age 57.5 (14.5) Gender Male81 (50.6%)Female78 (48.8%) Center EHC11 (6.9%)MSH91 (56.9%)MSM8 (5.0%)MSW48 (30.0%) Race Asian17 (10.6%)Black42 (26.3%)Hispanic38 (23.8%)White33 (20.6%)Unknown30 (18.8%) Height 65.7 (4.05) Weight 175 (40.7) BMI 29.2 (12.2) Diabetes 20 (12.5%) History Stroke 18 (11.3%) History Hypertension 55 (34.4%) Anti-coagulant Use 44 (27.5%) Smoke 78 (48.8%) History CVD 24 (15.0%) Pre-OP MRS 2.93 (1.67) SAH 84 (52.5%) IVH 65 (40.6%) ICH 75 (46.9%) Device IRRAflow/Cerebroflo©57 (35.6%)Bactericidal103 (64.4%) Time to Death 31.8 (30.4) Status - Death 41 (25.6%) Time to Complication 27.2 (29.9) Status - Complication 27 (16.9%)
Time to death analyses
Kaplan-Meier plots for time of procedure to time to death, both unadjusted as well as adjusted (Fig. 1) as per prior analyses [1], revealed that anti-coagulant use did not show a significant difference in survival initially. However, after adjusting for other covariates, at which point the survival rate for those on anti-coagulants appeared higher. Also, in terms of the same types of plots for EVD device we saw even after adjustment that there was not any statistically significant difference between the devices being compared, IRRAflow/Cerebroflo© and Bactericidal. We found in the Cox proportional hazards regression for time of procedure to time of death with physician clustering (Table 2) [1], that age was a significant risk factor in both univariate (p < 0.001) and multivariate (p < 0.015) models. However, pre-operative MRS, history of stroke, and history of smoking, which were initially statistically significant in the univariate models (P-values respectively: <0.006, < 0.027, < 0.006) became borderline or not statistically significant in the multivariable model (multivariable p-values respectively: <0.074, < 0.10, < 0.13).
Stratified KM plots: time to death stratified by EVD type or anticoagulant use
Table 2. Cox model for time to death with physician clusteringCharacteristicUnadjustedAdjustedHR^1^95% CI^1^p-valueHR^1^95% CI ^1^p-valueAge1.031.01, 1.060.0011.041.01, 1.070.015Gender0.830.48, 1.430.51.230.60, 2.520.6Diabetes0.910.36, 2.290.80.550.15, 1.990.4History Stroke2.171.09, 4.330.0272.220.87, 5.650.10History Hypertension1.450.84, 2.520.20.930.37, 2.390.9Anticoagulant use0.640.34, 1.220.20.450.18, 1.120.087History Smoke2.521.31, 4.860.0061.830.84, 4.020.13SAH Diagnosis1.190.69, 2.040.52.250.59, 8.530.2IVH Diagnosis0.920.54, 1.590.81.100.51, 2.350.8ICH Diagnosis1.000.58, 1.71> 0.92.390.68, 8.430.2Pre-OP MRS1.311.08, 1.590.0061.240.98, 1.560.074History CVD1.730.92, 3.260.0891.010.43, 2.39> 0.9History Drug Use0.820.40, 1.680.60.740.29, 1.900.5EVD Type^2^1.260.70, 2.290.41.110.47, 2.620.8^1^ HR Hazard Ratio, CI Confidence Interval ^2^Reference value: IRRAflow/Cerebroflo©
Time to complication analyses
For the Kaplan-Meier plots for time of procedure to time of first complication (Fig. 2), we found no survival difference in anti-coagulant use between unadjusted and adjusted analyses as per prior analyses [1], however, we did find a survival difference between the EVD device types with Bactericidal showing a higher probability of survival on average, but unadjusted and adjusted analyses showed similar rates of survival between devices.Fig. 2. Stratified KM plots: time to complication stratified by EVD type of anticoagulant use
For the Cox proportional hazard regression model for time to complication (Table 3) as per prior analyses [1], we found that EVD device type remained as a significant predictor in either univariate (p < 0.008) or multivariate (p < 0.010) Cox model analyses while pre-operative MRS was initially significant in the univariate analysis (p < 0.014) but became borderline significant in the multivariate analysis (p < 0.060).
Table 3. Cox model for time to complication with physician clustering Characteristic SimpleAdjusted HR ^1^
95% CI ^1^
p-value
HR ^1^
95% CI ^1^
p-value Age1.000.97, 1.03> 0.91.000.97, 1.040.8Gender0.860.41, 1.820.71.040.44, 2.44> 0.9Diabetes0.630.15, 2.660.50.470.09, 2.350.4History Stroke1.120.34, 3.720.90.750.19, 2.930.7History Hypertension1.610.76, 3.450.21.290.41, 4.070.7Anticoagulant use0.980.43, 2.22> 0.90.810.31, 2.090.7History Smoke0.990.46, 2.10> 0.90.660.25, 1.770.4SAH Diagnosis0.900.43, 1.880.80.960.26, 3.54> 0.9IVH Diagnosis1.540.73, 3.230.31.940.75, 4.970.2ICH Diagnosis0.670.31, 1.430.30.600.19, 1.920.4Pre-OP MRS0.750.60, 0.940.0140.790.62, 1.010.060History CVD2.260.96, 5.320.0622.300.64, 8.350.2History Drug Use0.490.21, 1.160.110.450.14, 1.380.2EVD Type^2^0.360.17, 0.770.0080.350.15, 0.780.010^1^ *HR * Hazard Ratio, CI Confidence Interval^2^Reference value : IRRAflow/Cerebroflo©
Frailty analyses
As per prior analyses [1], in the regular Cox frailty model, none of the variables showed a statistically significant effects on time to death (Table 4), with history of smoking variable (p < 0.0640) being the closest to significance. In the RMST Cox frailty model, history of smoking had an even more statistically significant effect in the model (p < 0.0534) as compared to the Cox frailty model. The frailty p-value was not statistically significant in either the regular Cox frailty or RMST Cox frailty models, which indicated that there was no significant heterogeneity in care between different providers. These analyses contradicted the regular Cox model analyses for time to death with physician clustering which had identified age as the main significant predictor so clearly, taking into account the provider heterogeneity altered the results for the main predictor of mortality.
Table 4. Cox frailty & RMST frailty models: real data analysesVariableCox frailty modelRMST Cox frailtycoefSe(coef)P-valueFrailty p-valuecoefSe(coef)P-valueFrailtyAge0.0100.01520.52000.950.0100.01520.5004Var: 4.13 x e^− 15^P = 0.5Gender0.1740.37440.64000.2190.37670.5607Diabetes−0.1230.63550.8500−0.1430.60980.8148Pre-op MRS0.1530.12870.23000.1570.12870.2221History Stroke0.3800.54180.48000.4090.55870.4647History Hypertension0.0270.53110.9600−0.0030.53700.9951Anti-coagulant Use−0.4120.48660.4000−0.4150.48480.3918Smoke0.7120.38460.06400.7390.38250.0534SAH0.3160.59420.59000.2670.60000.6568IVH0.2750.39170.48000.3010.39820.4500ICH0.4810.55690.39000.4550.56040.4164History CVD0.4500.48110.35000.4330.47190.3588History Drug Use0.5110.37940.18000.5080.37300.1728Device0.2620.22930.2500NARMST DIFFERENCE13.48 (21.1)95% CI:(27.86,54.83)
Results: learning curve real data analysis
We had also shown visual representations for learning curves of the probability of procedural success by plotting our predicted survival probabilities from each model against procedural volume from our real data for time to death analyses. We show this here again [1] for comparison to the simulated learning curves. For the standard Cox model with clustering by physicians (Fig. 3) we saw a consistent dip across shapes around 40 cases and then back to an increase after that in the learning curve. The Cox frailty plots (Fig. 4) showed a more consistent and upward learning curve across case volume, aligning with expectations of continuous improvement. However, the RSF learning curve plots (Fig. 5) we did not show much of any learning trend. Finally, for the mean-squared error calculations (Table 5) between observed and fitted values, we found that the RSF seemed to fit the trajectories the closest while the Cox frailty model that showed an expected learning curve trajectory was farthest from the observed data points. A potential issue with that was that the majority of observations were clustered in the lower-case order.AQ
Fig. 3. Cox model: real data analysis learning curve plots for 4 shapes (exponential, power series, logarithmic, and log-normal) and observed average points
Fig. 4. Cox frailty model: real data analysis learning curve plots for 4 shapes (exponential, power series, logarithmic, and log-normal) and observed average points
Fig. 5. Random survival forest: real data analysis learning curve plots for 4 shapes (exponential, power series, logarithmic, and log-normal) and observed average points
Table 5. Learning curve Models – Mean squared errorModelShapeRegular CoxFrailty CoxRandom Survival ForestExponential0.11760.16920.0883Power Series0.11380.16930.0878Logarithmic0.12430.17180.0988Log-Normal0.13030.17060.0890The mean squared error above compare the observed with the predicted estimates from the model
Simulation learning curve results
We conducted learning curve simulations for the regular Cox model, the regular Cox frailty model, RMST Cox frailty, and the RSF across sixteen scenarios to assess the best fitting method. For the simulation framework, we varied the number of surgeons, the censoring rate, and the assumed shape (exponential, power series, logarithmic, or log-normal) across these scenarios (Table 6). For all the models, we performed 50 simulations per scenario. For the RMST frailty model, we were only able to perform 10 simulations per scenario due to computational constraints.
Table 6. Simulation scenariosScenarioNumber of physicians/centerCensoringShape11010%Exponential21010%Power Series31010%Logarithmic41010%Log-Normal5510%Exponential6510%Power Series7510%Logarithmic8510%Log-Normal91070%Exponential101070%Power Series111070%Logarithmic121070%Log-Normal13570%Exponential14570%Power Series15570%Logarithmic16570%Log-Normal
For the regular Cox models, scenarios 1 to 4, which had more surgeons and low censoring rate (Fig. 6), we found that the logarithmic shape had the best graphical fit. While for scenarios 5 to 8 (Fig. 7), which had fewer surgeons and low censoring rate this was similar to scenarios 1 to 4, were we saw that the logarithmic model seemed to have the best fit. Scenarios 9 to 12 (Fig. 8), which had higher censoring rate and more surgeons did not show any of the shapes as well fitting to the learning curve. Scenarios 13 to 16 (Fig. 9), which had higher censoring and less surgeons, showed that the logarithmic shape fitted the best, but it was not as well fitting as in other scenarios. Across the 16 scenarios we saw that the logarithmic shape provided the best results for the regular Cox model while we found that increasing censoring reduced the accuracy of the model and having more surgeons led to more accurate results. This aligned with our previous learning curve research [1–4].
Fig. 6. Cox model : simulations 1-4 learning curve plots for 4 shapes (exponential, power series, logarithmic, and log-normal) and observed average points
Fig. 7. Cox model : simulations 5-8 learning curve plots for 4 shapes (exponential, power series, logarithmic, and log-normal)and observed average points
Fig. 8. Cox model : simulations 9-12 learning curve plots for 4 shapes (exponential, power series, logarithmic, and log-normal)and observed average points
Fig. 9. Cox model : simulations 13-16 learning curve plots for 4 shapes (exponential, power series, logarithmic, and log-normal) and observed average points
We then ran the Cox frailty model for the learning curve for all 16 scenarios. The most notable change came from altering the censoring rate. When censoring was low (Fig. 10, 11,) we observed that the exponential & logarithmic functions showed similar fits to the data. The log-normal results were the best if we consider its behavior after 50 cases. None of the shape functions used gave particularly accurate results. Scenarios 9 to 16 (Fig. 12, 13) had high censoring so the results were less predictable. The results were similar in most cases when censoring was high, the only exception is that the log-normal performed better when there were less surgeons in the model. The Cox frailty model showed inconsistent results. The MSEs from Table 7 would indicate the Cox frailty and the regular Cox model were fairly consistent across all scenarios between the fit between predicted survival and average survival and observed points. The logarithmic shape also in general showed lower MSEs across the scenarios as compared to the other shapes. For more granularity for the RSF and the Cox RMST frailty, sometimes the RSF was better than the Cox RMST frailty at fitting the actual shape as compared to p0 (baseline) predictions and sometimes worse, but the Cox RMST frailty did better with the fit between both p1 (physician) and p2 (center) predictions as compared to p0 than the RSF fits to the same type of predictions. For the RSF, the exponential and power series performed better across the scenarios in terms of MSEs while for the Cox RMST frailty, the logarithmic still seemed to perform better though we could not ascertain its fit in scenario 3 due to lack of convergence.
Fig. 10. Cox frailty model : simulations 1-4 learning curve plots for 4 shapes (exponential, power series, logarithmic, and log-normal) and observed average points
Fig. 11. Cox frailty model : simulations 5-8 learning curve plots for 4 shapes (exponential, power series, logarithmic, and log-normal) and observed average points
Fig. 12. Cox frailty model : simulations 9-12 learning curve plots for 4 shapes (exponential, power series, logarithmic, and log-normal) and observed average points
Fig. 13. Cox frailty model : simulations 13-16 learning curve plots for 4 shapes (exponential, power series, logarithmic, and log-normal)
Table 7– Simulations: mean squared errorModelRegular CoxFrailty CoxRandom Survival ForestRMST Cox frailtyScenarioMSE1MSE1MSE2MSE2_aMSE2_bMSE2MSE2_aMSE2_b10.01010.01040.01920.00410.00540.03540.00890.005020.00790.01060.04790.00620.00520.05120.00840.005130.01030.00990.20420.01620.009040.00900.01390.04540.00910.00660.09060.00520.006850.02720.01950.01000.00540.00830.03680.00340.001360.02530.02650.03020.01030.01080.05180.00330.001270.01930.02820.19320.04380.04100.01780.00410.001880.02660.02810.11300.00970.01370.08670.00770.004690.02430.02150.01870.00520.00720.02830.01030.0069100.01990.02180.04670.00770.00700.04160.01030.0074110.01990.02950.21780.03100.02840.01020.02090.0087120.02200.03180.07000.00730.00940.21980.01140.0126130.04930.04180.00970.00610.01010.03110.00680.0035140.04600.04250.02920.01200.01300.04400.00730.0039150.04600.05460.16550.06220.06100.02270.01090.0125160.04670.04760.14060.01030.01730.21470.02060.0120MSE1 – Mean Squared Error between the mean simulated values and the predicted valuesMSE2 – Mean Squared Error between p0 and the shape valuesMSE2_a – Mean Squared Error between p0 and p1MSE2_b – Mean Squared Error between p1 and p2
The RSF simulations results were fairly consistent across the 16 scenarios with most of the shapes predicting the high predicted survival probability from the method. In this particular analysis, we were able to delineate p0 (baseline), p1 (physician), and p2 (center) levels of learning with the shape being just the actual fitted shape. The number of surgeons did not have much impact but the higher censoring rate of 70% (scenarios 9–16) showed the p1 curve for the logarithmic shape only showed a more typical LC fit. We observed that the MSE for between p0 and p1 for the higher scenarios then tended to be lower as compared to the other models (Table 8).
Table 8– RMST Cox frailty HR per probability levels and scenariosScenarioP0P1P211.0080.9420.93721.0081.0211.058341.0081.0121.04251.0041.0880.99961.0041.0211.04471.0041.0161.01881.0040.9890.95991.0041.0190.997101.0040.9981.064111.0041.0491.019121.0041.0161.006130.9881.2191.062140.9880.9551.039150.9881.0141.042160.9881.0440.934
For the Cox RMST frailty model, unlike the other models since we did not obtain individual predicted survival from this modeling, we only collected the MSE and hazard rates for each scenario. We also ran into an issue with scenario 3 only where RAM issues kept causing the software to crash. Across the scenarios the logarithmic shape function performed best. Interestingly both the logarithmic model and the log-normal model increased the observable MSE between the default model and the model adjusted for surgeons, as well as the MSE between the surgeon adjusted model and the surgeon & site adjusted model. This suggested that although the logarithmic performed better overall, it did suppress some of the effects of surgeon and sites in the model (Figs. 1415, 16, and 17).
Fig. 14. Random survival forests : simulations 1-4 learning curve plots for 4 shapes (exponential, power series, logarithmic, and log-normal) and observed average points
Fig. 15. Random survival forests: simulations 5-8 learning curve plots for 4 shapes (exponential, power series, logarithmic, and log-normal) and observed average points
Fig. 16. Random survival forests: simulations 9-12 learning curve plots for 4 shapes (exponential, power series, logarithmic, and log-normal) and observed average points
Fig. 17. Random survival forests: simulations 13-16 learning curve plots for 4 shapes (exponential, power series, logarithmic, and log-normal) and observed average points
Discussion
From the main survival analysis results presented and as per prior analyses [1], we saw that age was the main predictor in post-procedure time to death analysis. Other variables appeared to have had significant effects in the simple model that were not present in the adjusted model. This indicated the possible presence of an unobserved confounding variable or variables that had significant effects on the mortality outcome. We demonstrated that besides standard survival analysis which was important to assess for these EVD procedures that also employing more unique analyses with modeling unexplained heterogeneity through frailty model could lead to other results. The Cox frailty model as well as our RMST Cox frailty model showed that smoking status was the main predictor of mortality in the presence of any latent heterogeneity within the model, even though the frailty effect itself was not statistically significant in either model. This suggested that incorporating modeling frailty may have shown a more genuine causal effect between a predictor of mortality amongst these stroke patients who had an EVD procedure.
In the time to complication analysis the type of EVD device showed a statistically significant effect, with Bactericidal devices performing better than IRRAflow/Cerebroflo© in terms of post-operative complications times. Also, pre-operative Modified Rankin Score showed a significant effect in univariate analysis and a borderline effect in the multivariate analysis, aligning with expectations of it to be a significant predictor. Meanwhile age and smoking did not appear to show significant effects in these analyses.
In the real data analysis of the learning curve, we observed a common trend across all plotted models, a brief decrease in probability of success. The main observable differences between models was were the timing and severity of this dip. For the Cox model analyses, a substantial dip occurred after 40 procedural cases while for the Cox frailty model analyses, the dip was less severe and occurred around 20 procedural cases. Finally for the RSF, only very minor dips were observed around 30 procedural cases. We believe this observed temporary dip was likely the result of mentorship. Often a more experienced practitioner would supervise a surgeon until they were more familiar with the procedure. This would have resulted in higher initial probability of success while the mentor was providing assistance. We posit that removing this mentoring effect might reveal a more logarithmic learning curve, characterized by low initial probability of success that increases over time. This hypothesis is supported by the observed behavior in our simulated models.
The simulated learning curves demonstrated consideration variation across the models and scenarios, yet some consistent trends could be observed. The logarithmic shape function generally performed similar to or better than the other shape functions, aligning with logical expectations of high learning rates that plateau as the model approaches an asymptotic limit. The primary differences between models were evident in their sensitivity to varying parameters across simulations. The Cox frailty model showed great differences as the censoring rate and number of surgeons were varied but it was not as affected by changes in the shape as compared to the regular Cox model. The intercept of the regular Cox model tended to vary significantly according to adapting the rate of censoring and the accuracy of the predictions from this model appeared to vary according to the number of surgeons per scenario. The use of different shapes for the LC appeared to influence the slope of the predicted values. In regard to the RSF, the method appeared desensitized to the parameter values and this was observed across all scenarios.
In comparing the simulated results and the real data analysis, it appeared that one of the most significant factors that affected learning rates may have been the presence of a mentor or supervisor. Further research should be performed to investigate if this observed effect is the result of mentorship, how this effect varies across mentors, how do we determine the actual learning rate when surgical outcomes are being adjusted by supervisors and how do we optimize learning rate while minimizing negative surgical outcomes through the application of mentorship.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
