Domain-Aware Interpretable Machine Learning Model for Predicting Postoperative Hospital Length of Stay from Perioperative Data: A Retrospective Observational Cohort Study

Iqram Hussain; Joseph R. Scarpa; Richard Boyer

PMC · DOI: 10.3390/bioengineering13020147 · January 27, 2026

Open Access

TL;DR

This study developed an interpretable machine learning model to predict hospital length of stay after surgery, identifying key factors like operation duration and lab values that influence recovery time.

Contribution

The novel contribution is an interpretable machine learning framework that integrates multimodal perioperative data to predict and explain postoperative hospital length of stay.

Findings

01. The model achieved R^2^ = 0.61 and MAE ≈ 1.34 days on the holdout set.

02. Operative duration, diagnostic complexity, and intraoperative hemodynamic variability were the strongest predictors of postoperative stay.

03. Lower albumin levels and complex procedures were linked to prolonged hospitalization.

Abstract

Background and Objective: Postoperative hospital length of stay (LOS) reflects surgical recovery and resource demand but remains difficult to predict due to heterogeneous perioperative trajectories. We aimed to develop and validate an interpretable machine learning framework that integrates multimodal perioperative data to accurately predict LOS and uncover clinically meaningful drivers of prolonged hospitalization. Methods: We studied 97,937 adult surgical cases from a large perioperative registry. Routinely collected perioperative data included patient demographics, comorbid conditions, preoperative laboratory values, intraoperative physiologic summaries, and procedural characteristics. Length of stay was modeled using a supervised regression approach with internal cross-validation and independent holdout evaluation. Model performance was assessed at both the cohort and individual…

Funding

- National Institute on Aging (NIA) of the National Institutes of Health (NIH)
- Foundation for Anesthesia Education and Research (FAER)

Keywords

postoperative length of stay (LOS); perioperative medicine; surgical outcomes; interpretable machine learning; domain-aware modeling

Taxonomy

Topics: Enhanced Recovery After Surgery · Cardiac, Anesthesia and Surgical Outcomes · Sepsis Diagnosis and Treatment

Full text

1. Introduction

Postoperative hospital length of stay (LOS) is a key indicator of surgical recovery, hospital efficiency, and perioperative quality of care [1,2]. Prolonged LOS increases healthcare costs, exposes patients to additional complications, and often reflects delayed recovery driven by modifiable preoperative or intraoperative factors [3]. Accurate prediction of LOS before or during surgery can therefore facilitate proactive discharge planning, optimize operating room scheduling, and enable targeted interventions for patients at elevated risk of prolonged hospitalization [4,5].

Extended hospital stay is also closely linked to adverse postoperative outcomes, including complications, readmissions, and mortality [6]. Patients with longer recoveries face higher rates of infection, thromboembolic events, and functional decline, making LOS a pragmatic surrogate marker of perioperative recovery and hospital performance [7]. Enhancing the accuracy and interpretability of LOS prediction can help clinicians identify vulnerable patients early and guide data-informed perioperative management strategies.

Traditional tools for predicting postoperative LOS have typically relied on a limited set of preoperative variables, linear regression models, or procedure-specific scoring systems [8,9,10]. Such approaches often fail to capture the complex interplay between physiological, procedural, and biochemical factors that influence recovery. Moreover, conventional models provide minimal interpretability, limiting clinical trust and hindering integration into perioperative decision-support workflows [11,12].

Recent advances in artificial intelligence (AI) and digital health technologies have transformed modern healthcare by enabling data-driven preoperative risk stratification, personalized care, and optimization of healthcare resources across diverse clinical settings [13,14,15,16,17,18]. In particular, machine learning (ML) approaches have demonstrated substantial potential in the management of chronic diseases, perioperative risk assessment, and prediction of hospitalization-related outcomes, including LOS, readmission, and postoperative recovery trajectories [19,20,21,22,23]. The growing availability of large-scale, multimodal perioperative datasets—integrating static clinical characteristics with high-resolution intraoperative physiological signals—has further accelerated the adoption of ML-based predictive models in perioperative medicine.

However, the clinical translation of these models has been limited by concerns regarding transparency, interpretability, and trust. Recent advances in interpretable ML have enhanced explainability and clinical trust in the analysis of large-scale, multimodal perioperative datasets that integrate static clinical variables with dynamic intraoperative signals [24,25,26,27]. To address these interpretability concerns, explainable artificial intelligence (XAI) techniques, including feature attribution methods [12,28,29] and global explanation approaches [30,31,32,33], have been employed. In perioperative medicine, where clinical accountability and interpretability are paramount, understanding why a model predicts prolonged LOS is as important as achieving high predictive accuracy.

In this context, domain-aware modeling plays a crucial role by structuring diverse perioperative variables into clinically coherent domains, where each domain represents a meaningful aspect of the surgical journey (for example, preoperative laboratories, intraoperative physiology, diagnoses, procedures, and durations). By grouping related variables into these interpretable clinical domains, the model captures higher-order relationships within and across categories that would be obscured in feature-level analyses. This organization makes it possible to quantify how much each clinical domain contributes to model performance, enabling both transparent interpretation and targeted understanding of the perioperative factors that most strongly influence postoperative LOS. The key contributions of this study are threefold.

First, we present an interpretable, domain-aware machine-learning framework that leverages routinely collected perioperative data (including patient characteristics, laboratory results, physiologic measures, and procedural information) to prospectively estimate postoperative length of stay. Second, the model demonstrates robust and generalizable performance while identifying clinically meaningful determinants of prolonged hospitalization, such as operative duration, diagnostic complexity, and perioperative physiologic and biochemical perturbations. Third, by integrating feature-level attribution with domain-level analysis, the framework provides interpretable, clinically grounded insight into how distinct perioperative data streams jointly shape postoperative recovery and hospital resource utilization.

2. Materials and Methods

This study presents a domain-aware, interpretable machine learning framework for predicting postoperative hospital LOS using routinely collected perioperative data (Figure 1). The analytic pipeline integrates structured features spanning preoperative, intraoperative, and postoperative phases—including demographics, comorbidities, laboratory results, physiologic time-series summaries, and procedural information. A gradient-boosted decision-tree model was trained to estimate postoperative LOS, with transparency achieved through clinically structured, domain-level feature grouping and complementary interpretability analyses.

2.1. Data Sources

This retrospective study analyzed perioperative and laboratory data sourced from the INSPIRE database, comprising operative cases from Seoul National University Hospital between 2011 and 2020 [34,35]. Adult, non-obstetric surgical encounters with complete perioperative timestamps and structured preoperative, intraoperative, and ward variables were included. Emergency surgeries and cesarean sections were excluded to maintain a homogeneous elective surgical cohort. Because INSPIRE is fully de-identified and publicly accessible, this study was exempt from institutional review board oversight.

2.2. Cohort Characteristics

Adult, non-obstetric surgical encounters were selected from the INSPIRE perioperative dataset. Eligible cases included patients aged ≥ 18 years who underwent elective surgery with complete perioperative timestamps and available structured demographic, laboratory, intraoperative physiologic, diagnostic, and procedural data. Emergency surgeries and obstetric procedures were excluded. Encounters with missing timestamps, implausible or non-positive length of stay, or evidence of temporal data leakage were removed.

The final cohort included 97,937 surgical cases (Table 1, Figure 2). Median age was 55 years (IQR 45–65), with 56.7% male and 43.3% female; median BMI was 23.8 kg/m^2^ (IQR 21.5–26.0). Most patients were ASA-PS II (53.9%) or I (38.4%). General anesthesia was most common (81.5%), followed by neuraxial (9.3%) and monitored anesthesia care (9.1%). Major departments included general surgery (29.1%), orthopedics (12.1%), and otorhinolaryngology (11.0%); 7.0% were emergency cases. Median OR duration was 135 min (IQR 90–220), and anesthesia duration was 120 min (IQR 75–200). Preoperative labs were within normal ranges (albumin 4.1 g/dL, hematocrit 39.1%, creatinine 0.8 mg/dL).

2.3. Outcome Definition

The primary outcome was postoperative hospital LOS, measured in days. The LOS start time was defined as the operating-room exit time, defaulting to operation or anesthesia end time when unavailable. The end time corresponded to hospital discharge or, in cases of in-hospital death, the recorded time of death. Encounters with LOS ≤ 0 were excluded. To remove implausible values, LOS was capped at 90 days and further truncated at the 95th percentile. These thresholds were applied to limit the influence of extreme outliers, which represent a small fraction of cases and can disproportionately affect loss optimization in tree-based models. Clinically, very prolonged hospitalizations are often driven by rare events or non-clinical factors, such as discharge disposition delays or social barriers, that are not well captured by perioperative features. The resulting distribution showed a right-skewed pattern typical of perioperative recovery, with a median of 3.6 days (IQR 1.6–6.6) and a mean ± SD of 5.7 ± 7.4 days (Figure 2E).
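The outcome construction described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names `los_days` and `clean_los` and the toy values are hypothetical, and "truncated at the 95th percentile" is read here as clipping values to the percentile rather than dropping cases, which is an assumption.

```python
from datetime import datetime

import numpy as np


def los_days(start, end):
    """LOS in days from OR exit (start) to discharge or in-hospital death (end)."""
    return (end - start).total_seconds() / 86400.0


def clean_los(values, hard_cap=90.0, pct=95.0):
    """Drop non-positive LOS, cap at hard_cap days, then clip at the pct-th percentile."""
    los = np.asarray(values, dtype=float)
    los = los[los > 0]                 # exclude encounters with LOS <= 0
    los = np.minimum(los, hard_cap)    # cap at 90 days
    upper = np.percentile(los, pct)    # percentile computed on the capped values
    return np.minimum(los, upper)      # assumption: clip rather than drop


# Hypothetical encounter: OR exit at noon on Jan 1, discharge at midnight on Jan 4.
example = los_days(datetime(2020, 1, 1, 12, 0), datetime(2020, 1, 4, 0, 0))

# Hypothetical raw LOS values, including invalid and extreme entries.
cleaned = clean_los([-1.0, 0.0, 0.5, 2.0, 3.5, 6.0, 120.0])
```

Whether the percentile is estimated on the full cohort or on the training split only is not stated in the text; estimating it on training data alone would be the more conservative choice.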

2.4. Perioperative Variables and Clinically Structured Domain Grouping

Predictors were derived from routinely collected perioperative data, restricted to variables available before or during surgery to prevent post-outcome leakage. The analytic feature set encompassed demographics (age, gender, BMI, ASA-PS), preoperative laboratory indices (albumin, chloride, hematocrit, CRP, and others), preoperative ward vital signs summarized by mean, minimum, maximum, and standard deviation (heart rate, respiratory rate, blood pressure), and intraoperative physiological measures expressed as means, ranges, and absolute deltas for key parameters including arterial and non-invasive blood pressure, heart rate, end-tidal CO_2_, tidal volume, minute ventilation, and urine output (Table 2). Procedural information was represented using one-hot NHSN-style categories such as breast, gastric, and hip-prosthesis surgery, while diagnostic information was reduced to binary indicators derived from ICD-10 chapters or clinically coherent diagnostic groups. Additional predictors included Charlson-style comorbidity flags and intraoperative duration metrics (anesthesia and operating-room time). Variables with poor standardization or potential temporal leakage—such as postoperative FiO_2_ changes, estimated blood loss, ECMO/CRRT indicators, or mortality timestamps—were excluded.

Each remaining feature was mapped to a human-readable domain label (e.g., demographics, preoperative laboratories, intraoperative vitals, procedures, diagnoses), enabling interpretation of model behavior both at the individual-feature and aggregated-domain levels.

2.5. Data Preprocessing

A domain-aware preprocessing pipeline was applied to ensure data integrity and prevent information leakage. Postoperative LOS was log-transformed to correct skewness. Continuous variables were median-imputed and robust-scaled, categorical variables were mode-imputed and one-hot encoded, and near-constant features (variance < 0.01) were removed. The resulting standardized matrices were used as input for feature selection and model training.
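The continuous-variable steps above (median imputation, robust scaling, near-constant filtering, log-transformed target) can be sketched with NumPy alone. This is an illustration under assumptions: the helper names and the toy matrix are hypothetical, the variance filter is applied after scaling (the text does not specify the order), and a plain natural log is used for the target.

```python
import numpy as np


def median_impute(X):
    """Replace NaNs in each column with that column's median (ignoring NaNs)."""
    X = np.array(X, dtype=float)
    med = np.nanmedian(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = med[cols]
    return X


def robust_scale(X):
    """Center each column on its median and divide by its interquartile range."""
    med = np.median(X, axis=0)
    q1, q3 = np.percentile(X, [25, 75], axis=0)
    iqr = np.where((q3 - q1) == 0, 1.0, q3 - q1)  # guard against zero IQR
    return (X - med) / iqr


def drop_near_constant(X, threshold=0.01):
    """Keep only columns whose variance is at least the threshold."""
    keep = X.var(axis=0) >= threshold
    return X[:, keep], keep


# Hypothetical columns: age, albumin (one missing value), and a constant flag.
X_raw = np.array([[50.0, 4.1, 1.0],
                  [60.0, np.nan, 1.0],
                  [70.0, 3.5, 1.0],
                  [40.0, 4.4, 1.0]])

X_imp = median_impute(X_raw)
X_scaled = robust_scale(X_imp)
X_final, kept = drop_near_constant(X_scaled)

# Log-transform a hypothetical LOS target (days) to reduce right skew.
y_log = np.log(np.array([1.6, 3.6, 6.6, 2.0]))
```

In a cross-validated setting, the medians and interquartile ranges would normally be estimated on the training folds only and then applied to held-out data, which keeps the preprocessing itself free of leakage.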

2.6. Feature Selection with Clinical Priors

To reduce dimensionality while maintaining clinical interpretability, we combined linear sparsity, tree-based attribution, and domain diversity. First, LassoCV with five-fold cross-validation was applied to the preprocessed training matrix to identify sparse linear predictors [36]. Second, a CatBoost regressor (iterations = 100, learning rate = 0.05, depth = 4) was trained to compute Shapley additive explanations (SHAP) values, and features with high mean absolute SHAP values were prioritized [30,37,38]. To avoid over-reliance on any single domain, we aggregated SHAP importance within domains and force-included the most informative feature from each high-impact group. Clinically essential variables such as age and ASA were preserved regardless of statistical ranking. Finally, we capped the selection at no more than eight features per domain and 20 features in total, resulting in a compact, interpretable panel spanning demographics, preoperative labs, intraoperative vitals, intraoperative procedures, diagnoses, and surgical duration.
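The domain-diversity and capping logic in this step can be illustrated with a small greedy routine. It is a sketch under assumptions rather than the published procedure: importance scores are taken as given (for instance mean absolute SHAP values), the top feature of every domain is force-included rather than only those of "high-impact" domains, and all feature names, scores, and the function `select_features` are hypothetical.

```python
from collections import defaultdict


def select_features(importance, domain_of, must_keep=(),
                    per_domain_cap=8, total_cap=20):
    """Greedy domain-aware selection from per-feature importance scores."""
    # Aggregate importance within each domain.
    domain_score = defaultdict(float)
    for feat, score in importance.items():
        domain_score[domain_of[feat]] += score

    selected = []
    per_domain = defaultdict(int)

    def add(feat):
        dom = domain_of[feat]
        if (feat in selected or per_domain[dom] >= per_domain_cap
                or len(selected) >= total_cap):
            return
        selected.append(feat)
        per_domain[dom] += 1

    # 1) Clinically essential variables are kept regardless of ranking.
    for feat in must_keep:
        add(feat)
    # 2) Force-include the top feature of each domain, strongest domain first.
    for dom in sorted(domain_score, key=domain_score.get, reverse=True):
        members = [f for f in importance if domain_of[f] == dom]
        add(max(members, key=importance.get))
    # 3) Fill remaining slots by global importance, respecting both caps.
    for feat in sorted(importance, key=importance.get, reverse=True):
        add(feat)
    return selected, dict(domain_score)


# Hypothetical scores and domain labels, for illustration only.
importance = {"op_duration": 0.25, "anes_duration": 0.10, "albumin": 0.05,
              "alp": 0.02, "age": 0.01, "asa": 0.01, "dx_neoplasm": 0.12}
domain_of = {"op_duration": "durations", "anes_duration": "durations",
             "albumin": "preop_labs", "alp": "preop_labs",
             "age": "demographics", "asa": "demographics",
             "dx_neoplasm": "diagnoses"}

selected, domain_score = select_features(
    importance, domain_of, must_keep=("age",), per_domain_cap=1, total_cap=4)
```

With the tight caps used in this toy call, the panel ends up with one feature per domain, which shows how the per-domain cap prevents a single dominant domain from filling every slot.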

2.7. Model Training and Validation

The final feature set was used to train a gradient-boosted decision tree model implemented with CatBoost [39]. We specified 300 boosting iterations, a depth of 6, and a learning rate of 0.05, with random state fixed for reproducibility. Predictions were generated in log space and then exponentiated back to days. To evaluate generalizability, we applied a GroupKFold strategy to account for subject-level clustering [36]. Performance was assessed both on the independent hold-out test set and through out-of-fold predictions from five-fold cross-validation. Metrics included mean absolute error (MAE), root mean squared error (RMSE), mean absolute percentage error (MAPE), and the coefficient of determination (R^2^), reported in both log-transformed and original LOS space. To quantify uncertainty, we computed 95% confidence intervals for R^2^ and MAE using 1000 paired bootstrap resamples of the test predictions.
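The evaluation metrics and the paired bootstrap can be sketched as follows. This is a minimal illustration on synthetic data: the helper names are hypothetical, the predictions are simulated rather than produced by a trained model, and a percentile interval is assumed because the text does not name the interval type.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """MAE, RMSE, MAPE (percent), and R^2 for one set of predictions."""
    err = y_pred - y_true
    ss_res = np.sum(err ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return {
        "MAE": np.mean(np.abs(err)),
        "RMSE": np.sqrt(np.mean(err ** 2)),
        "MAPE": np.mean(np.abs(err) / y_true) * 100.0,
        "R2": 1.0 - ss_res / ss_tot,
    }


def bootstrap_ci(y_true, y_pred, metric, n_boot=1000, alpha=0.05, seed=0):
    """Percentile CI from paired resampling of (truth, prediction) rows."""
    rng = np.random.default_rng(seed)
    n = len(y_true)
    stats = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # same indices for truth and prediction
        stats[b] = regression_metrics(y_true[idx], y_pred[idx])[metric]
    lower, upper = np.percentile(stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return lower, upper


# Synthetic log-LOS values standing in for held-out truth and predictions.
rng = np.random.default_rng(42)
log_true = rng.normal(1.2, 0.8, size=500)
log_pred = log_true + rng.normal(0.0, 0.4, size=500)

# Metrics in log space, then in days after exponentiating both vectors.
m_log = regression_metrics(log_true, log_pred)
days_true, days_pred = np.exp(log_true), np.exp(log_pred)
m_days = regression_metrics(days_true, days_pred)

ci_low, ci_high = bootstrap_ci(days_true, days_pred, "MAE")
```

Resampling truth and prediction with the same indices is what makes the bootstrap "paired": each resample preserves the per-case error, so the interval reflects sampling variability of the test set rather than model retraining variability.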

2.8. Model Explainability and Domain Ablation

To enhance clinical interpretability, we applied SHAP to quantify the contribution of each predictor to model output [38,40]. Global feature importance was summarized as the mean absolute SHAP value across the training and test sets, and results were visualized using SHAP summary plots.

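$$
\mathrm{SHAP}_i = \phi_i(F, x) = \sum_{S \subseteq M \setminus \{i\}} \frac{|S|!\,(|M| - |S| - 1)!}{|M|!}\,\bigl[F(x_{S \cup \{i\}}) - F(x_S)\bigr]
$$

where F is the fitted model, x is the input instance, M is the set of all features, S is a subset of features that excludes feature i, x_S_ is the input restricted to the features in S, |S| represents the cardinality of set S, SHAP_i_ is the SHAP value for feature i, and φ_i_(F,x) is the SHAP value function for feature i.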
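To make the Shapley definition concrete, the sketch below computes exact values by enumerating every feature subset for a three-feature toy model. It is illustrative only: the toy model, its coefficients, and the reference point are hypothetical, and replacing "absent" features with a single baseline value is a simplification of how tree-based SHAP implementations handle missing features.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values: weighted marginal contribution over all subsets S.

    Weight for a subset of size s among n features: s! (n - s - 1)! / n!.
    Features outside S are set to their baseline value.
    """
    n = len(x)
    features = range(n)

    def value(subset):
        z = [x[j] if j in subset else baseline[j] for j in features]
        return model(z)

    phi = []
    for i in features:
        others = [j for j in features if j != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for S in combinations(others, size):
                total += weight * (value(set(S) | {i}) - value(set(S)))
        phi.append(total)
    return phi


def toy_model(z):
    """Hypothetical LOS model (days) with one interaction term."""
    duration_min, albumin, neoplasm = z
    return (5.0 + 0.01 * duration_min - 1.0 * albumin
            + 0.5 * neoplasm * (duration_min / 100.0))


x = [300.0, 3.0, 1.0]          # long case, low albumin, neoplasm present
baseline = [135.0, 4.1, 0.0]   # hypothetical reference patient
phi = shapley_values(toy_model, x, baseline)
```

Two properties are easy to check on this example: the values sum to the difference between the prediction for `x` and the prediction for the baseline, and albumin, which enters the toy model additively, receives exactly its own effect, while the interaction between duration and the neoplasm flag is shared between those two features.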

To better capture domain-level insights, feature-level SHAP values were aggregated by predefined groups (e.g., demographics, preoperative labs, intraoperative vitals, procedures), producing a domain-wise ranking of importance. In parallel, we performed leave-one-domain-out ablation experiments using an XGBoost regressor with GroupKFold cross-validation [41]. For each domain, we re-trained the model without its features and computed the change in R^2^ (ΔR^2^) relative to the full model. Domains with larger ΔR^2^ values were considered more critical for predictive performance. We compared domain-level SHAP scores with ΔR^2^ drops to cross-validate importance rankings.
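The leave-one-domain-out procedure can be sketched with a simple stand-in model. The example below is an assumption-laden illustration: ordinary least squares replaces the XGBoost regressor, a single train/test split replaces GroupKFold cross-validation, and the synthetic data and domain-to-column mapping are hypothetical. ΔR^2^ is computed as the full-model R^2^ minus the R^2^ of the model retrained without the domain.

```python
import numpy as np


def fit_predict_ols(X_train, y_train, X_test):
    """Least squares with intercept; a stand-in for the boosted regressor."""
    A = np.column_stack([np.ones(len(X_train)), X_train])
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return np.column_stack([np.ones(len(X_test)), X_test]) @ coef


def r2(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def domain_ablation(X_train, y_train, X_test, y_test, domains):
    """Return full-model R^2 and the R^2 drop when each domain is removed."""
    full = r2(y_test, fit_predict_ols(X_train, y_train, X_test))
    all_cols = np.arange(X_train.shape[1])
    deltas = {}
    for name, cols in domains.items():
        keep = np.setdiff1d(all_cols, cols)  # retrain without this domain
        pred = fit_predict_ols(X_train[:, keep], y_train, X_test[:, keep])
        deltas[name] = full - r2(y_test, pred)
    return full, deltas


# Synthetic data: column 0 matters most, column 3 is pure noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 4))
y = (1.0 * X[:, 0] + 0.5 * X[:, 1] + 0.2 * X[:, 2]
     + rng.normal(0.0, 0.5, size=2000))

domains = {"durations": [0], "diagnoses": [1], "preop_labs": [2, 3]}
full_r2, delta_r2 = domain_ablation(X[:1500], y[:1500], X[1500:], y[1500:], domains)
ranking = sorted(delta_r2, key=delta_r2.get, reverse=True)
```

One caveat that applies to any ablation of this kind: when domains are correlated, the remaining features can partly compensate for a removed domain, so ΔR^2^ tends to understate the importance of domains that share information with others. Comparing it against aggregated SHAP scores, as the text describes, is one way to cross-check the ranking.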

3. Results

3.1. Model Performance and Generalization

The CatBoost regression model demonstrated consistent and robust performance in predicting postoperative hospital LOS (days) (Figure 3A–E). On the independent holdout dataset, the model achieved an R^2^ = 0.61, MAE = 1.34 days, and RMSE = 2.05 days. Performance on out-of-fold (OOF) cross-validation was nearly identical (R^2^ = 0.60, MAE = 1.34 days, RMSE = 2.06 days), underscoring the framework’s reproducibility and generalizability across cohorts.

Scatter plots of predicted versus observed LOS (Figure 3D–E) show strong calibration with tight clustering around the identity line, indicating reliable agreement between predicted and actual values across the full postoperative LOS range. Minor dispersion appears only among the longest-stay cases, reflecting natural variability in extended recovery durations. Overall, these results confirm that the proposed model accurately captures inter-patient variability in postoperative recovery time while maintaining stable generalization between holdout and cross-validation datasets.

3.2. Feature-Level Interpretability and Key Predictors

Global SHAP analysis identified operative duration as the most influential predictor of postoperative hospital LOS (days) (Figure 4). Diagnostic categories, particularly neoplasms and respiratory diseases, also had strong positive contributions, indicating that higher feature values within these groups were consistently associated with longer hospital stays. Among laboratory indices, albumin and alkaline phosphatase (ALP) emerged as key biochemical correlates, with lower albumin and higher ALP levels linked to prolonged hospitalization. Dynamic intraoperative parameters, including mean absolute changes in urine output (ΔUO), arterial systolic pressure (ΔSBP), non-invasive blood pressure (ΔNIBP), and ventilation measures such as Δminute volume, Δrespiratory rate, and ΔEtCO_2_, further contributed to predictions, highlighting the importance of physiologic variability during surgery. Procedural categories such as breast, gastric, and hip prosthesis surgeries also ranked prominently, reflecting procedure-specific recovery trajectories. Collectively, these findings show how the model integrates operative duration, diagnostic complexity, preoperative biochemistry, and intraoperative dynamics to generate physiologically interpretable predictions of postoperative LOS.

3.3. Domain-Aware Interpretability and Hierarchical Insights

To capture higher-order structure across clinically related features, we implemented a domain-aware interpretability framework integrating leave-one-domain-out ablation and global SHAP aggregation (Figure 5A–D). Both complementary approaches revealed a consistent hierarchy of perioperative domains influencing LOS prediction.

Ablation testing showed that removing durations (OR/anesthesia) caused the largest performance drop (ΔR^2^ = 0.081), followed by diagnoses (ΔR^2^ = 0.032), intraoperative procedures (ΔR^2^ = 0.014), preoperative labs (ΔR^2^ = 0.012), and intraoperative vitals (ΔR^2^ = 0.008). Global SHAP aggregation confirmed these trends, ranking durations (0.25), diagnoses (0.20), intraoperative vitals (0.15), and preoperative labs (0.07) as the top domains. SHAP composition indicated proportional contributions of approximately 33%, 26%, 20%, and 10%, respectively.

Overall, these results emphasize that operative duration, diagnostic complexity, intraoperative physiologic variability, and preoperative laboratory status are the principal determinants of hospital LOS, while medications and ward vitals contributed minimally, supporting the robustness and clinical interpretability of the domain-aware framework.

3.4. Multidomain Feature Dependencies Underlying LOS Prediction

To characterize the structural relationships among the most influential predictors of postoperative length of stay (LOS), we quantified pairwise associations across the top 25 SHAP-ranked variables. The cluster-ordered Spearman correlation matrix (Figure 6A) reveals distinct blocks of covarying features that align with physiological, demographic, diagnostic, and procedural domains. Hierarchical clustering with Ward's method further resolves these dependencies, yielding clinically coherent groupings that reflect shared biological or perioperative mechanisms (Figure 6B). This multidomain organization underscores that LOS is governed not by isolated predictors but by coordinated patterns spanning multiple facets of patient status and surgical care.

3.5. Patient-Level and Personalized Interpretability

Patient-specific SHAP waterfall plots illustrate how individual feature combinations influence LOS predictions (Figure 7A–D). For low-LOS cases, protective contributors such as short operative duration, favorable diagnoses, stable intraoperative parameters, and normal preoperative labs were dominant (Figure 7A,C). Conversely, high-LOS cases were characterized by risk-enhancing drivers, including prolonged operative duration, intraoperative instability, and comorbid or malignant diagnoses (Figure 7B,D). These individualized SHAP explanations provide clinically intuitive narratives for both short and extended recoveries, offering potential utility for personalized perioperative planning and postoperative risk communication.

4. Discussion

We developed a domain-aware, interpretable machine-learning framework to predict postoperative hospital LOS using a large, heterogeneous perioperative cohort. By organizing perioperative data into clinically coherent domains—demographics, laboratories, intraoperative physiology, and procedures—the model achieved strong generalization across cohorts. This domain-aware structure enabled hierarchical interpretability, revealing key predictors at the feature, domain, and patient levels, and translating data-driven outputs into clinically meaningful insights for perioperative recovery.

4.1. Predictive Performance and Key Determinants of Hospital LOS

The model achieved consistent and robust predictive performance (R^2^ = 0.60), demonstrating strong calibration and generalization across both holdout and cross-validation cohorts. Operative duration emerged as the dominant determinant, reflecting its established role as a surrogate for surgical complexity, anesthetic exposure, and intraoperative resource utilization [42]. Diagnostic categories—particularly neoplasms, musculoskeletal, and respiratory disorders—were also highly influential, aligning with prior findings that link underlying disease burden and procedural type to delayed postoperative recovery [43,44]. Preoperative laboratory measures such as albumin, chloride, and hematocrit contributed significantly, consistent with biochemical markers of nutritional status, inflammation, and oxygen-carrying capacity known to affect surgical outcomes [45].

Moreover, intraoperative hemodynamic and ventilatory variability underscored the importance of physiologic stability during surgery, supporting evidence that fluctuations in arterial pressure and ventilation parameters are strong predictors of postoperative complications and extended hospitalization [46]. Although postoperative complications were not explicitly modeled to avoid temporal data leakage, these perioperative factors represent upstream determinants of complication risk and are therefore implicitly captured through their association with prolonged length of stay. Together, these results highlight the framework’s ability to integrate both procedural and dynamic physiologic signals to model recovery after surgery with clinical fidelity.

4.2. Interpretability and Clinical Relevance

By integrating feature attribution with domain-level ablation, the framework provided transparent, multi-scale interpretability. Both methods consistently identified durations, diagnoses, intraoperative vitals, and preoperative labs as the most influential domains shaping postoperative LOS [42,45,47,48]. At the feature level, operative duration, diagnostic complexity, and biochemical indices such as albumin and alkaline phosphatase emerged as dominant determinants, while intraoperative physiologic variability further modulated recovery patterns [43,44,45]. The alignment between feature- and domain-level analyses reinforces the robustness and clinical plausibility of the model’s explanations. At the patient level, individualized SHAP narratives distinguished protective from risk-enhancing factors, linking short procedures, stable intraoperative physiology, and normal laboratory values with early discharge, and prolonged operations, hemodynamic instability, or adverse diagnoses with extended hospitalization. These interpretable relationships transform the model from a predictive algorithm into an explanatory, clinically grounded decision-support framework, strengthening confidence in ML-based perioperative risk assessment.

4.3. Comparison with Prior Work

Prior studies complement our findings by demonstrating the utility of machine-learning methods for predicting hospital resource utilization and improving operational efficiency [49,50]. For example, XGBoost can accurately predict cardiothoracic surgery duration and support data-driven capacity management by reducing delays in elective and acute surgical schedules [51]. Similarly, Light Gradient-Boosting Machine (LightGBM) models have been used to forecast emergency department crowding and inform staffing strategies aligned with patient volume demand [52]. Gradient-boosting-based models, particularly XGBoost, have achieved the highest predictive performance for postoperative length of stay (LOS), although other ensemble methods such as random forests demonstrated comparable performance [21,53]. Random forest models have been used to predict postoperative LOS, intensive care unit admission, surgical bed utilization, and outpatient visit volumes in adult hospital populations [54]. In addition, operations research-based approaches, including integer linear programming and goal programming, have been shown to effectively optimize elective surgical scheduling and operating room utilization [20,53].

Most LOS prediction studies have focused on narrow surgical cohorts, relied primarily on preoperative features, or lacked interpretable frameworks. Our work expands on these by demonstrating that large-scale multimodal perioperative data can be effectively harnessed in an interpretable boosting-tree framework. A key strength of the present study is the explicit use of electronic health record (EHR) domain-aware feature organization. By combining feature-level SHAP explanations with domain-level ablation analyses, we quantify not only which variables are influential, but also how entire perioperative domains (such as operative duration, diagnoses, intraoperative physiology, and preoperative laboratory status) contribute to LOS prediction. This hierarchical interpretability is particularly valuable in perioperative medicine, where data streams are inherently structured, and clinical decision-making often occurs at the level of the EHR domain rather than the individual feature.

Furthermore, by deliberately restricting predictors to preoperative and intraoperative data, our approach preserves temporal validity and positions LOS as a downstream outcome that implicitly reflects postoperative complications and recovery trajectories. The resulting model achieves strong and reproducible performance as an interpretable LOS decision-support tool for perioperative planning and hospital resource management.

4.4. Strengths and Perspectives for Clinical Application

Key strengths of this study include the use of a large, real-world perioperative dataset, systematic feature engineering across pre-, intra-, and postoperative phases, and the integration of complementary interpretability techniques. By restricting predictors to preoperative and intraoperative information, the framework preserves clinical realism while avoiding postoperative data leakage, enabling predictions that are feasible within routine perioperative workflows. The integration of complementary interpretability techniques—feature-level SHAP attribution and domain-level ablation—provides multi-scale insight into the determinants of hospital length of stay, allowing clinicians to contextualize predictions within familiar perioperative domains rather than individual variables.

In current clinical practice, length of stay is calculated retrospectively from EHR timestamps and is therefore known only at discharge. The proposed framework shifts LOS assessment upstream by providing prospective estimates based on data available before or during surgery, enabling earlier and more informed perioperative planning. Our SHAP interpretability analyses further clarify which individual features and perioperative domains most strongly influence LOS predictions. Together, these capabilities support proactive discharge coordination, bed and operating room capacity management, and targeted allocation of postoperative resources.

4.5. Limitations and Future Directions

Limitations include potential unmeasured confounding (e.g., socioeconomic and hospital-level factors not captured in the dataset), the single-country setting which may limit generalizability, and residual noise in intraoperative signal summaries. Additionally, prediction error increased for extreme LOS outliers, suggesting that further work is needed to model rare, prolonged hospitalizations. Future studies should explore external validation across multi-institutional datasets, incorporation of additional perioperative variables such as postoperative complications and enhanced recovery protocol adherence, and integration with clinician-facing decision support tools. Deep learning models leveraging raw intraoperative waveform data may further improve accuracy, while hybrid approaches combining machine learning with mechanistic models could enhance interpretability. Ultimately, embedding interpretable LOS prediction into perioperative planning workflows may support proactive resource allocation, early discharge planning, and targeted interventions for high-risk patients.

5. Conclusions

Postoperative length of stay is a critical determinant of surgical outcomes, healthcare resource utilization, and patient recovery, yet remains difficult to predict accurately across heterogeneous surgical populations. In this large perioperative cohort of surgical cases, we developed an interpretable machine learning framework that achieved robust and generalizable prediction of postoperative hospital LOS. By combining feature-level interpretation with domain-level ablation, we identified convergent and clinically meaningful drivers of prolonged hospitalization. Case-level explanations further demonstrated how individual patient risk can be understood in a transparent manner. Collectively, these findings underscore the potential of interpretable machine learning to enhance perioperative decision-making, support resource allocation and discharge planning, and facilitate individualized risk stratification in clinical practice.

Bibliography (54)


1. Han T.S., Murray P., Robin J., Wilkinson P., Fluck D., Fry C.H. Evaluation of the association of length of stay in hospital and outcomes. Int. J. Qual. Health Care 2022, 34, mzab160. doi:10.1093/intqhc/mzab160. PMID: 34918090. PMCID: PMC9070811.
2. Weissman J.S., Rothschild J.M., Bendavid E., Sprivulis P., Cook E.F., Evans R.S., Kaganova Y., Bender M., David-Kasdan J., Haug P. Hospital workload and adverse events. Med. Care 2007, 45, 448–455. doi:10.1097/01.mlr.0000257231.86368.09. PMID: 17446831.
3. Norton S.A., Hogan L.A., Holloway R.G., Temkin-Greener H., Buckley M.J., Quill T.E. Proactive palliative care in the medical intensive care unit: Effects on length of stay for selected high-risk patients. Crit. Care Med. 2007, 35, 1530–1535. doi:10.1097/01.CCM.0000266533.06543.0C. PMID: 17452930.
4. Cerfolio R.J., Ferrari-Light D., Ren-Fielding C., Fielding G., Perry N., Rabinovich A., Saraceni M., Fitzpatrick M., Jain S., Pachter H.L. Improving operating room turnover time in a New York City academic hospital via Lean. Ann. Thorac. Surg. 2019, 107, 1011–1016. doi:10.1016/j.athoracsur.2018.11.071. PMID: 30629927.
5. Hunt-O'Connor C., Moore Z., Patton D., Nugent L., Avsar P., O'Connor T. The effect of discharge planning on length of stay and readmission rates of older adults in acute hospitals: A systematic review and Meta-Analysis of systematic reviews. J. Nurs. Manag. 2021, 29, 2697–2706. doi:10.1111/jonm.13409. PMID: 34216502.
6. Sauro K.M., Smith C., Ibadin S., Thomas A., Ganshorn H., Bakunda L., Bajgain B., Bisch S.P., Nelson G. Enhanced recovery after surgery guidelines and hospital length of stay, readmission, complications, and mortality: A meta-analysis of randomized clinical trials. JAMA Netw. Open 2024, 7, e2417310-10. doi:10.1001/jamanetworkopen.2024.17310. PMID: 38888922. PMCID: PMC11195621.
7. Brasel K.J., Lim H.J., Nirula R., Weigelt J.A. Length of stay: An appropriate quality measure? Arch. Surg. 2007, 142, 461–466. doi:10.1001/archsurg.142.5.461. PMID: 17515488.
8. Hornung A.L., Rudisill S.S., McCormick J.R., Streepy J.T., Harkin W.E., Bryson N., Simcock X., Garrigues G.E. Preoperative factors predict prolonged length of stay, serious adverse complications, and readmission following operative intervention of proximal humerus fractures: A machine learning analysis of a national database. JSES Int. 2024, 8, 699–708. doi:10.1016/j.jseint.2024.02.005. PMID: 39035667. PMCID: PMC11258835.