# Machine Learning Modeling of Hospital Length of Stay After Breast Cancer Surgery: Comparison of Random Forest and Linear Regression Approaches

**Authors:** Iulian Slavu, Raluca Tulin, Alexandru Dogaru, Ileana Dima, Cristina Orlov Slavu, Daniela-Elena Gheoca Mutu, Adrian Tulin

PMC · DOI: 10.3390/medicina62010088 · Medicina · 2025-12-31

## TL;DR

This study compares machine learning models with multiple linear regression for predicting hospital length of stay (LOS) after breast cancer surgery in 198 patients. Random Forest gave the lowest prediction error (MAE 2.31 days versus 8.63 days for linear regression) and identified the factors most strongly associated with longer stays.

## Contribution

The study shows that Random Forest regression predicts hospital length of stay after breast cancer surgery more accurately than both multiple linear regression and Gradient Boosting.

## Key findings

- Random Forest regression achieved the lowest prediction error of the three regression models (MAE 2.31 days; RMSE 2.82; R² = 0.37).
- Breast-conserving surgery had the shortest LOS (median 3 days), while mastectomy with immediate reconstruction had the longest (median 8 days).
- Key predictors of LOS included age, surgical complexity, reconstruction modality, BMI, implant capacity, and tumor burden.
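
The error metrics quoted in these findings can be computed from paired observed and predicted LOS values. The following is a minimal sketch, not the authors' code: the function name `regression_metrics` and all LOS values are hypothetical and purely illustrative. It also shows how R² becomes negative when predictions are worse than simply using the mean observed LOS, which is the situation the abstract reports for linear regression.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (MAE, RMSE, R^2) for paired observed and predicted values."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")

    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n

    ss_res = sum(e * e for e in errors)  # residual sum of squares
    rmse = math.sqrt(ss_res / n)

    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")

    # R^2 compares the model against a baseline that always predicts the mean.
    r2 = 1.0 - ss_res / ss_tot
    return mae, rmse, r2


# Hypothetical LOS values in days (illustrative only, not study data).
observed = [3, 5, 7, 8, 10, 12]
close_pred = [4, 5, 6, 8, 9, 11]   # predictions near the observed values
poor_pred = [15, 1, 18, 0, 22, 2]  # predictions far from the observed values

mae_close, rmse_close, r2_close = regression_metrics(observed, close_pred)
mae_poor, rmse_poor, r2_poor = regression_metrics(observed, poor_pred)

print(f"close: MAE={mae_close:.2f} RMSE={rmse_close:.2f} R2={r2_close:.2f}")
print(f"poor:  MAE={mae_poor:.2f} RMSE={rmse_poor:.2f} R2={r2_poor:.2f}")
```

Because R² is measured against a predict-the-mean baseline, it has no lower bound: any model whose squared error exceeds that baseline yields a negative value.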

## Abstract

**Background and Objectives:** Hospital length of stay (LOS) after breast cancer surgery is a key indicator of postoperative recovery, healthcare quality, and hospital resource utilization. Traditional statistical approaches have identified general correlates of LOS but remain limited in predictive accuracy, particularly in heterogeneous real-world surgical populations. Machine learning (ML) models may offer improved performance by capturing nonlinear interactions among clinical, pathological, and operative factors. This study aimed to evaluate ML algorithms for LOS prediction and to identify determinants of prolonged hospitalization in a contemporary breast cancer cohort.

**Materials and Methods:** We conducted a retrospective cross-sectional study of 198 consecutive breast cancer patients who underwent surgery between January 2022 and December 2023 at a single tertiary care center. Clinical, pathological, and surgical data were extracted from electronic medical records. Three regression models (multiple linear regression, Random Forest, and Gradient Boosting) were trained to predict continuous LOS, and three classification models were applied to prolonged LOS (≥10 days). Model performance was assessed using mean absolute error (MAE), root mean square error (RMSE), coefficient of determination (R²), and area under the curve (AUC). Feature importance was analyzed for the best-performing model.

**Results:** The median LOS was 7 days (IQR 5–10), ranging from 1 to 26 days. Breast-conserving surgery showed the shortest LOS (median 3 days), while mastectomy with immediate reconstruction resulted in the longest stays (median 8 days). Random Forest regression achieved the lowest prediction error (MAE 2.31 days; RMSE 2.82; R² = 0.37), outperforming Gradient Boosting and substantially surpassing linear regression (MAE 8.63 days; R² = −8.17). Key predictors included age, surgical complexity, reconstruction modality, BMI, implant capacity, and tumor burden. Classification models yielded modest AUCs (0.545–0.589) with low sensitivity, indicating limited discriminative performance for dichotomized LOS outcomes.

**Conclusions:** Machine-learning models, particularly Random Forest, substantially improve LOS prediction compared with classical regression and provide clinically meaningful insights into the drivers of hospitalization after breast cancer surgery. Continuous LOS modeling is more informative than binary thresholds. These findings support integrating ML-based tools into perioperative planning, resource allocation, and patient counseling in breast surgical care.
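
The prolonged-LOS classification task in the abstract dichotomizes LOS at ≥10 days and evaluates models by AUC. The sketch below illustrates both steps under stated assumptions: the helper names `prolonged_los_labels` and `roc_auc`, the LOS values, and the risk scores are all hypothetical, and the AUC is computed with the standard pairwise-ranking definition rather than any method confirmed by the paper.

```python
def prolonged_los_labels(los_days, threshold=10):
    """Label each stay 1 if LOS >= threshold days (prolonged), else 0."""
    return [1 if d >= threshold else 0 for d in los_days]


def roc_auc(labels, scores):
    """AUC as the probability that a random positive case outranks a
    random negative case, counting tied scores as half a win."""
    pos = [s for lab, s in zip(labels, scores) if lab == 1]
    neg = [s for lab, s in zip(labels, scores) if lab == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical LOS values in days and model risk scores (not study data).
los_days = [3, 7, 10, 12, 5, 26]
risk_scores = [0.2, 0.6, 0.5, 0.9, 0.1, 0.4]

labels = prolonged_los_labels(los_days)
auc = roc_auc(labels, risk_scores)
print(labels, round(auc, 3))
```

An AUC of 0.5 corresponds to chance-level ranking, which is why the reported range of 0.545–0.589 indicates limited discrimination. Thresholding also discards the difference between, for example, a 10-day and a 26-day stay, which is consistent with the abstract's point that continuous LOS modeling is more informative.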

## Linked entities

- **Diseases:** breast cancer (MONDO:0004989)

## Full-text entities

- **Diseases:** tumor (MESH:D009369), Breast Cancer (MESH:D001943), mastectomy (MESH:D000072656)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12843231/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12843231/full.md

## References

14 references — full list in the complete paper: https://tomesphere.com/paper/PMC12843231/full.md

---
Source: https://tomesphere.com/paper/PMC12843231