Prediction models of adverse outcomes following surgery and radiotherapy for breast cancer: a systematic review
H. Asfour, B. Wang, H. Zhou, A. Al Janapy, N.G. Patel, R.P. Symonds, C.J. Talbot, T. Rattay

TL;DR
This paper reviews models that predict side effects from breast cancer surgery and radiotherapy, finding most are not yet ready for clinical use due to lack of validation.
Contribution
The study systematically reviews prediction models for adverse outcomes in breast cancer treatment, highlighting gaps in external validation and long-term effect prediction.
Findings
Most prediction models lack external validation, limiting their clinical use.
Few models predict long-term effects like breast appearance and quality of life.
Machine learning models show promise for complex endpoints.
Abstract
Breast surgery and radiotherapy are the most common treatment modalities for breast cancer, although both may have side-effects that can affect quality of life. Being able to identify patients at risk of adverse outcomes would enable optimisation of individualised treatment plans to improve the experience of breast cancer survivors. A systematic review of prediction models for adverse outcomes following surgery and radiotherapy for breast cancer was conducted. PubMed, Medline, Scopus, Web of Science, and CINAHL databases were searched using relevant key words and Medical Subject Heading terms. The search yielded 5376 articles, of which 33 articles were included. Data were extracted on study design, sources of training and test/validation data, predictors, outcomes, model performance, and validation. Several prediction models for adverse outcomes following breast surgery with or without…
Introduction
Breast cancer is considered the most common type of cancer in females globally. However, breast cancer prognosis has significantly improved over the past four decades with ∼90% of patients now surviving ≥5 years. This has been attributed to advances in breast cancer screening, earlier detection, and treatment. Surgery, including breast-conserving surgery (BCS) and post-mastectomy breast reconstruction, and radiotherapy are the two most commonly used treatments for breast cancer. At the same time, both treatment modalities may cause side-effects that can adversely impact patients’ quality of life (QoL).1
Prediction models, including those based on machine learning (ML) techniques, are nowadays considered valuable tools in health care, offering the potential to forecast patients’ outcomes and tailor precision treatment plans accordingly.2 In the context of breast cancer surgery and radiotherapy, prediction models can help identify individuals or cohorts of patients who are at increased risk of adverse effects or unfavourable aesthetic outcomes. By utilising individual patient- and treatment-related data, prediction models can enhance informed decision making, leading to optimised treatment outcomes and improved patient QoL.3, 4, 5, 6
Nevertheless, the development and validation of reliable clinical prediction models poses several challenges, such as the heterogeneity of patient populations, variations in treatment approaches and protocols, and differences in scaling or classifying adverse outcomes. Therefore, the integration of diverse datasets and application of appropriate methodology are essential to capturing the complex interactions between various predictive variables.7 Anecdotally, researchers have used a variety of modelling approaches, from traditional statistical approaches such as logistic regression (LR) models to ML algorithms like random forests (RF), support vector machines (SVM), and gradient boosting regression, to investigate a range of predictors for outcomes such as capsular contracture, skin radiotoxicity, surgical site infection (SSI), and breast shape deformation.8, 9, 10, 11 Oleck et al. (2022)12 conducted a scoping review of predictive risk calculators for post-mastectomy reconstruction, which included 28 models with varying degrees of predictive performance and accuracy.
The aim of this paper was to extend previous work to systematically review the literature on prediction models for adverse outcomes in all breast cancer local treatment settings, i.e. surgery and radiotherapy, to synthesise the available evidence and evaluate the performance of existing models across the whole range of local treatment-related side-effects, in order to inform clinical practice and future research directions.
Methodology
This review adhered to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) 2020 guidelines.13 It was based on the following questions: ‘what are the available prediction models for adverse outcomes following breast cancer surgery and radiotherapy’, ‘whether the existing models are accurate’, and ‘what is the clinical benefit’. The review protocol was registered (PROSPERO CRD420251017394).
Inclusion and exclusion criteria
Studies were included if they reported on prediction models of breast appearance, outcome, complications, or toxicity following surgery or radiation treatment in women aged ≥18 years, assessing model performance including accuracy, calibration, and utility in clinical practice. Any study design, including observational studies or studies using clinical trial data, was included. Studies not reporting on prediction and those reporting breast assessment methods or scales only were excluded, as were conference abstracts, reviews, letters, or commentaries.
Search strategy
A systematic literature search was carried out according to PRISMA guidelines in five electronic databases: PubMed, Medline (Ovid), Scopus, Web of Science, and CINAHL. Key words were coupled with the relevant Medical Subject Heading terms and combined using Boolean operators (Supplementary Table S1, available at https://doi.org/10.1016/j.esmorw.2026.100690). The search was limited to English language publications and human subjects, with no restriction on the earliest date of publication, and covered studies published up to 15 June 2024. Additional articles were identified through hand searches of the reference lists of included studies.
Data extraction
Abstracts of published studies were screened independently by two reviewers (HA, TR). Where there was disagreement between the reviewers, a third author was consulted (CJT). Data from eligible studies were extracted in a standardised format and quality-assessed according to the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis + Artificial Intelligence (TRIPOD+AI) statement checklist14 (Supplementary Table S2, available at https://doi.org/10.1016/j.esmorw.2026.100690), which includes items relevant to prediction model studies in terms of study design, sources of training and test/validation data, predictors, outcomes, data handling, model performance, limitations, and generalisability.
Data synthesis and analysis
Due to the heterogeneity between studies, data synthesis was primarily qualitative, and a meta-analysis was not feasible. Included models are presented according to whether they predicted early or long-term adverse events in tabular format with summary statistics. Early adverse outcomes are defined as those that occur within the first 3 months (90 days) following treatments, and include wound healing issues, SSI, implant or flap failure, and acute radiotherapy skin reactions (erythema and desquamation). Long-term or late adverse outcomes may occur after 3 months up to many years following treatment, such as breast fibrosis (scarring), skin pigmentation changes and telangiectasia (dilated blood vessels under the skin), breast atrophy (volume loss), lymphoedema, and capsular contracture.
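The 90-day rule used to separate early from long-term outcomes can be written as a one-line classifier. The following is a minimal sketch only; the function name and the example onset days are illustrative assumptions and do not come from the included studies.

```python
EARLY_WINDOW_DAYS = 90  # threshold stated in the review (first 3 months)


def classify_timing(days_after_treatment: int) -> str:
    """Label an adverse event as 'early' or 'late' from its onset day."""
    if days_after_treatment < 0:
        raise ValueError("onset cannot precede treatment")
    return "early" if days_after_treatment <= EARLY_WINDOW_DAYS else "late"


# Hypothetical onset days, for illustration only.
example_events = {
    "surgical site infection": 12,
    "moist desquamation": 35,
    "capsular contracture": 400,
}
labels = {name: classify_timing(day) for name, day in example_events.items()}
```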
Results
Study selection
The database and hand searches identified 13 307 records. Following removal of duplicates, a total of 5376 records proceeded to abstract screening. Before screening, 294 records were excluded as they were conference abstracts, reviews, or commentaries, and 4682 studies were excluded after screening their title and abstract, leaving 386 studies which were assessed for eligibility in full text. After excluding 118 studies that described breast assessment models or scales rather than prediction models, 233 studies that described risk factors and outcomes but lacked any model development, and 2 studies that involved non-human subjects, 33 studies were included in the final data synthesis (Figure 1).
Figure 1. Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) flowchart showing the process of systematic literature record identification, screening, and eligibility.
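The full-text stage of the selection flow can be checked with simple arithmetic. The sketch below uses only the counts reported in the paragraph above; the variable names and dictionary keys are illustrative.

```python
# Counts reported for the full-text eligibility stage.
full_text_assessed = 386
full_text_exclusions = {
    "assessment models or scales rather than prediction models": 118,
    "risk factors and outcomes without model development": 233,
    "non-human subjects": 2,
}

# Studies remaining after all full-text exclusions.
included = full_text_assessed - sum(full_text_exclusions.values())
```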
Study characteristics and results of individual studies
Individual studies that fulfilled the inclusion criteria were divided into two main categories: prediction models for (i) early and (ii) long-term adverse outcomes. Their main characteristics and results, including predictive performance, are described in the following sections:
Prediction models for early adverse outcomes
Radiation-induced early skin toxicity
Several studies described prediction models of breast skin toxicity following radiotherapy. Rattay et al. (2020)6 successfully developed a prediction model for skin erythema [Common Terminology Criteria for Adverse Events (CTCAE) grade ≥2] in three combined observational datasets and externally validated the model using data from the multicentre REQUITE cohort study,15 demonstrating that their model could predict this type of acute skin radiation toxicity with an area under the curve (AUC) of 0.65, though with moderate calibration (Brier score 0.17). However, the study encountered challenges when attempting to validate a prediction model for the related endpoint desquamation, defined as CTCAE grade ≥3 radiation dermatitis/moist desquamation or CTCAE grade ≥1 skin ulceration according to the CTCAE (2017).16 Despite using the same REQUITE data, their model for desquamation failed to achieve satisfactory validation, indicating that predicting desquamation may require a different modelling approach.
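The two metrics quoted here measure different things: AUC summarises discrimination (how well events are ranked above non-events), whereas the Brier score is the mean squared difference between predicted probability and observed outcome. The following is a minimal standard-library sketch of both definitions; the outcome and probability vectors are hypothetical and are not data from any included study.

```python
def auc(y_true, y_score):
    """Probability that a random event is scored above a random non-event.

    Ties between an event and a non-event count as half a win.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one event and one non-event")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def brier_score(y_true, y_prob):
    """Mean squared error of predicted probabilities (0 is perfect)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# Hypothetical outcomes (1 = toxicity) and predicted risks.
outcomes = [0, 0, 1, 0, 1, 1]
predicted = [0.10, 0.40, 0.35, 0.20, 0.80, 0.70]

example_auc = auc(outcomes, predicted)
example_brier = brier_score(outcomes, predicted)
```

In practice an established library implementation would normally be preferred; the explicit pairwise form is shown only to make the definitions concrete.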
Aldraimli et al. (2022)17 used the same REQUITE cohort to develop a new ML model to predict acute desquamation. They employed various ML algorithms, with the cost-sensitive RF model emerging as the most effective, achieving a sensitivity of 0.77, a specificity of 0.66, and an AUC of 0.77. This model underwent internal validation but still requires external validation to confirm its robustness across different patient populations (Table 1). Feng et al. (2022)18 developed a radiomics-based ML model for predicting radiation-induced acute skin toxicity (CTCAE grade ≥2) in breast cancer patients. Whereas Rattay et al. (2020)6 validated a model for erythema but not for desquamation, and Aldraimli et al. (2022)17 developed an ML model focused on desquamation, Feng et al. (2022)18 integrated radiomics features extracted from planning computed tomography images with clinical and dosimetric data. Their model, using a gradient boosting decision tree algorithm, achieved an AUC of 0.998 in the training set and 0.911 in the validation set (Table 1).

Table 1. Summary of the studies evaluating predictive models for radiation-induced breast skin early toxicity

Rattay et al. (2020)6
No. of patients: 2031
Age: Median age 58 years
Intervention: BCS + EBRT ± bed boost
Follow-up time: ≤90 days
Predicted outcome: Breast radiation-induced dermatitis (erythema)
Selected predictors: Patient, clinical, and treatment factors
Data source: Real world (REQUITE database)
Type of model developed: Prediction model for acute erythema
Model specification/algorithms used: Multivariate analysis
Model validation: Bootstrapping (100) for internal validation. Externally validated.
Model performance/metrics: AUC: 0.65, Brier score: 0.17

Aldraimli et al. (2022)17
No. of patients: 2058
Age: Median age 58 years
Intervention: BCS + EBRT ± bed boost
Follow-up time: ≤90 days
Predicted outcome: Acute desquamation: CTCAE grade ≥3 radiation dermatitis (moist desquamation) or CTCAE grade ≥1 skin ulceration
Selected predictors: Patient, clinical, and treatment factors
Data source: Real world (REQUITE database)
Type of model developed: Acute desquamation risk score
Model specification/algorithms used: 8 ML models [naive Bayes (NB), logistic regression with ridge estimator, artificial neural networks with a multilayer perceptron architecture, support vector machine with polynomial kernel and logistic calibrator, K-nearest neighbour with K = 1, 3, 5, 7, 9, decision trees (C4.5), logistic model tree (LMT), and RF]
Model validation: Internal validation using validation cohort: train–test split (50%–50%) and 10-fold cross-validation to reduce overfitting. No external validation.
Model performance/metrics: AUC: 0.77, model sensitivity: 0.77, and specificity: 0.66

Feng et al. (2022)18
No. of patients: 214
Age: N/A
Intervention: Breast cancer surgery + RT
Follow-up time: N/A
Predicted outcome: RT-induced acute skin toxicity (grade ≥ 2)/radiodermatitis
Selected predictors: (i) Laterality, (ii) quadrant position, (iii) histological type, (iv) T stage, (v) PR, (vi) hormone therapy, (vii) EQD2_all, (viii) Lotion application
Data source: Real world
Type of model developed: Risk assessment model
Model specification/algorithms used: Multivariate logistic regression; gradient boosting decision tree (GBDT)
Model validation: Cross-validation (five-fold) for internal validation (75% training and 25% validation). No external validation.
Model performance/metrics: AUC for clinical and dosimetric features: 0.839 (training), 0.816 (validation); AUC for radiomic features: 0.998 (training), 0.911 (validation)

Cilla et al. (2023)19
No. of patients: 129
Age: N/A
Intervention: Breast cancer surgery ± reconstruction + adjuvant WBR
Follow-up time: ≤6 months
Predicted outcome: RT-induced skin toxicity (RTOG ≥2)
Selected predictors: (i) Spectrophotometric variables at time T0, (ii) BMI, (iii) PTV1, (iv) PTV2, (v) the dose fractionation scheme
Data source: Real world
Type of model developed: Risk assessment model
Model specification/algorithms used: 3 ML models: SVM, CART, and LR
Model validation: Cross-validation (five-fold) 25% training and 75% validation and Akaike information criterion (AIC) for internal validation. No external validation.
Model performance/metrics: AUC: 0.664-0.816; SVM has best performance F-score of 88.7%-98.6%

AUC, area under the curve; BMI, body mass index; CART, classification and regression tree; CTCAE, Common Terminology Criteria for Adverse Events; EBRT, external beam radiotherapy; LR, logistic regression; ML, machine learning; N/A, not applicable; PR, progesterone receptor; PTV, planning target volume; RF, random forest; RT, radiotherapy; RTOG, Radiation Therapy Oncology Group; SVM, support vector machine; TE, tissue expander; WBR, whole breast radiotherapy.
Cilla et al. (2023)19 developed another predictive model for acute skin toxicity (grade ≥ 2) in breast cancer patients undergoing radiotherapy. In this study, quantitative spectrophotometric markers (melanin and erythema indices) were integrated with clinical variables to predict radiation-induced skin toxicity. Several statistical and ML models were developed, including LR, SVM, and classification and regression tree analysis, with the SVM model using the radial basis function kernel showing the best performance. This model achieved an accuracy of 89.8%, a precision of 88.7%, a recall of 98.6%, and an F-score of 93.3%. The study showed the potential of spectrophotometry as a non-invasive tool for predicting skin toxicity, which may offer a practical and interpretable alternative to more complex methods such as radiomics (Table 1). However, none of these models apart from that of Rattay et al. (2020)6 has been externally validated in either a temporally or a spatially separate dataset, and there are as yet no models for early radiotherapy skin toxicity incorporating germline genomic data.
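The F-score is the harmonic mean of precision and recall, so the figures quoted for the SVM model can be cross-checked against each other. The sketch below assumes the conventional F1 definition (the source does not state which F-score variant was used) and takes the rounded precision and recall from the text; recomputing from rounded inputs gives a value close to, but not necessarily identical with, the reported 93.3%.

```python
def f_score(precision: float, recall: float) -> float:
    """F1: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded values reported for the best-performing SVM model.
reported_precision = 0.887
reported_recall = 0.986

recomputed_f1 = f_score(reported_precision, reported_recall)
```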
Breast reconstruction complications
Kim et al. (2014; 2015)20,21 and Khavanin et al. (2017)22 described the development and validation of the Breast Reconstruction Risk Assessment (BRA) score, a model designed to predict the likelihood of complications such as SSI, seroma, flap failure, or explantation (implant loss) following autologous and implant-based immediate breast reconstruction (IBR). Kim et al. (2014)20 initially developed and internally validated the BRA score using a large dataset from the American College of Surgeons (ACS) National Surgical Quality Improvement Program (NSQIP). This work was extended in Kim et al. (2015),21 where additional outcomes from the Tracking Operations and Outcomes for Plastic Surgeons (TOPS) database were integrated. Khavanin et al. (2017)22 conducted an external validation of the BRA score in a separate cohort comprising two-stage IBR only, showing good calibration for SSI and seroma prediction but less so for implant loss, with moderate to good discrimination (AUC 0.69 to 0.78 depending on the outcome) (Table 2).

Table 2. Summary of BRA-related studies evaluating predictive models for post-operative complications in breast reconstructive surgery

Kim et al. (2014)20
No. of patients: 16 069
Age: N/A
Intervention: IBR (implant and autologous)
Follow-up time: ≤30 days
Predicted outcome: Individualised SSI
Selected predictors: Age, weight, height, ASA class, BMI, smoking, radiation, chemotherapy, hypertension, DM, clotting disorder, anticoagulation, CAD, PAD, dyspnoea, bilaterality of reconstruction
Data source: Real-world NSQIP database
Type of model developed: Application of the BRA risk calculator for SSI
Model specification/algorithms used: Multiple logistic regression model
Model validation: Bootstrapping (1000 samples) for internal validation. Externally validated in different cohorts.
Model performance/metrics: HL test P = 0.371; Brier score: 0.0357; C-statistic: 0.682

Kim et al. (2015)21
No. of patients: 4439
Age: N/A
Intervention: IBR (implant and autologous)
Follow-up time: ≤30 days
Predicted outcome: Seroma, dehiscence, SSI, explantation, flap failure, reoperation, and overall complications
Selected predictors: Age, BMI, current smoker, smoking, DM, ASA > 2
Data source: Real-world TOPS database
Type of model developed: Extension of BRA model to include plastic surgery outcomes
Model specification/algorithms used: Multiple logistic regression model
Model validation: Bootstrapping (1000 samples) for internal validation. No external validation.
Model performance/metrics: Corrected C-statistics 0.603-0.677 (0.699 uncorrected); HL test P = 0.167-0.609; Brier score: 0.007-0.063

Khavanin et al. (2017)22
No. of patients: 855 (1333 breasts)
Age: N/A
Intervention: Two-stage IBR (TE/implant)
Follow-up time: ≤30 days
Predicted outcome: SSI and seroma
Selected predictors: Age, weight, height, ASA class, BMI, smoking, radiation, chemotherapy, hypertension, DM, clotting disorder, anticoagulation, CAD, PAD, dyspnoea, bilaterality of reconstruction
Data source: Real-time NSQIP and TOPS datasets
Type of model developed: Evaluation of the BRA for SSI and seroma
Model specification/algorithms used: Multiple logistic regression model
Model validation: External validation study of the BRA score model in two-stage IBR (TE/implant).
Model performance/metrics: HL test P = 0.16-0.33; Brier score: 0.95-2.25%

Blough et al. (2018)23
No. of patients: 903 (1365 breasts)
Age: N/A
Intervention: Two-stage IBR (TE/implant)
Follow-up time: ≤1 year
Predicted outcome: SSI, seroma, dehiscence/implant exposure, explantation
Selected predictors: Age, weight, height, ASA class, BMI, smoking, radiation, chemotherapy, hypertension, DM, clotting disorder, anticoagulation, CAD, PAD, dyspnoea, bilaterality of reconstruction
Data source: Real world
Type of model developed: BRA score enhanced beyond 30 days up to 1 year (BRA XL)
Model specification/algorithms used: Five multiple logistic regression models (one for each complication plus one for overall complication)
Model validation: Internal validation using key statistics. No external validation.
Model performance/metrics: C-statistics: 0.674-0.739; HL tests: uniformly non-significant; Brier scores: 0.027-0.154

Hansen et al. (2018)24
No. of patients: 903 (1365 breasts)
Age: N/A
Intervention: Two-stage IBR (TE/implant)
Follow-up time: ≤1 year
Predicted outcome: Mastectomy skin flap necrosis (MSFN), SSI, seroma, dehiscence/implant exposure, explantation
Selected predictors: Age, weight, height, ASA class, BMI, smoking, radiation, chemotherapy, hypertension, DM, clotting disorder, anticoagulation, CAD, PAD, dyspnoea, bilaterality of reconstruction
Data source: Real world
Type of model developed: BRA score enhanced beyond 30 days up to 1 year (BRA XL)
Model specification/algorithms used: Five multiple logistic regression models (one for each outcome)
Model validation: Internal validation using key statistics. No external validation.
Model performance/metrics: C-statistics: 0.674-0.739; HL tests: uniformly non-significant; Brier scores: 0.027-0.154

O’Neill et al. (2019)25
No. of patients: 415
Age: N/A
Intervention: Microvascular IBR
Follow-up time: ≤30 days
Predicted outcome: Surgical complications, medical complications, reoperation, and total or partial flap failure
Selected predictors: Age, weight, height, ASA class, BMI, smoking, radiation, chemotherapy, hypertension, DM, clotting disorder, anticoagulation, CAD, PAD, dyspnoea, bilaterality of reconstruction
Data source: Real world
Type of model developed: Evaluation of BRA score for microvascular IBR
Model specification/algorithms used: Multiple logistic regression model
Model validation: External validation of BRA in microvascular IBR.
Model performance/metrics: C-statistics: 0.49-0.59; Brier scores: 0.09-0.44

O’Neill et al. (2020)26
No. of patients: 1012
Age: N/A
Intervention: IBR + DBR (DIEP flap reconstruction)
Follow-up time: N/A
Predicted outcome: Flap failure
Selected predictors: Patient (age, BMI, comorbidities, smoking history), treatment (timing and laterality of reconstruction, history of radiation) factors
Data source: Real world
Type of model developed: Risk assessment model
Model specification/algorithms used: Machine learning resampling and decision tree classification models
Model validation: Internal validation using validation cohort: train–test split (60%–40%). No external validation.
Model performance/metrics: AUC: 0.67

Roy et al. (2019)27
No. of patients: 351
Age: N/A
Intervention: IBR (DIEP flap)
Follow-up time: ≤90 days
Predicted outcome: Perioperative complications (microsurgical, surgical, and medical)
Selected predictors: (i) BMI, (ii) prior radiotherapy, (iii) active smoking, (iv) comorbidity, (v) bilateral reconstruction, (vi) prior chemotherapy, (vii) age ≥ 65 years, (viii) prior hormonal therapy
Data source: Real world
Type of model developed: Risk assessment model
Model specification/algorithms used: Multivariable logistic regression
Model validation: Internal validation using validation cohort of 100 patients: train–test split (71.5%–28.5%). No external validation.
Model performance/metrics: C-statistic: 0.6; HL test P > 0.05

Martin et al. (2020)28
No. of patients: 247
Age: Average age 49.2 years
Intervention: IBR (pre-pectoral expander)
Follow-up time: ≤30 days
Predicted outcome: SSI requiring i.v. abx or admission, seroma requiring drainage, dehiscence, explantation, skin necrosis, other expander-related complications
Selected predictors: Age, BMI, current smoker, smoking, DM, ASA > 2
Data source: Real world
Type of model developed: Application of BRA model in pre-pectoral expander IBR patients
Model specification/algorithms used: As described in the original development of BRA model
Model validation: External validation in patients undergoing pre-pectoral expander IBR.
Model performance/metrics: BRA has poor predictive power in pre-pectoral breast reconstruction

Abx, antibiotics; ASA, American Society of Anesthesiologists; BMI, body mass index; BRA, Breast Reconstruction Risk Assessment; CAD, coronary artery disease; DBR, delayed breast reconstruction; DIEP, deep inferior epigastric perforator; DM, diabetes mellitus; IBR, immediate breast reconstruction; i.v., intravenous; N/A, not applicable; NSQIP, National Surgical Quality Improvement Program; PAD, peripheral artery disease; SSI, surgical site infection; TE, tissue expander; TOPS, Tracking Operations and Outcomes for Plastic Surgeons.
Blough et al. (2018)23 and Hansen et al. (2018)24 validated and modified the BRA score, which was initially developed to predict 30-day complications from breast reconstruction surgery, to predict 1-year complications after implant-based breast reconstruction, including SSI, implant loss, and seroma formation (BRA score XL). Using their own institutional database, they observed that less than a third of all complications occurred in the initial 30-day window following surgery. Reported AUCs for these extended models ranged from 0.661 to 0.739 and Brier scores from 0.027 to 0.154 (Table 2).
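Several of the BRA studies in Table 2 report bootstrapping as their internal validation method. The sketch below illustrates the resampling idea in its simplest form, a percentile bootstrap interval for the AUC of a fixed set of predictions. It is not the optimism-corrected procedure (which refits the model in every resample) and is not taken from any of the cited papers; the data, function names, and defaults are hypothetical.

```python
import random


def auc(y_true, y_score):
    """Pairwise AUC; ties between an event and a non-event count as 0.5."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one event and one non-event")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_interval(y_true, y_score, n_boot=1000, seed=0):
    """Approximate 95% percentile interval for AUC by resampling patients."""
    if len(set(y_true)) < 2:
        raise ValueError("both outcome classes are required")
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(y_true)
    replicates = []
    while len(replicates) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        ys = [y_true[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # resample lacks one class, so AUC is undefined
        replicates.append(auc(ys, [y_score[i] for i in idx]))
    replicates.sort()
    lower = replicates[int(0.025 * n_boot)]
    upper = replicates[int(0.975 * n_boot) - 1]
    return lower, upper


# Hypothetical outcomes (1 = complication) and predicted risks.
y = [0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1]
s = [0.05, 0.10, 0.12, 0.20, 0.22, 0.30, 0.35, 0.38,
     0.40, 0.45, 0.50, 0.55, 0.60, 0.62, 0.70, 0.90]
lower, upper = bootstrap_auc_interval(y, s)
```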
O’Neill et al. (2019)25 tested the ability of the BRA score to predict complications from microvascular breast reconstruction, specifically focusing on deep inferior epigastric perforator (DIEP) flap procedures. They found that while the BRA score had some clinical utility, it underperformed in predicting medical complications, flap failure, and donor-site morbidity, complications which are specific to microvascular free-flap surgery with its longer operating time. AUCs for four models ranged from 0.49 to 0.59 with Brier scores from 0.09 to 0.44, indicating a lack of validation for microvascular reconstruction. This motivated the authors to develop a new model based on an ensemble ML decision tree for medical and surgical complications following DIEP flap reconstruction, albeit with a moderate average AUC of 0.67 in their internal validation (test) cohort (O’Neill et al., 2020).26 In the same surgical setting of microvascular reconstruction, Roy et al. (2019)27 developed and validated a categorical prediction model using data from the same institution, stratifying patients into low-, intermediate-, and high-risk groups (Table 2), with an AUC of 0.70.
Martin et al. (2020)28 attempted to validate the 30-day BRA score for pre-pectoral implant-based reconstruction, a technique gaining popularity due to its less invasive nature compared with submuscular implant placement. Notably, half of the 30-day complications observed in their validation cohort were due to skin necrosis, a complication not included in the original BRA score model. For occurrence of any of the BRA score-predicted complications, the model remained well calibrated but discriminated poorly, with AUC <0.60. Further modification and external validation of the BRA score for these surgical settings are awaited (Table 2).
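A model can be well calibrated yet discriminate poorly because the two properties are independent. The following sketch, built on hypothetical data, contrasts a model that assigns every patient the cohort event rate (observed-to-expected ratio of 1, AUC of 0.5) with a model that ranks patients perfectly but over-predicts risk. The observed-to-expected ratio is used here as a simple calibration-in-the-large summary and is an illustrative choice, not the statistic reported by the cited studies.

```python
def auc(y_true, y_score):
    """Pairwise AUC; ties between an event and a non-event count as 0.5."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one event and one non-event")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def observed_to_expected(y_true, y_prob):
    """Observed events divided by the sum of predicted risks (1.0 is ideal)."""
    return sum(y_true) / sum(y_prob)


# Hypothetical cohort with a 20% complication rate.
y = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0]

# Model A: everyone receives the cohort event rate.
base_rate_model = [0.2] * len(y)

# Model B: perfect ranking, but predicted risks are too high overall.
overpredicting_ranker = [0.9, 0.1, 0.1, 0.1, 0.9, 0.1, 0.1, 0.1, 0.1, 0.1]

oe_a, auc_a = observed_to_expected(y, base_rate_model), auc(y, base_rate_model)
oe_b, auc_b = (observed_to_expected(y, overpredicting_ranker),
               auc(y, overpredicting_ranker))
```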
ACS NSQIP—Surgical Risk Calculator
Several publications have described the development and validation of the Surgical Risk Calculator (SRC), an LR model based on the ACS NSQIP dataset. Fischer et al. (2013)29 developed and internally validated a categorical prediction model for complications following autologous and implant-based IBR, classifying patients into four risk groups (low, intermediate, high, very high), with good predictive performance. However, their model only predicts the risk of any complication rather than distinct complications (Table 3). It has not yet been externally validated.

Table 3. Summary of ACS NSQIP-related studies evaluating predictive models for post-operative complications in breast surgery

Fischer et al. (2013)29
No. of patients: 12 129
Age: N/A
Intervention: IBR (implant and autologous)
Follow-up time: ≤30 days
Predicted outcome: Composite post-operative complication
Selected predictors: (i) Obesity, (ii) autologous reconstruction, (iii) active smoking, (iv) ASA
Data source: Real-world NSQIP database
Type of model developed: Risk assessment scale (IBRRAS) (four risk groups: low, intermediate, high, very high)
Model specification/algorithms used: Multivariate logistic regression for IBR composite risk score model development by assigning rounded odds ratios to each variable and summing risk factors for each patient
Model validation: Internal validation using validation cohort: train–test split (2 : 1). Externally validated in different cohorts.
Model performance/metrics: No significant difference between the model cohort and validation cohort (P > 0.05)

O’Neill et al. (2016)30
No. of patients: 515 (759 breasts)
Age: N/A
Intervention: Abdominal autologous breast reconstruction
Follow-up time: ≤30 days
Predicted outcome: Any complications, serious complications
Selected predictors: Age, functional status, emergency case, ASA, wound class, steroid use, ascites, sepsis, use of ventilator, disseminated cancer, diabetes, hypertension, previous cardiac event, CHF, dyspnoea, smoker, COPD, dialysis, AKI, BMI
Data source: Real-world NSQIP database
Type of model developed: Evaluation of ACS SRC in breast free-flap reconstruction
Model specification/algorithms used: Bivariate analysis was carried out to compare overall rate of predicted risk of complications with the observed risk of complications
Model validation: External validation of ACS SRC is attempted in patients with autologous breast reconstruction.
Model performance/metrics: Hosmer–Lemeshow test was non-significant; AUC or C-statistics: 0.548; Brier score was higher than that reported in the original ACS calculator development (0.094 versus 0.069)

Gonzalez-Woge et al. (2021)31
No. of patients: 385
Age: N/A
Intervention: Different breast cancer surgeries
Follow-up time: ≤30 days
Predicted outcome: Any complications, serious complications
Selected predictors: Age, functional status, emergency case, ASA, steroid use, sepsis, use of ventilator, disseminated cancer, diabetes, hypertension, heart failure, dyspnoea, smoker, COPD, dialysis, AKI, BMI
Data source: Real-time INCan database
Type of model developed: Evaluation of ACS SRC for breast cancer surgery in a Mexican cohort
Model specification/algorithms used: Multivariate logistic regression
Model validation: External validation of ACS SRC is attempted in Mexican patients.
Model performance/metrics: AUC: 0.617 for any complication, 0.682 for serious complications; Hosmer–Lemeshow test significant (<0.05) for both outcomes; Brier scores were 0.102 for any complication and 0.048 for serious complication

Dube et al. (2022)10
No. of patients: 210
Age: N/A
Intervention: Breast cancer (primary or recurrent) surgery ± reconstruction
Follow-up time: ≤30 days
Predicted outcome: SSI, serious post-op complication
Selected predictors: Age, functional status, emergency case, ASA, steroid use, sepsis, use of ventilator, disseminated cancer, diabetes, hypertension, heart failure, dyspnoea, smoker, COPD, dialysis, AKI, BMI
Data source: Real-world NSQIP database
Type of model developed: Evaluation of ACS SRC for breast cancer surgery in English cohort
Model specification/algorithms used: Univariate logistic regression models
Model validation: External validation of ACS SRC is attempted in English patients.
Model performance/metrics: SSI and serious complications prediction: moderate accuracy; SSI, AUC of 79.4%, sensitivity of 63.6%, and specificity of 91.7%

Jonczyk et al. (2021)32
No. of patients: 163 613
Age: N/A
Intervention: 1 of 5 procedures (partial mastectomy, total mastectomy, implant/TE reconstruction, or free-flap reconstruction)
Follow-up time: ≤30 days
Predicted outcome: Acute post-operative complications (infectious, hematologic, internal organ, and overall complications)
Selected predictors: Age, race, ethnicity, BMI, smoking status, glucocorticoid or anticoagulation use, unintentional weight loss, DM, hypertension, dyspnoea, COPD, CHF, diagnosis, stage 4 metastatic cancer, surgeon specialty, type of anaesthesia, axillary lymph node management, preoperative functional status, anaesthesia type, transfer status, admission status, and admission quarter
Data source: Real-world NSQIP database
Type of model developed: The Breast Cancer Surgery Risk Calculator (BCSRC)
Model specification/algorithms used: Four multivariate logistic regression models (one for each endpoint) for risk calculator model development
Model validation: Bootstrap resampling (300 times) used for internal validation. Externally validated by Jonczyk et al. (2023).33
Model performance/metrics: AUC: overall, 0.70, infectious 0.67, hematologic 0.84, and internal organ 0.74; Accuracy (Brier scores): overall 0.05-0.04, infectious 0.04-0.03, internal organ 0.006-0.003, and hematologic 0.012-0.009; Model calibration using the Hosmer–Lemeshow statistic found all P > 0.05

Jonczyk et al. (2023)33
No. of patients: 60 144
Age: N/A
Intervention: 1 of 5 procedures (partial mastectomy, total mastectomy, implant/TE reconstruction, or free-flap reconstruction)
Follow-up time: ≤30 days
Predicted outcome: Acute post-operative complications (infectious, hematologic, internal organ, and overall complications)
Selected predictors: Age, race, ethnicity, BMI, smoking status, glucocorticoid or anticoagulation use, unintentional weight loss, DM, hypertension, dyspnoea, COPD, CHF, diagnosis, stage 4 metastatic cancer, surgeon specialty, type of anaesthesia, axillary lymph node management, preoperative functional status, anaesthesia type, transfer status, admission status, and admission quarter
Data source: Real-world NSQIP database
Type of model developed: The Breast Cancer Surgery Risk Calculator (BCSRC)
Model specification/algorithms used: Four multivariate logistic regression models (one for each endpoint) for risk calculator model development
Model validation: External validation for Breast Cancer Surgery Risk Calculator (BCSRC) for post-operative complications.
Model performance/metrics: AUC during external validation for each model was ∼0.70; Accuracy or Brier scores were all between 0.04 and 0.003; Model calibration using the Hosmer–Lemeshow statistic found all P > 0.05

ACS SRC, American College of Surgeons Surgical Risk Calculator; AKI, acute kidney injury; ASA, American Society of Anesthesiologists; AUC, area under the curve; BMI, body mass index; CHF, congestive heart failure; COPD, chronic obstructive pulmonary disease; IBR, immediate breast reconstruction; N/A, not applicable; NSQIP, National Surgical Quality Improvement Program; SSI, surgical site infection.
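Categorical tools of the kind described for Fischer et al. (2013)29 assign points to each risk factor, sum them per patient, and map the total to a risk group. The sketch below shows only that mechanism. The point values, the cut-offs, and the dictionary keys are hypothetical placeholders; the published odds ratios and thresholds are not given in this section and are not reproduced here.

```python
# HYPOTHETICAL points per risk factor (placeholders, not published values).
POINTS = {
    "obesity": 1,
    "autologous_reconstruction": 2,
    "active_smoking": 1,
    "high_asa_class": 1,
}

# HYPOTHETICAL cut-offs: (highest score in group, group label).
GROUPS = [(0, "low"), (2, "intermediate"), (4, "high"), (5, "very high")]


def risk_group(patient: dict) -> tuple:
    """Sum the points for the factors present and return (score, group)."""
    score = sum(pts for factor, pts in POINTS.items()
                if patient.get(factor, False))
    for upper, label in GROUPS:
        if score <= upper:
            return score, label
    raise ValueError("score exceeds the highest defined cut-off")


# Hypothetical patient with two risk factors.
example = risk_group({"obesity": True, "active_smoking": True})
```

A points-based score of this form is easy to apply at the bedside, at the cost of discarding the finer gradation that the underlying regression coefficients would provide.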
External validation of the ACS NSQIP SRC was attempted in unselected Mexican and English breast surgical patient cohorts. Gonzalez-Woge et al. (2021)31 found that the SRC under-predicted complications with moderate discrimination (AUC 0.617 for any complication and 0.682 for serious complications), whereas the results were more promising in Dube et al. (2022)10, with an AUC of 0.794 for SSI and 0.845 for serious complications (Table 3). O’Neill et al. (2016)30 focused on validating the tool for microvascular reconstruction and found relatively poor performance (AUC 0.548). The authors also noted that complications specific to free-flap reconstruction, such as flap failure, are absent from the SRC model (Table 3).
In two studies by Jonczyk et al. (2021; 2023)32^,^33 using a larger NSQIP dataset of patients treated between 2005 and 2018, the SRC was re-trained and re-calibrated for patients undergoing either BCS or mastectomy to predict four composite outcomes: overall, infectious, hematologic, and internal organ complications. The models were then validated in a more recent (2018-2020) NSQIP dataset, with a moderate average AUC of 0.70 and Brier scores between 0.003 and 0.04 depending on the predicted outcome. The validated and updated models are available on the Breast Cancer Surgery Risk Calculator (BCSRC) platform (www.breastcalc.org) (Table 3).
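The two performance measures quoted most often in this section, AUC (discrimination) and the Brier score (overall accuracy of predicted probabilities), can be illustrated with a minimal sketch. The function names and the toy data below are hypothetical and are not taken from any of the reviewed studies.

```python
from typing import Sequence


def auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Rank-based AUC: the probability that a randomly chosen patient with
    the event receives a higher predicted risk than a randomly chosen
    patient without it (ties count as one half)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one event and one non-event")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted risk and observed outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# Hypothetical toy cohort: 1 = complication observed, 0 = no complication.
outcomes = [0, 0, 0, 0, 1, 0, 1, 1]
predicted = [0.02, 0.05, 0.10, 0.20, 0.15, 0.30, 0.40, 0.70]

print("AUC:", round(auc(outcomes, predicted), 3))
print("Brier score:", round(brier_score(outcomes, predicted), 3))
```

One point worth keeping in mind when reading very small Brier scores: a model that assigns every patient the event prevalence π scores π(1 − π), so for rare complications a low Brier score alone does not demonstrate good discrimination.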
Other models. Nelson et al. (2015)34 developed a categorical model in a single-institutional dataset of patients undergoing autologous breast reconstruction with three risk groups, showing that high-risk patients have an 86% risk of wound healing complications, compared with a 33% risk in patients with few risk factors, although data on model performance were not presented. Park et al. (2020)35 developed a model and risk score for overall complications in two-stage IBR with an AUC of 0.732 and 0.731, respectively. Frey et al. (2020)36 developed a model for overall complications in nipple-sparing mastectomy in a single-institution dataset with an AUC of 0.668 in their split internal validation cohort. All these models included smoking and body mass index (BMI) (obesity) among the predictors (Table 4).

Table 4. Summary of studies evaluating prediction models for delayed wound healing following breast surgery

| Study | No. of patients | Age | Intervention | Follow-up time | Predicted outcome | Selected predictors | Data source | Type of model developed | Model specification/algorithms used | Model validation | Model performance/metrics |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Nelson et al. (2015)34 | 682 (1033 breast) | N/A | Free autologous reconstruction | 3 weeks | Breast and donor site delayed wound healing (wounds requiring dressing changes for > 3 weeks) | Class I-III obesity, current and past smoking, bilateral reconstruction, and receipt of any vasopressor during reconstruction | Real world | Risk assessment model (three risk groups: low, intermediate, and high) | Multivariate logistic regression | Backward stepwise bootstrap regression (1000 random samples) for internal validation. No external validation attempted | N/A |
| Park et al. (2020)35 | 619 (653 breast) | N/A | Two-stage IBR (TE/implant) | 6 months | One or more (seroma, hematoma, infection, mastectomy flap necrosis ‘required debridement’, delayed wound healing, reconstruction failure, revision surgery) | Smoking history, radiotherapy, and a final inflation volume of ≥450 ml | Real world | Risk assessment model | Multivariate analysis: stepwise logistic regression | Internal validation using key statistics. No external validation attempted | AUC: 0.732 and 0.731 for the logistic regression model and risk-scoring system, respectively (P = 0.975). P values non-significant |
| Frey et al. (2020)36 | 1070 | N/A | IBR with nipple-sparing mastectomy | N/A | Overall complications (including delayed wound healing) | Age, active smoking, DM, BMI, therapeutic mastectomy, prior chemotherapy, prior radiation, adjuvant radiation, adjuvant chemotherapy, mastectomy weight, mastectomy incision, reconstruction type | Real world | Risk assessment model | Multivariate logistic regression | Internal validation using validation cohort train–test split (50.2%-49.8%). No external validation | AUC: 0.668 |

AUC, area under the curve; BMI, body mass index; DM, diabetes mellitus; IBR, immediate breast reconstruction; N/A, not applicable; TE, tissue expander.
Prediction models for long-term adverse outcomes
Radiation-induced late toxicity
Mbah et al. (2018)37 developed a prediction model for the long-term radiation toxicity endpoints oedema, fibrosis, retraction, and pigmentation in breast cancer patients undergoing radiotherapy. They modelled overall patient radiosensitivity and multiple individual toxicity endpoints simultaneously using LR based on maximum likelihood estimators (MLEs). The MLE of a given predictive variable was further improved by combining it with the MLEs for the same variable from the other toxicity endpoints, an approach known as the James–Stein estimator (JSE), resulting in a lower mean squared error. Based on the JSE, 19 variables were included in their prediction model, including breast volume, chemotherapy, age, nodal irradiation, and candidate genetic variants. This study included data and genotypes from 269 patients and is yet to be externally validated (Table 5). Hammer et al. (2017)4 developed a dosimetric prediction model for CTCAE grade ≥2 radiation-induced subcutaneous fibrosis in the boost area in patients undergoing three-dimensional conformal radiotherapy with a simultaneous integrated boost technique for early-stage breast cancer, with patient age, the volume of the breast receiving >55 Gy (V55), and the maximum radiation dose (Dmax) as predictors. This model demonstrated moderate predictive performance with an AUC of 0.66. However, it still requires external validation, including for patients treated with other fractionation schedules (Table 5).

Table 5. Summary of studies evaluating prediction models for breast long-term radiotoxicity and adverse cosmetic outcomes

| Study | No. of patients | Age | Intervention | Follow-up time | Predicted outcome | Selected predictors | Data source | Type of model developed | Model specification/algorithms used | Model validation | Model performance/metrics |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Mbah et al. (2018)37 | 269 | N/A | BCS + WBI | 2 years | Late radiotoxicity (five endpoints: oedema, retraction, fibrosis, pigmentation, BCCT.core) | Breast volume, chemotherapy, older age, and SATB2 rs2881208 SNP | Real world | Risk assessment model | MLE- and JSE-based models | 100 rounds of a five-fold cross-validation for internal validation. No external validation. | Accuracy: JSE: 66% correct classification. MLE: 55% correct classification |
| Hammer et al. (2017)4 | 546 | Median age 65 years | BCS + 3D-CRT-SIB | 5 years | Grade ≥2 radiation-induced fibrosis in the boost area | Patient age, breast volume receiving ≥55 Gy (V55 CTV breast), and the maximum radiation dose in the breast (Dmax) | Real world | Risk assessment model | Multivariate logistic regression | Internal validation using bootstrapping and sequential forward variable selection. No external validation. | AUC: 0.66. HL test non-significant (P = 0.42) |
| Vos et al. (2015)3 | 67 (69 breast) | N/A | BCS + RT | 33 months (median) | Cosmetic outcome assessed by panel, BRA, patient | Tumour/breast volumes ratio, tumour location, specimen weight | Real world | Breast cosmesis prediction tool | Multivariate linear regression | Internal validation using key statistics. No external validation. | AUC: 0.83 |
| Manie et al. (2018)5 | 64 | Median age 47 years | Mastectomy + extended latissimus dorsi flap IBR | N/A | Cosmetic outcome assessed by panel and patient | N/A | Real world | Breast cosmesis prediction tool | N/A | No internal or external validation. | N/A |
| Kindts et al. (2019)38 | 121 | Median age 60 years | BCS + WBI followed by boost to the tumour bed | 1 year | Late unfavourable aesthetic outcome (late radiotoxicity) | Clinicopathological factors (seroma and axillary lymphadenectomy) and radiation dose-volume metrics (V55) | Real world | Risk assessment (NTCP) model | Multivariable logistic regression | Bootstrapping (10 000) for internal validation. No external validation. | AUC: 0.75. HL test P value: non-significant |
| Meshulam-Derazon et al. (2024)39 | 136 | Average age 49.3 years | BCS + RT | 1 year | Poor cosmetic/shape outcome | BMI, removed tissue volume, tumour location | Real world | Risk assessment model | Logistic regression | No internal or external validation. | N/A |
| Naoum et al. (2022)40 | 1617 | N/A | Breast reconstruction: autologous, TE/implant, direct-to-implant ± RT | 6.6 years (median) | (i) Infection/necrosis requiring debridement, (ii) capsular contracture requiring capsulotomy, (iii) absolute and (iv) overall implant failure | Smoking, DM, BMI, radiotherapy, total LNs sampled, total malignant LNs, reconstruction time, incision type, chemotherapy, mesh type, ethnicity, menopause status | Real world (Research Electronic Data Capture database) | Risk assessment nomograms | Four multivariate logistic regression models used (one for each endpoint) | Cross-validation (10-fold) for internal validation. No external validation. | AUC: 68%-76% |
| Bavaro et al. (2023)8 | 59 | Median age 47 years | IBR (implant) + 3D CRT | ≥18 months | Capsular contracture | PgR, ER, lymph node status, histological grading, histological subtype, Ki67 expression, and molecular subtype | Real world | Risk assessment model | Classification algorithms: RF, XGBoost, SVM | 10 rounds of a 10-fold cross-validation for internal validation. No external validation. | XGBoost, SVM, RF (respectively): AUC: 68%, 66%, 65%. Accuracy: 68%, 66%, 64%. Sensitivity: 64%, 64%, 82%. Specificity: 74%, 65%, 48% |

3D-CRT-SIB, three-dimensional conformal radiotherapy with a simultaneous integrated boost; BCS, breast-conserving surgery; BMI, body mass index; CTV, clinical target volume; DM, diabetes mellitus; ER, estrogen receptor; HL, Hosmer–Lemeshow; IBR, immediate breast reconstruction; JSE, James–Stein estimator; LN, lymph node; MLE, maximum likelihood estimator; N/A, not applicable; PgR, progesterone receptor; RF, random forest; RT, radiotherapy; SVM, support vector machine; WBI, whole breast irradiation.
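The James–Stein idea attributed to Mbah et al. above, pooling the estimates for one predictor across several toxicity endpoints, can be illustrated with a minimal sketch. This is not the authors' implementation: it uses the textbook positive-part estimator that shrinks towards the grand mean, and it assumes independent estimates with a common, known sampling variance. The function name, coefficients, and variance are hypothetical.

```python
from typing import List, Sequence


def james_stein_toward_mean(estimates: Sequence[float], variance: float) -> List[float]:
    """Positive-part James–Stein shrinkage of k >= 4 independent estimates
    towards their grand mean, assuming a common known sampling variance."""
    k = len(estimates)
    if k < 4:
        raise ValueError("shrinkage towards the mean needs at least 4 estimates")
    mean = sum(estimates) / k
    spread = sum((z - mean) ** 2 for z in estimates)
    if spread == 0.0:
        return list(estimates)  # all estimates identical: nothing to shrink
    # Shrinkage factor in [0, 1]; 0 pulls every estimate onto the grand mean.
    factor = max(0.0, 1.0 - (k - 3) * variance / spread)
    return [mean + factor * (z - mean) for z in estimates]


# Hypothetical MLE coefficients for one predictor across five endpoints.
mle = [0.2, 0.6, 0.1, 0.9, 0.7]
jse = james_stein_toward_mean(mle, variance=0.04)
for raw, shrunk in zip(mle, jse):
    print(f"MLE {raw:.2f} -> JSE {shrunk:.3f}")
```

Under these assumptions the shrunken estimates have a lower total expected squared error than the separate MLEs; the noisier the individual estimates are relative to their spread, the more strongly they are pulled together.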
Adverse breast cosmesis
Several prediction models have been developed for adverse breast cosmesis following BCS and radiotherapy. Vos et al. (2015)3 investigated the effect of tumour volume to breast volume, tumour location, and specimen weight on cosmetic outcome by panel assessment in 69 patients. Their model showed good discrimination with an AUC of 0.83 (Table 5). Kindts et al. (2019)38 developed and attempted to validate a model for unfavourable cosmetic outcome scored by BCCT.core software41 with the variables seroma, axillary lymph node dissection (ALND), and V55. AUC was 0.75 in the development cohort treated between 2009 and 2014 and 0.66 in the validation cohort of patients who were treated at the same centre in 2015-2016. The authors then modified their published model by including V45 and retaining seroma but not ALND, with the new model achieving an AUC of 0.75 in the validation cohort. Meshulam-Derazon et al. (2024)39 developed models for adverse cosmetic outcome and adverse breast shape after BCS as determined by panel assessment, as in Vos et al. (2015)3, to identify patients who might benefit from oncoplastic intervention. The final published models incorporated BMI and various chest wall and breast measurements including tumour position within the breast, although the authors did not provide performance metrics and the models have not been externally validated (Table 5).
For adverse cosmesis following post-mastectomy breast reconstruction, Manie et al. (2018)5 developed a model for patients undergoing immediate reconstruction with an extended latissimus dorsi flap using BMI and breast cup size. They provided a simple nomogram and showed that BMI >33 kg/m^2^ was predictive of unfavourable cosmetic outcome regardless of breast cup size. Their study had a relatively small sample size of 64 patients and external validation has not been carried out (Table 5). Naoum et al. (2022)40 and Bavaro et al. (2023)8 developed prediction models for capsular contracture following implant-based breast reconstruction and radiotherapy. Naoum et al. (2022)40 used LASSO penalized regression to select predictors in a large dataset of 1617 patients who underwent reconstruction between 1997 and 2007 and presented a nomogram, whereas Bavaro et al. (2023)8 applied ML classification models to a small patient dataset (n = 59), with the extreme gradient boosting classifier achieving the highest AUC of 0.68. Neither of these models has been externally validated.
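Classifier results such as those of Bavaro et al. are reported as accuracy, sensitivity, and specificity at a chosen probability cut-off. The sketch below, with hypothetical data and a hypothetical helper name, shows how these follow from the confusion matrix and how moving the cut-off trades sensitivity against specificity.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity, and specificity from binary labels and predictions.
    Assumes at least one positive and one negative case in y_true."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for y, yh in pairs if y == 1 and yh == 1)
    tn = sum(1 for y, yh in pairs if y == 0 and yh == 0)
    fp = sum(1 for y, yh in pairs if y == 0 and yh == 1)
    fn = sum(1 for y, yh in pairs if y == 1 and yh == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # share of true events detected
        "specificity": tn / (tn + fp),  # share of non-events correctly cleared
    }


# Hypothetical cohort: 1 = capsular contracture, 0 = none, with model risks.
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
risks = [0.9, 0.6, 0.4, 0.7, 0.3, 0.5, 0.2, 0.1, 0.6, 0.4]

for cutoff in (0.5, 0.35):
    predictions = [1 if r >= cutoff else 0 for r in risks]
    print(cutoff, classification_metrics(labels, predictions))
```

Lowering the cut-off in this toy example raises sensitivity at the cost of specificity, which is one reason the three metrics are best read together with the threshold-independent AUC.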
Multiscale prediction models of breast appearance
Three publications described the development of a multiscale finite element model (FEM) to predict breast appearance following BCS. The aim of these studies was to leverage computational modelling and ML to simulate breast healing and deformation over time. Garbey et al. (2013)42 introduced a two-dimensional simulation framework that integrates mechanical tissue deformation and biological healing models. They utilised cellular automata to model the healing process at the cellular level and FEM to simulate tissue deformation under gravity and other mechanical forces. Magnetic resonance imaging (MRI) was used to provide initial breast geometry, although patient-specific mechanical properties were not fully captured (Table 6). Vavourakis et al. (2016)43 expanded the modelling framework to three-dimensional simulations incorporating FEM and continuum mechanics to simulate tissue deformation, coupled with biological healing models. Their model simulated the healing process over several months, accounting for factors such as tissue stiffness, inflammation, and remodelling. Their computational framework integrates clinical and imaging data (e.g. MRI) from breast cancer patients to create patient-specific models that predict breast shape and appearance over time (Table 6).

Table 6. Summary of studies evaluating multiscale prediction models of breast appearance following BCS

| Study | No. of patients | Age | Intervention | Follow-up time | Predicted outcome | Selected predictors | Data source | Type of model developed | Model specification/algorithms used | Model validation | Model performance/metrics |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Garbey et al. (2013)42 | N/A | N/A | BCS | Post-operative day 1 onwards | Cosmesis outcome | MRI images, mechanical, biological, and molecular variables | Real world | 2D multiscale biomechanical model/FEM (proof of concept) | Multiscale ML modelling coupled with mechanical and biological models | No internal validation. | N/A |
| Vavourakis et al. (2016)43 | 4 | N/A | BCS | Post-operative day 1 onwards up to 1 year | Breast tissue deformation and wound healing | MRI images, mechanical, biological, and molecular variables | Real world | 3D multiscale biomechanical model/FEM | Multiscale ML modelling coupled with mechanical and biological models | Follow-up data, in the form of 3dMD surface scans, were acquired 6-12 months after surgery for each patient and compared directly with the predicted surgical outcome. No external validation. | High accuracy: The mean surface distances between the simulation and the follow-up optical surface scan range between 2.8 and 4.1 mm. This indicates an excellent simulation accuracy. |
| Zolfagharnasab et al. (2018)11 | N/A | N/A | BCS simulation | 1 year | Breast shape deformation | MRI images, mechanical, biological, and molecular variables | In-house semi-synthetic dataset | 3D multiscale biomechanical model/FEM | Regression algorithms: | Leave-one-patient-out (LOPO) cross-validation technique. No external validation. | N/A |

2D, two-dimensional; BCS, breast-conserving surgery; FEM, finite element model; GBR, gradient boosting regression; ML, machine learning; MOR, multi-output regression; MRI, magnetic resonance imaging; N/A, not applicable; RF, random forest.
Zolfagharnasab et al. (2018)11 focused on overcoming the time and resource demands of biomechanical modelling (FEM) by introducing ML techniques into the modelling process. The authors used SVM and artificial neural networks, trained on a semi-synthetic dataset derived from in-house breast MRI images, to predict complex cosmetic outcomes such as breast contour and symmetry (Table 6).
Discussion
The aim of this paper was to systematically review the literature on prediction models for adverse outcomes following breast cancer surgery and radiotherapy. It provides a qualitative synthesis of 33 studies which modelled a range of early and long-term adverse events, including SSI, delayed wound healing, capsular contracture, radiation-induced skin toxicity, and breast deformity. The prediction tools ranged from traditional statistical models, such as LR, to biomechanical modelling and ML algorithms, which are increasingly being applied in clinical research. The most frequently reported models to date are the ACS NSQIP and the Breast Cancer Surgical Risk Calculators and the BRA score, all of which predict the risk of early complications following breast surgery and reconstruction. Despite the relatively large training datasets used in the development of these prediction models, they suffer from a relative lack of external validation, thus limiting their generalisability across different clinical settings and countries.
There is as yet no externally validated prediction model for long-term adverse outcomes following breast surgery or radiotherapy. The model developed by Kindts et al. (2019)38 failed to validate in a later cohort recruited at the same centre, although the authors modified and re-calibrated their clinico-dosimetric model and achieved moderate performance in the test cohort. Apart from Naoum et al. (2022),40 these studies used relatively modest training datasets with a maximum of a few hundred patients. Applied to predicting early radiation-induced skin toxicity as well as long-term capsular contracture and breast appearance, ML models have emerged with improved predictive performance compared with traditional statistical approaches. They also have the potential to incorporate a wider range and number of patient variables, such as imaging data and genomic markers,44 and may be applied to more complex endpoints. Published ML-based models have also been shown to outperform traditional statistical models by incorporating advanced data sampling techniques.6
The findings of this systematic review align with previous literature, which has identified the need for robust prediction models in breast cancer treatment to anticipate adverse outcomes and optimise patient care.12^,^45^,^46 While significant strides have been made in developing these models, the lack of external validation remains a challenge to clinical implementation. External validation is necessary for models to be used across diverse populations and settings,47 yet it was notably absent in most studies included in this review. Studies that included external validation, such as those evaluating the ACS NSQIP SRC and BRA score, revealed that many models require re-calibration for specific surgical techniques, such as microsurgical or pre-pectoral breast reconstruction. A sample size of ≥100, and ideally ≥200, is a crucial requirement for conducting external validation.48 Notably, while they frequently reported internal validation with split training–test datasets, none of the ML-based studies included in this review have extended validation to external cohorts, which may be due to lack of available multi-dimensional datasets.
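To make the re-calibration step concrete, the sketch below checks calibration-in-the-large in a new cohort via the observed/expected (O/E) ratio and then updates only the model intercept, leaving the original predictor weights untouched. It is an illustration under stated assumptions, not a method used by any reviewed study; the function names and cohort values are hypothetical.

```python
import math
from typing import List, Sequence


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def expit(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def observed_expected_ratio(y: Sequence[int], p: Sequence[float]) -> float:
    """O/E > 1 means the model under-predicts risk in this cohort."""
    return sum(y) / sum(p)


def intercept_update(y: Sequence[int], p: Sequence[float],
                     lo: float = -10.0, hi: float = 10.0, tol: float = 1e-10) -> float:
    """Find the shift delta on the logit scale so that the sum of updated
    risks equals the observed number of events (bisection search)."""
    events = sum(y)
    if not 0 < events < len(y):
        raise ValueError("need at least one event and one non-event")
    linear_predictors = [logit(pi) for pi in p]
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        expected = sum(expit(lp + mid) for lp in linear_predictors)
        if expected < events:
            lo = mid  # still under-predicting: shift further upwards
        else:
            hi = mid
    return (lo + hi) / 2.0


def recalibrate(p: Sequence[float], delta: float) -> List[float]:
    return [expit(logit(pi) + delta) for pi in p]


# Hypothetical external cohort in which the original model under-predicts.
observed = [0, 0, 1, 0, 1, 1, 0, 1, 0, 1]
original = [0.05, 0.10, 0.20, 0.10, 0.30, 0.25, 0.05, 0.40, 0.15, 0.20]

delta = intercept_update(observed, original)
updated = recalibrate(original, delta)
print("O/E before:", round(observed_expected_ratio(observed, original), 2))
print("O/E after:", round(observed_expected_ratio(observed, updated), 2))
```

Because the shift is applied equally to every patient on the logit scale, the ranking of patients, and therefore the AUC, is unchanged; only the absolute risk level is corrected. Poor discrimination would instead call for re-estimating or extending the predictors.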
For more complex endpoints concerning breast appearance, multiscale FEMs can account for the interaction between mechanical stress, gravity, and biological healing processes. Although these models are computationally intensive, in combination with ML used to extract key features from them they show promise in predicting post-operative appearance for use in the clinic. These emerging models will also require external validation.
Study limitations
Although this systematic review has several strengths by including both parametric statistical and ML models covering the whole range of adverse outcomes following breast surgery and radiotherapy, there are some limitations. The search process may have missed relevant studies because this systematic review was limited to the available published data. This limitation was lessened by thoroughly screening >300 full-text records to ensure that only relevant prediction models were included. The preponderance of models that lack external validation restricts their utility in clinical practice. Without external validation, the models’ performance in different patient populations and clinical settings remains uncertain.47 Furthermore, the heterogeneity of patient populations, treatment modalities, and outcome measures across the included studies poses challenges for risk of bias estimation and quantitative evidence synthesis. Many published models were developed in relatively small patient cohorts, which may limit their generalisability and reliability. Finally, the majority of published models focused on short-term outcomes, with fewer models addressing long-term outcomes such as late radiation toxicity and breast appearance.
Future research should prioritise the external validation of existing models across diverse clinical settings and patient populations, with modification to improve their performance and calibration if required. This step is essential for ensuring that prediction models can be reliably applied in routine clinical practice. Additionally, given the high survival rates in early breast cancer, long-term outcomes pose a major concern as they can significantly impact QoL. Therefore, there is a pressing need to develop prediction models that account for long-term outcomes, such as breast appearance, shape, and shrinkage (atrophy), the latter being an adverse outcome of radiotherapy. Furthermore, exploration of ML techniques, particularly those that incorporate imaging and genomic data, could enhance the predictive power of models and allow for more personalised risk assessments and better-tailored treatment plans for breast cancer patients.
Conclusion
This systematic review demonstrates that the majority of prediction models for adverse outcomes following breast cancer surgery and radiotherapy are not yet ready for widespread clinical implementation across diverse populations and clinical settings due to their lack of validation and immature technology development. It also highlights a relative lack of prediction models for long-term side-effects and more complex outcomes, such as cosmetic breast appearance and QoL, suggesting areas for future research.
References
- 1. Nardin S, Mora E, Varughese FM. Breast cancer survivorship, quality of life, and late toxicities. Front Oncol. 2020;10:864. doi:10.3389/fonc.2020.00864. PMID: 32612947. PMCID: PMC7308500.
- 2. Rubinger L, Gazendam A, Ekhtiari S, Bhandari M. Machine learning and artificial intelligence in research and healthcare. Injury. 2023;54(suppl 3):S69-S73. doi:10.1016/j.injury.2022.01.046. PMID: 35135685.
- 3. Vos EL, Koning AHJ, Obdeijn IM. Preoperative prediction of cosmetic results in breast conserving surgery. J Surg Oncol. 2015;111(2):178-184. doi:10.1002/jso.23782. PMID: 25332158.
- 4. Hammer C, Maduro JH, Bantema-Joppe EJ. Radiation-induced fibrosis in the boost area after three-dimensional conformal radiotherapy with a simultaneous integrated boost technique for early-stage breast cancer: a multivariable prediction model. Radiother Oncol. 2017;122(1):45-49. doi:10.1016/j.radonc.2016.10.006. PMID: 27793444.
- 5. Manie T, Farahat A, Hashem T. Preoperative estimation of cosmetic outcomes after immediate breast reconstruction with extended latissimus dorsi flap: a simple prediction model. JPRAS Open. 2018;15:10-17. doi:10.1016/j.jpra.2017.09.005. PMID: 32158792. PMCID: PMC7061624.
- 6. Rattay T, Seibold P, Aguado-Barrera ME. External validation of a predictive model for acute skin radiation toxicity in the REQUITE breast cohort. Front Oncol. 2020;10:575909. doi:10.3389/fonc.2020.575909. PMID: 33216838. PMCID: PMC7664984.
- 7. Cardoso JS, Silva W, Cardoso MJ. Evolution, current challenges, and future possibilities in the objective assessment of aesthetic outcome of breast cancer locoregional treatment. Breast. 2020;49:123-130. doi:10.1016/j.breast.2019.11.006. PMID: 31790958. PMCID: PMC7375658.
- 8. Bavaro DA, Fanizzi A, Iacovelli S. A machine learning approach for predicting capsular contracture after postmastectomy radiotherapy in breast cancer patients. Healthcare (Basel). 2023;11(7):1042. doi:10.3390/healthcare11071042. PMID: 37046969. PMCID: PMC10094026.
