Multilevel predictors categorization for post-CABG atrial fibrillation prediction
Karina I Shakhgeldyan, Vladislav Y Rublev, Nikita S Kuksin, Boris I Geltser, Regina L Pak

TL;DR
This study develops predictive models for post-CABG atrial fibrillation using multilevel categorization to improve clinical interpretability.
Contribution
The novel multimetric categorization method enhances model explainability without sacrificing predictive accuracy.
Findings
The best XGB model with continuous predictors achieved an AUC of 0.76.
Multimetric categorization models showed comparable performance (AUC = 0.758) with better interpretability.
Nine PoAF predictors were identified and validated using multistage selection.
Abstract
Postoperative atrial fibrillation (PoAF) is a common complication after coronary artery bypass grafting (CABG). Despite its association with increased risk of ischemic stroke, bleeding, acute renal failure and mortality there is still no ideal predictive tool with proper clinical interpretability. A retrospective single-center cohort study enrolled 1305 electronic medical records of patients with elective isolated CABG. PoAF was identified in 280 (21.5%) patients. Prognostic models with continuous variables were developed utilizing multivariate logistic regression (MLR), random forest and eXtreme gradient boosting methods. Predictors were dichotomized via grid search for optimal cut-off points, centroid calculation, and Shapley additive explanation (SHAP). For multilevel categorization, we proposed to use threshold values combination identified during dichotomization, as well as ranking…
Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.
Click any figure to enlarge with its caption.
Figure 1
Figure 2
Figure 3| Predictor | Group 1 ( | Group 2 ( | OR (95%) CI |
|
|---|---|---|---|---|
|
| 66 (61; 71) | 63 (58; 69) | – | <.000001 |
|
| 76 (27.14%) | 237 (23.12%) | 1.24 [0.917; 1.673] | .188 |
|
| 170 (165; 176) | 170 (165; 176) | – | .802 |
|
| 80 (73; 90) | 80 (73; 90) | – | .86 |
|
| 27.9 (25.4; 31) | 28.1 (25.8; 31) | – | .27 |
|
| 60 (54; 64) | 60 (51; 65) | – | .89 |
|
| 1.01 (0.86; 1.18) | 0.98 (0.84; 1.16) | – | .263 |
|
| 0.408 (0.363; 0.455) | 0.417 (0.37; 0.458) | – | .442 |
|
| 3.3 (3.2; 3.7) | 3.35 (3; 3.8) | – | .185 |
|
| 5.1 (4.8; 5.425) | 5.1 (4.7; 5.4) | – | .284 |
|
| 7 (5; 9) | 6 (5; 8) | – | .051 |
|
| 25 (23; 30) | 25 (23; 30) | – | .221 |
|
| 3.9 (3.6; 4.2) | 3.9 (3.6; 4.3) | – | .339 |
|
| 4.6 (4; 5.1) | 4.3 (3.8; 4.9) | – | .0002 |
|
| 33.3 (24.5; 43.1) | 30.2 (23.7; 39.3) | – | .125 |
|
| 3.8 (3.5; 4.1) | 3.7 (3.4; 4) | – | .0023 |
|
| 4.5 (4.1; 4.9) | 4.3 (3.8; 4.8) | – | .0000033 |
|
| 100 (100; 100) | 100 (100; 100) | – | .69 |
|
| 160 (140; 180) | 150 (140; 180) | – | .025 |
|
| 80 (80; 100) | 80 (80; 100) | – | .00036 |
|
| 950 (895; 1090) | 900 (800; 1080) | – | .129 |
|
| 400 (360; 420) | 380 (360; 420) | – | .00143 |
|
| 92.71 (79; 110) | 97 (83; 110) | – | .036 |
|
| 158.8 (135.1; 188.7) | 162.8 (135.1; 190.9) | – | .43 |
|
| 43 (15.36%) | 124 (12.1%) | 1.56 [0.76; 1.47] | .169 |
|
| 96 (34.3%) | 343 (33.46%) | 1.04 [0.79; 1.38] | .85 |
|
| 225 (80.35%) | 813 (79.31%) | 1.7 [0.81; 3.7] | .81 |
|
| 6 (2.1%) | 20 (1.95%) | 1.24 [0.49; 3.15] | .74 |
|
| 52 (18.57%) | 112(10.93%) | 1.8 [1.26; 2.6] | .0009 |
|
| 89 (31.78%) | 296 (28.88%) | 1.13 [0.85; 1.51] | .38 |
|
| 20 (7.15%) | 67 (6.5%) | 1.12 [0.66; 1.88] | .82 |
|
| 75 (26.78%) | 262 (25.56%) | 1.07 [0.79; 1.43] | .74 |
|
| 45 (16.07%) | 137 (13.36%) | 1.2 [0.72; 1.95] | .29 |
|
| 68 (24.28%) | 257 (25.07%) | 0.95 [0.7; 1.3] | .82 |
|
| 27 (9.64%) | 89 (8.68%) | 1.2 [0.71; 1.76] | .70 |
|
| 142 (131; 152) | 142 (131; 153) | – | .79 |
|
| 4.69 (4.31; 5.03) | 4.69 (4.30; 5.06) | – | .97 |
|
| 6.8 (5.7; 8) | 6.9 (5.8; 8.37) | – | .16 |
|
| 1.87 (1.44; 2.42) | 2 (1.53; 2.53) | – | .09 |
|
| 228 (182; 266) | 232 (192; 273) | – | .0336 |
|
| 4.35 (3.66; 5.4) | 4.45 (3.7; 5.43) | – | .15 |
|
| 5.63 (5.11; 6.21) | 5.67 (5.13; 6.52) | – | .144 |
|
| 70.9 (66.3; 73.7) | 71.4 (67.9; 75.2) | – | .004 |
|
| 16.96 (12.1; 23.813) | 16.3 (11.7; 22.95) | – | .338 |
|
| 1.48 (1.16; 1.88) | 1.63 (1.2; 2.2) | – | .003 |
|
| 6 (5; 7.13) | 6 (4.97; 7.35) | – | .954 |
|
| 19.6 (16.6; 21.5) | 19.9 (17.1; 21.4) | – | .67 |
|
| 93.3 (86; 99.6) | 94 (86.88; 102) | – | .148 |
|
| 1.05 (1; 1.13) | 1.03 (0.98; 1.1) | – | .073 |
|
| 130 (130; 150) | 130 (125; 140) | – | .0229 |
|
| 80 (75; 80) | 80 (70; 80) | – | .00775 |
|
| 68 (62; 75) | 68 (62; 72) | – | .178 |
| Metrics | Cross-validation | Final testing | ||||
|---|---|---|---|---|---|---|
| MLR | XGB | RF | MLR | XGB | RF | |
|
| 0.696 [0.693; 0.699] | 0.762 [0.759; 0.765] | 0.755 [0.752; 0.758] | 0.697 [0.688; 0.706] | 0.76 [0.752; 0.768] | 0.752 [0.745; 0.76] |
|
| 0.655 [0.65; 0.66] | 0.693 [0.689; 0.698] | 0.684 [0.679; 0.69] | 0.655 [0.638; 0.673] | 0.693 [0.679; 0.708] | 0.683 [0.668; 0.698] |
|
| 0.64 [0.637; 0.642] | 0.691 [0.688; 0.694] | 0.701 [0.698; 0.704] | 0.643 [0.635; 0.651] | 0.686 [0.677; 0.694] | 0.697 [0.689; 0.706] |
|
| 0.306 [0.304; 0.309] | 0.354 [0.351; 0.357] | 0.359 [0.355; 0.362] | 0.308 [0.302; 0.316] | 0.35 [0.344; 0.357] | 0.356 [0.349; 0.363] |
|
| 0.886 [0.885; 0.888] | 0.905 [0.903; 0.906] | 0.903 [0.901; 0.905] | 0.885 [0.88; 0.89] | 0.464 [0.456; 0.472] | 0.901 [0.897; 0.905] |
|
| 0.415 [0.412; 0.418] | 0.466 [0.463; 0.469] | 0.468 [0.464; 0.472] | 0.419 [0.41; 0.428] | 0.464 [0.456; 0.472] | 0.467 [0.458; 0.475] |
| Predictors | Мin ( | Мax (AUC) | Centroid | SHAP |
|---|---|---|---|---|
|
| 60.0+ | 60.0+ | 64.0+ | 61+ |
|
| 3.0+ | 3.0+ | 3.33+ |
[3.1; 4.1] 5+ |
|
| 4.17+ | 4.17+ | 4.4+ | [4.2; 5.3] |
|
| 81- | 81- | 80- | 80- |
|
| 420+ | 382+ | 390+ | 390+ |
|
| 163+ | 163+ | 155+ | [170; 210] |
|
| 882+ | 882+ | 925+ |
[700; 750], [880; 1000], 1100 |
|
| 120+ | 100+ | 100+ | 130+ |
| Metrics | Мin ( | Мax (AUC) | Centroid | SHAP |
|---|---|---|---|---|
|
| 0.722 [0.715; 0.731] | 0.739 [0.732; 0.747] | 0.628 [0.618; 0.637] | 0.744 [0.736; 0.752] |
|
| 0.653 [0.637,0.669] | 0.683 [0.671; 0.695] | 0.601 [0.583; 0.618] | 0.698 [0.681; 0.714] |
|
| 0.661 [0.653; 0.669] | 0.696 [0.689; 0.703] | 0.6 [0.587; 0.613] | 0.644 [0.634; 0.653] |
|
| 0.319 [0.313; 0.326] | 0.354 [0.348; 0.362] | 0.269 [0.262; 0.276] | 0.323 [0.317; 0.329] |
|
| 0.887 [0.882; 0.892] | 0.901 [0.897; 0.904] | 0.861 [0.856; 0.866] | 0.898 [0.894; 0.903] |
|
| 0.428 [0.419; 0.437] | 0.466 [0.458; 0.473] | 0.369 [0.361; 0.379] | 0.441 [0.433; 0.448] |
| Predictors | SHAP | Multimetric categorization | Group medians and centroids | Quartiles | ||||
|---|---|---|---|---|---|---|---|---|
| Thresholds | WC | Thresholds | WC | Thresholds | WC | Thresholds | WC | |
|
| 61+ | 0.683 | 60+ | 0.729 |
[63; 66] 66+ |
0.397 0.317 |
[58; 64] [64; 69] 69+ |
0.72 0.874 0.679 |
|
|
[3.1; 4.1] 5+ |
0.958 1.154 |
[3.1; 4.1] 5+ |
0.915 0.956 |
[3.3; 3.33] 3.33+ |
0.674 0.18 |
[3.1; 3.3] [3.3; 3.8] 3.8+ |
1.422 1.061 0.886 |
|
| [4.2; 5.3] | 0.917 | [4.2; 5.3] | 1.161 | [4.4; 4.5] | 0.361 |
[3.8.4.3] [4.3; 4.8] |
0.19 0.307 |
|
| 80- | 0.255 | 80- | 0.253 | 80- | 0.381 | [80; 100] | 0.662 |
|
| 390+ | 0.542 | 390+ | 0.549 |
[380; 400] 400+ |
0.173 0.59 |
[395; 420] 420+ |
0.839 0.661 |
|
|
[700; 750] [880; 1100] 1100 |
1.145 0.911 0.740 |
[700; 750] [880; 1100] 1100 |
1.159 0.926 0.738 | 950+ | 0.219 |
[920; 1080] 1080+ |
0.331 0.091 |
|
| [170; 210] | 0.98 | [170; 210] | 0.987 | 155+ | 0.596 |
[160; 180] 180+ |
0.393 0.573 |
|
| 130+ | 1.335 |
[100; 130] 130+ |
0.625 1.67 | 100+ | 1.28 | 100+ | 1.41 |
|
| 1 | 0.734 | 1 | 0.726 | 1 | 0.68 | 1 | 0.76 |
| Metrics | Multilevel SHAP | Multimetric categorization | Group medians and centroids | Quartiles |
|---|---|---|---|---|
|
| 0.754 [0.747; 0.761] | 0.758 [0.751; 0.765] | 0.663 [0.654; 0.672] | 0.681 [0.672; 0.69] |
|
| 0.657 [0.642; 0.6717] | 0.665 [0.65; 0.679] | 0.612 [0.596; 0.627] | 0.635 [0.621; 0.65] |
|
| 0.688 [0.679; 0.697] | 0.691 [0.683; 0.699] | 0.6223 [0.615; 0.631] | 0.641 [0.634; 0.649] |
|
| 0.339 [0.333; 0.346] | 0.344 [0.338; 0.351] | 0.386 [0.378; 0.394] | 0.301 [0.295; 0.307] |
|
| 0.893 [0.889; 0.897] | 0.895 [0.891; 0.899] | 0.283 [0.277; 0.289] | 0.879 [0.875; 0.883] |
|
| 0.447 [0.439; 0.455] | 0.453 [0.445; 0.461] | 0.869 [0.864; 0.874] | 0.408 [0.4; 0.416] |
| Metrics | Continuous form of predictors | Categorical form of predictors | ||||
|---|---|---|---|---|---|---|
| MLR | XGB | RF | MLR | XGB | RF | |
|
| 0.673 [0.664; 0.682] | 0.787 [0.779, 0.795] | 0.766 [0.758, 0.774] | 0.773 [0.764; 0.781] | 0.775 [0.767, 0.783] | 0.757 [0.749, 0.765] |
|
| 0.56 [0.545; 0.575] | 0.698 [0.684; 0.712] | 0.765 [0.752; 0.778] | 0.709 [0.695; 0.723] | 0.715 [0.701; 0.729] | 0.715 [0.701,0.729] |
|
| 0.601 [0.593; 0.608] | 0.689 [0.681; 0.696] | 0.644 [0.637; 0.651] | 0.693 [0.686; 0.7] | 0.717 [0.71,0.724] | 0.704 [0.697; 0.711] |
|
| 0.264 [0.256; 0.272] | 0.385 [0.375; 0.394] | 0.37 [0.362; 0.378] | 0.391 [0.381; 0.4] | 0.414 [0.404; 0.424] | 0.4 [0.391; 0.41] |
|
| 0.85 [0.845,0.855] | 0.908 [0.903; 0.912] | 0.923 [0.919; 0.928] | 0.911 [0.906; 0.915] | 0.915 [0.911; 0.919] | 0.914 [0.91; 0.918] |
|
| 0.348 [0.339; 0.358] | 0.689 [0.681; 0.696] | 0.484 [0.475; 0.493] | 0.485 [0.475; 0.495] | 0.504 [0.494; 0.515] | 0.495 [0.485; 0.504] |
- —Ministry of Science and Higher Education of the Russian Federation10.13039/501100012190
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsAtrial Fibrillation Management and Outcomes · Coronary Interventions and Diagnostics · Explainable Artificial Intelligence (XAI)
Introduction
Postoperative atrial fibrillation (PoAF) affects 20%–40% of patients after coronary artery bypass grafting (CABG) [1], with stable rates despite preventive strategies [2, 3] while some authors have even shown a potential trend to increase [4]. PoAF increases the risk of stroke, bleeding, and renal failure approximately fourfold, and doubles mortality at 30 days and 6 months [5]. The lack of a unified pathophysiological model has driven the creation of forecasting tools to personalize risk assessment [6–9].
Among PoAF prediction studies, the PoAF score [7] developed using MLR methods achieved an accuracy by an area under the ROC curve (AUC) of 0.63–0.65, with 0.6 sensitivity and 0.65 specificity values [8, 10]. Such limited accuracy prompted the use of new machine learning (ML) methods, allowing to improve model quality measured by AUC up to 0.7–0.75 [8, 11]. These models employed continuous and dichotomous predictors, with binary variables used to assess concomitant diseases. However, previous works lacked clinical justification for threshold values used in PoAF risk prediction. Multilevel categorization was only applied to age in some studies, with cut-off points set arbitrarily [7, 11].
This study aimed to develop new prognostic models of PoAF in patients with coronary artery disease after isolated CABG based on preoperative predictors set and their multilevel categorization efficiency evaluation to improve prognosis quality and its clinical interpretation.
Material and methods
Data
Single-center cohort retrospective study results are presented, during which electronic health records data of patients with coronary artery disease admitted for planned isolated CABG to the Vladivostok “Primorye Regional Clinical Hospital No. 1” cardiac surgery department from 2008 to 2023 were analyzed. Exclusion criteria included the presence of atrial fibrillation (of any form) in anamnesis, as well as combination of CABG with any other surgery. Thus, the final dataset was represented by 1305 patients (992 men and 313 women) aged 35 to 83 years. The study protocol met local institutional requirements and received full approval; patient consent was not required. Far eastern federal university review board “Ethics approval: IRB protocol number: №1/3,” were approved on 19/03/2023. As the study involved a retrospective review of medical records, the requirement for patient consent was waived. Data were accessed from 19/03/23 to 12/08/23 for research purposes. During this period authors had access to information that could identify individual participants during data collection (DOB and medical record number). To validate the results obtained in the present study, data from the University Clinic of the Far Eastern Federal University (Vladivostok) for the period 2022–2024 were used. The external validation dataset was constructed from electronic medical records in accordance with the inclusion and exclusion criteria of the primary dataset. The test dataset comprised medical records of 200 patients with coronary artery disease who underwent elective coronary artery bypass grafting (CABG).
First diagnosed PoAF episode was considered as the endpoint. AF episodes lasting more than 30 seconds, verified by the results of continuous electrocardiogram monitoring for at least 96 hours after CABG, were considered as PoAF development evidence. The PoAF presence was coded “1,” the absence – “0.” Thus, two patient groups were identified among the examined cohort. The first included 280 (21.5%) patients with AF paroxysms recorded during postoperative period in the hospital, the second—1025 (78.5%) patients without cardiac arrhythmias.
The preoperative clinical and functional status of patients was assessed on the first day of hospital treatment by 130 factors, the main ones of which are presented in Supplementary Appendix A. In addition to demographic, anthropometric, anamnestic data and physical examination results, clinical blood test indicators were analyzed. The diameters of the left (LAD) and right (RAD) atria, longitudinal dimensions of the left (LAL) and right (RAL) atrium, end-systolic (ESD) and diastolic (EDD) dimensions of the left ventricular (LV), ejection fraction (LVEF), and mean pulmonary artery pressure (MPAP) were determined. The ECG results were also analyzed: duration of P wave and QRS complex, PQ, QT intervals and RR.
Statistical methods
Continuous characteristics distribution according to the Kolmogorov-Smirnov test differed from normal, so consequently nonparametric mathematical statistics methods were used for them. The indicators were presented as median (Me) and interquartile ranges (Q1; Q3), the Mann-Whitney test was used for continuous variables intergroup comparisons, and χ2 for categorical ones. For binary variables, odds ratios (OR) and their 95% confidence interval (CI) were calculated by Fisher’s exact test. Differences were considered statistically significant at P-value < .05.
Machine learning
PoAF predictive models were developed using MLR, random forest (RF) and eXtreme gradient boosting (XGB) methods. Their quality was assessed by 6 metrics: AUC, sensitivity, specificity, F1-score, positive predictive value (PPV) and negative predictive value (NPV). For optimal hyperparameters selection, the Grid Search Cross-Validation (GridSearchCV) optimization method from sklearn Python library was used.
The dataset was split into 2 samples: for training and cross-validation (80%) and for final testing (20%). The training and cross-validation procedure was performed by stratified k-Fold technique on 10 folds. The average AUC quality metric was used for best model selection, predictors picking and validation, and optimal hyperparameters selection by searching through a grid of acceptable values (GridSearchCV). For final testing, the best MLR, RF and XGB models with optimal parameters and hyperparameters were trained on 80% of the dataset, and tested on the final testing sample (20%). For quality metrics confidence, the assessment procedure was repeated 500 times, followed by metrics averaging, performing the initial division randomly using the bootstrapping method (Fig. 1). Models were developed by utilizing open-source libraries Python version 3.9.16 (scikit-learn version 0.24.2, xgboost version 1.5.1).
Study design
Variables categorization
This study utilized a multilevel categorization method that was previously reported by the authors [12].
To dichotomize potential predictors, we used grid-step optimization methods Δ=(max-min)/100: P-value minimization—Min(P-value), AUC maximization—Max(AUC), quartile method [13], centroid method and SHapley Additive exPlanation (SHAP) [14]. The Shapley method allowed us to identify thresholds at which the predictor influence function on the endpoint demonstrated singularity, which can be observed several times during the range of changes in the continuous attribute values [12]. To carry out multi-level categorization, we combined all threshold values identified with indicators dichotomization utilizing various methods, including the SHAP method. In this case, close threshold values were combined into one by averaging. The centroid method assumed usage of the analyzed characteristics median in the comparison groups (with and without PoAF) and values equidistant from them (centroids), with the help of which 4 categories were identified for each indicator [14]. The quartile method involves identifying 4 categories for each variable based on the results of assessing their medians, 2nd and 3rd quartiles [15].
For indicators endpoint influence degree assessment, the SHapley Additive exPlanation method was used.
Study design
The study design included 6 stages (Fig. 1). At the very first of them utilizing intergroup comparisons tests, a potential PoAF predictors pool was formed. At the second stage of the study, PoAF prognostic models with predictors in a continuous form were developed by ML methods. The prognostic significance of the predictor was confirmed by AUC value increase after its inclusion in the model. During models development, all variables were considered, regardless of statistically significant differences in comparison groups, and hyperparameters were adjusted at the same stage. Models development and cross-validation was carried out on 80% of the dataset (derivation cohort), and the final testing was carried out on 20% (validation cohort). For further steps, the predictors and hyperparameters obtained at this stage were utilized. At the third stage, using various threshold values identification methods, binarization of continuous variables was carried out using a derivation cohort, and on their basis, PoAF prognostic models were developed, which were validated on the validation cohort. At the fourth stage of the study, multi-level categorization of variables was carried out using 4 approaches. In the first of them, only thresholds identified by the SHAP method were taken into account; in the second, the set of threshold values obtained by other dichotomization methods was expanded. In addition, thresholds obtained by the centroid method were considered, taking into account the medians of the groups with and without PoAF, as well as using quartiles Q1, Q2 and Q3. For risk factors endpoint influence degree assessment, MLR models were developed, whose weights were used to code multilevel categorical predictors. Risk factors with negative or close to 0 weight coefficients in the MLR model were excluded from consideration. At the fifth stage of the study, 4 new PoAF prognostic models were developed by XGB method, the predictors of which were obtained by different methods of multilevel categorization. To assess statistically significant differences between the performance metrics obtained using bootstrapping (n = 100), 95% CI and pairwise comparisons using the Mann–Whitney U test were applied. At the final stage, external validation of the developed models was performed. The 95% confidence intervals for the selected performance metrics were estimated using the bootstrap method by randomly generating 2000 resampled datasets, each containing N − 10 observations, where N denotes the number of records in the external dataset.
Results
Subject characteristics
Intergroup analysis of clinical, demographic and laboratory parameters demonstrated that patients with PoAF were distinguished by older age, an increased prevalence of tricuspid regurgitation (TR) among them, lower levels of platelets, total protein and triglycerides in the blood. Individuals in this group had higher values of LV ESD, LAD, RAD and RAL, Ao/LV systolic pressure gradient and an increased duration of the QT and PQ intervals (Table 1 and Supplementary Appendix A).
Machine learning models
During the second stage, PoAF prognostic models were developed, validated and tested utilizing RF, XGB and MLR methods. For all models, the best AUC metric results were obtained by usage of ECG indicators (duration of QRS, QT, PQ, RR, and P wave intervals), age, RAD, ESD, and TR as predictors. Developed models predictive value comparison showed that the XGB and RF methods provide higher forecast accuracy compared with MLR (AUC—0.76 and 0.752 vs 0.697) (Table 2). To assess potential overfitting, model performance was additionally evaluated on the training dataset (Supplementary Appendix B). The absolute difference between performance metrics obtained in the training and testing sets did not exceed 0.05, indicating adequate model generalizability and the absence of clinically relevant overfitting. Supplementary Appendix B shows MLR model weight coefficients.
Categorization
During third stage, PoAF predictors in a continuous form were dichotomized utilizing searching for the optimal cutoff threshold on the grid methods [Min(P-value) and Max(AUC)], along with the SHAP and centroid calculation method (Table 3). Threshold values usage, deviation from which is associated with PoAF likelihood increase, allows us to consider binarized data as risk factors for adverse events. The risk factor is coded “1” if the predictor value exceeds the threshold with the postfix “+” or does not reach it—with the postfix “−,” in other cases—“0.”
Study results showed substantial variation in threshold values across binarization methods. For example, the cutoff point for QRS according to SHAP was 80 ms, while when maximizing AUC, the cutoff point was fixed at 81 ms, and for the P wave, values above 130 ms were risk factors, while for Max(AUC)—above 100 ms (Table 3). The first three dichotomization methods considered isolated indicators and did not take into account predictive models. The SHAP method was applied to a multifactorial XGB model and the threshold value were defined as the point where the shap-value exceeded the level of 0.2 arbitrary units Thus, due to dichotomization, the following PoAF risk factors were identified: age over 61 years, RAD more than 4.2 cm, ESD—3.1 cm, QRS duration less than 80 ms, QT more than 390 ms, P—130 ms, PQ—170 ms, RR—700 ms (Fig. 2).
Continuous indicators and their threshold values endpoint impact assessment by the SHAP method. The blue and red dotted lines indicate the cutoff thresholds. ESD = LV end systolic dimension, LV = left ventricle, RAD = right atrium transverse size
Annotation: The blue and red dotted lines indicate the cutoff thresholds. Abbreviations: LV—left ventricle, ESD—LV end systolic dimension, RAD—right atrium transverse size.
Using the QRS diagram as an example (Fig. 2), it can be seen that the probability of PoAF developing in the range from 40 to 80 ms remains consistently high, but sharply decreases at its values ≥ 90 ms. Exceeding the QT parameter value more than 390 ms increases the risk of arrhythmia, but its maximum probability is fixed when the QT value is above 450 ms. Assessing the dynamics of changes in shap-value allows us to explain the relationship between various predictor values and study endpoint, which was the basis for utilizing this method in multilevel categorization procedures. For example, the distribution of the PQ interval demonstrates that the highest risk of POAF is observed among patients whose PQ values fall within the range of 170–210 ms.
On the basis of MLR POAF prognostic models with dichotomous predictors were developed (Table 4). It was found that the model constructed using the centroid method was significantly inferior to the models developed using the Max(AUC) and SHAP-based approaches, which demonstrated acceptable predictive performance (AUC: 0.739–0.744).
At the fourth stage of the study, utilizing various multilevel categorization methods, 4 groups of PoAF risk factors were formed. The first pool of risk factors was obtained from the shap-value analysis results (Fig. 2). The second pool expanded the first due to threshold values obtained at the third stage of the study by several dichotomization methods. The third group of risk factors was represented by the predictors medians in the comparison groups and their centroids, and the fourth group used threshold values corresponding to predictors quartiles. To encode multilevel categorical predictors values, we used the weight coefficients (WC) of the MLR models developed for each risk factors group (Table 5). We call the approach that ensures the formation of a second pool of risk factors and their WC—multimetric categorization.
Models based on multilevel categorical predictors
At the fifth stage of the study, based on multilevel predictors obtained by various methods, 4 PoAF prognostic models were developed by XGB (Table 6). The best predictive properties were demonstrated by the model with predictors identified by the multilevel categorization method. The latter had comparable accuracy with the models including continuous variables or risk factors (AUC 0.758 and 0.76). In addition, this model enabled explanation of POAF risk based on the assessment of threshold values and the relative contributions of its predictors (Table 5). Taking these findings into account, the highest probability of POAF in patients with CAD after CABG was associated with a P-wave duration ≥ 130 ms (WC = 1.67), RAD of 4.2–5.3 cm (WC = 1.161), RR interval of 700–750 ms (WC = 1.159), PQ interval of 170–210 ms (WC = 0.987), and ESD ≥ 5 cm (WC = 0.956). An association with POAF was also identified for ESD in the range of 3.1–4.1 cm, QT interval ≥ 390 ms, QRS duration ≤ 80 ms, RR intervals of 880–1100 ms and ≥ 1100 ms, P-wave duration of 100–130 ms, age > 60 years, and the presence of CHF and TR.
A comparative accuracy assessment between prognostic models with predictors identified by dichotomization and by multimetric multilevel categorization methods demonstrated the advantages of the latter, which was confirmed by statistically significant AUC metric differences (P-value < .001).
The assessment of individual PoAF predictors’ influence on its development was performed utilizing the SHAP and XGB methods (Fig. 3). The strongest influence is demonstrated by the QT indicator (shap-value 0.94) after 450 ms, TR presence, the duration of the RR intervals (700–1100 ms) and RAD (4.2–5.3 cm). A low PoAF development probability is associated with the younger age (patients under 60 years), ESD below 3 cm and RAD up to 4.1 cm, PQ interval less than 150 ms and QRS above 100 ms.
Assessment of predictor impact on POAF using the SHAP method
At the final stage of the study, the predictive performance of POAF models with continuous predictors and those with predictors derived using the multimetrical categorization approach was evaluated on an external cohort of patients treated at the University Clinic of the Far Eastern Federal University (Table 7). The validation results demonstrated comparable predictive accuracy of the MLR, SGB, and SL models with continuous predictors (Table 2) and the SGB models with categorical predictors (Table 6). The similar predictive performance observed during both model development and external validation may be explained by the use of contemporary modeling approaches as well as by the set of selected predictors (patient age, ECG and echocardiographic parameters), which are largely independent of the professional, organizational, and resource-related characteristics of healthcare institutions.
Conclusion
In recent years, prognostic models utilizing ML methods are being developed, wide usage of which in clinical practice is limited by the complexity of prognostic results interpretation. Promising tools for this problem solving are explainable artificial intelligence (XAI) algorithms, the elements of which includes the predictors threshold values determination and their ranking according to their influence intensity on the endpoint. Predictors threshold values determination is carried out using their categorization, which allows to detail the relationship between indicators of the clinical and functional patients status with the resulting variable. According to the literature, the most accessible method of multilevel categorization is descriptive statistics with the medians, quartiles or quantile calculation [16]. However, most of the categorization criticisms are associated precisely with this approach, which is primarily due to the dependence of such threshold values on a specific sample, lack of relationship with the clinical context, ignoring possible non-linear relationships, etc [16]. An alternative method that takes into account the clinical context is searching for optimal threshold values based on minimization or maximization of objective functions, such as Min(P-value) or Max(AUC).
Recent literature about PoAF prediction problem analysis showed that utilizing categorization methods, PoAF risk factors were identified, which included the age of patients over 60 years [7, 17], 66 years [5, 18] or 70 years [19], increased LA size [5, 20], including with LAD > 4.5 cm [18] or > 3.9 cm [11] and reduced LVEF < 30% [7], increased P wave duration according to standard (>116 ms) [21, 22]. A number of anamnestic data are also validated as risk factors for PoAF: male gender [5, 23], the presence of arterial hypertension, chronic heart failure, chronic obstructive pulmonary disease, chronic kidney disease, diabetes mellitus [5], rheumatic heart disease [18, 23], mitral valve disease [3], previous cardiac surgery, metabolic syndrome and obesity [5, 6]. In our study, anamnestic features did not demonstrate predictive potential. The TR and RAD indicators were firstly verified as PoAF predictors. Our study confirmed the predictive value of age and ECG in relation to PoAF, but did not reveal a relationship between laboratory data and LVEF with the PoAF development.
Utilizing the patients with coronary artery disease after the CABG database example, we analyzed the effectiveness of various predictors threshold values searching methods, deviations from which increased their predictive potential and allowed to attribute them as PoAF risk factors. It was identified that the SHAP method, which considered as one of the promising XAI technologies, is a useful categorization tool due to the effective determination of cut-off thresholds, in particular for multilevel categorization and predictors relationship analysis both in continuous and categorical forms with the study endpoint. At the same time, multilevel categorical predictors obtained by combining SHAP data with other dichotomization methods results have been shown to provide higher predictive accuracy. Potential risks of information loss during new categorization methods usage were overcome by detailing knowledge about the interconnection of individual risk factors with the study endpoint. This was confirmed by predictive models quality criteria comparison for predictors both in continuous and multilevel categorical forms. Thus, for the best model with continuous predictors, AUC was 0.76, while utilizing multimetric categorization, it was 0.758. Categorization methods based on descriptive statistics did not compensate for the loss of predictive accuracy through the introduction of additional information.
It should also be noted that our models demonstrated higher predictive accuracy than those reported in a previously published study that relied exclusively on preoperative variables and employed MLR, RF, and XGB methods (best AUC in the present study: 0.76 vs. 0.74 in [8]). XGB-based models yielded the best overall performance metrics. Marked differences were observed between predictor importance estimates obtained using SHAP and those based on regression coefficients in the MLR models. Specifically, in our study, the greatest importance for achieving the endpoint according to MLR coefficients was associated with a P-wave duration exceeding 130 ms, whereas the highest risk according to the Shapley method was linked to QT > 450 ms. These findings indicate the need for further research aimed at quantifying the magnitude of predictor effects on the clinical endpoint, which is of critical importance for the clinical interpretation of predictive outcomes.
In the present study, machine learning methods (MLR, XGB, and RF) and two approaches to the multilevel categorization of POAF predictors—multimetrical categorization and the SHAP-based method—were evaluated using a database of patients with coronary artery disease after isolated coronary artery bypass grafting. Using the developed prognostic POAF models, it was demonstrated that the proposed categorization procedures provide both high predictive accuracy and enhanced transparency of risk prediction results.
Clinical significance
The application of prognostic models developed using preoperative predictors and modern machine learning methods enables accurate estimation of the risk of postoperative atrial fibrillation in patients with coronary artery disease undergoing coronary artery bypass grafting. Multilevel categorization of predictors enhances the transparency of risk prediction and serves as a valuable tool for justifying the selection of individualized preventive strategies.
Dataset limitations
The limitations of this study are primarily related to its retrospective design and the data collection mainly from a single medical center. External validation was performed on smaller sample and only from one healthcare institute. A second limitation is that most of the analyzed variables (with the exception of sex and age) contained missing values, with the proportion of missing data ranging from 0.08% to 25%. The missingness was assumed to be random, making a systematic bias unlikely. Finally, several clinical and functional characteristics that have previously been validated as predictors of POAF in other studies were not included in the present analysis. We plan to digitize and incorporate these variables in future research. The full source code for the software implementation of the method is
Limitations imposed by model overfitting
Increasing the complexity of machine-learning models by incorporating numerous predictors or by deepening tree structures in ensemble algorithms is well known to increase the risk of overfitting [24]. In the present study, unexpectedly high predictive performance was initially observed when only ECG-derived variables (PQ, QRS, RR, QT intervals and P-wave duration) were used in ensemble learning methods (XGB and RF), whereas the MLR model based on the same predictors demonstrated poor discriminative ability (AUC = 0.58).
Further analysis revealed pronounced overfitting of the SGB model despite the limited number of predictors: the AUC reached 0.999 on the training dataset but dropped to 0.78 on the test dataset. Moreover, any modification of the predictor structure, including categorical transformation, resulted in a substantial deterioration of model performance (AUC = 0.60), confirming model instability under these conditions.
To mitigate overfitting, additional clinically relevant predictors (RAD, age and ESD) were incorporated. This adjustment ensured both high predictive performance on the test dataset (AUC = 0.76) and stable generalization, as evidenced by near-identical AUC values for the training and test sets (0.762 and 0.76, respectively). Importantly, predictor dichotomization did not lead to a significant loss of discriminative ability, and the predictive accuracy of the MLR model increased markedly compared with the ECG-only model (AUC 0.697 vs. 0.58).
The full source code for the software implementation of the method is available at the following link: https://github.com/NikitaKuksin/MultilevelPredictorsCategorizationForAtrialFibrillationPredictionInPatientsWithCoronaryHeartDisease.git
Supplementary Material
bpaf092_Supplementary_Data
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1Gaudino M , Di Franco A, Rong LQ et al Postoperative atrial fibrillation: from mechanisms to treatment. Eur Heart J 2023;44:1020–39.36721960 10.1093/eurheartj/ehad 019PMC 10226752 · doi ↗ · pubmed ↗
- 2Xu Z , Qian L, Zhang L et al Predictive value of NT-pro BNP, procalcitonin and CVP in patients with new-onset postoperative atrial fibrillation after cardiac surgery. Am J Transl Res 2022;14:3481–7.35702124 PMC 9185093 · pubmed ↗
- 3Shen J , Lall S, Zheng V et al The persistent problem of new-onset postoperative atrial fibrillation: a single-institution experience over two decades. J Thorac Cardiovasc Surg 2011;141:559–70.20434173 10.1016/j.jtcvs.2010.03.011PMC 2917532 · doi ↗ · pubmed ↗
- 4Filardo G , Damiano RJ, Ailawadi G et al Epidemiology of new-onset atrial fibrillation following coronary artery bypass graft surgery. Heart 2018;104:985–92.29326112 10.1136/heartjnl-2017-312150 · doi ↗ · pubmed ↗
- 5Greenberg JW , Lancaster TS, Schuessler RB et al. Postoperative atrial fibrillation following cardiac surgery: a persistent complication. Eur J Cardiothorac Surg 2017;52:665–72. 10.1093/ejcts/ezx 03928369234 · doi ↗ · pubmed ↗
- 6Lu Y , Chen Q, Zhang H et al Machine learning models of postoperative atrial fibrillation prediction after cardiac surgery. J Cardiothorac Vasc Anesth 2023;37:360–6.36535840 10.1053/j.jvca.2022.11.025 · doi ↗ · pubmed ↗
- 7Mariscalco G , Biancari F, Zanobini M et al Bedside tool for predicting the risk of postoperative atrial fibrillation after cardiac surgery: the Po AF score. J Am Heart Assoc 2014;3:e 000752.24663335 10.1161/JAHA.113.000752 PMC 4187480 · doi ↗ · pubmed ↗
- 8Karri R , Kawai A, Thong YJ et al Machine learning outperforms existing clinical scoring tools in the prediction of postoperative atrial fibrillation during intensive care unit admission after cardiac surgery. Heart Lung Circ 2021;30:1929–37.34215511 10.1016/j.hlc.2021.05.101 · doi ↗ · pubmed ↗
