# Integrating Host Genetics and Clinical Setting in Machine Learning Models: Predicting COVID-19 Prognosis for Healthcare Decision-Making (The FeMiNa Study)

**Authors:** Elisabetta D’Aversa, Bianca Antonica, Miriana Grisafi, Rosanna Asselta, Elvezia Maria Paraboschi, Angelina Passaro, Stefano Volpato, Francesca Remelli, Massimiliano Castellazzi, Alberto Maria Marra, Antonio Cittadini, Roberta D’Assante, Francesca Salvatori, Ajay Vikram Singh, Salvatore Pernagallo, Veronica Tisato, Donato Gemmati

PMC · DOI: 10.3390/diagnostics16040583 · Diagnostics · 2026-02-15

## TL;DR

This study shows that combining genetic and clinical data in machine learning models improves the prediction of severe outcomes in hospitalized COVID-19 patients.

## Contribution

The novel integration of host genetics with clinical features in ML models enhances mortality prediction for personalized healthcare decisions.

## Key findings

- Optimizing XGBoost for f1 misclassified fewer patients than optimizing for f2 or PR-AUC: 26 false positives (versus 27 and 29) and 63 false negatives (versus 69 and 69).
- Genetic markers like HLA-DRA rs3135363 and PPARGC1A rs192678 improved model performance alongside clinical features.
- Age was the top predictor of mortality, followed by ventilation; the remaining importance was split about equally between genetic markers and other clinical features.
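The f1 and f2 scores compared in this study are both cases of the F-beta score, which combines precision and recall and weights recall more heavily as beta grows. The sketch below computes F-beta from confusion-matrix counts to show how the choice of beta can change which model looks best. The function name `fbeta` and all counts are hypothetical illustrations, not the study's code or data.

```python
def fbeta(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """F-beta score from confusion-matrix counts.

    beta = 1 weights precision and recall equally (f1);
    beta = 2 weights recall, i.e. avoiding false negatives, more heavily (f2).
    """
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    # Return 0.0 when there are no positives at all, to avoid dividing by zero.
    return (1 + b2) * tp / denom if denom else 0.0


# Hypothetical counts for two candidate models (NOT the study's data).
cautious = {"tp": 80, "fp": 20, "fn": 40}    # fewer false alarms, more misses
sensitive = {"tp": 100, "fp": 60, "fn": 20}  # more false alarms, fewer misses

for name, counts in (("cautious", cautious), ("sensitive", sensitive)):
    f1 = fbeta(**counts, beta=1)
    f2 = fbeta(**counts, beta=2)
    print(f"{name}: f1={f1:.3f} f2={f2:.3f}")
```

With these made-up counts the first model scores higher on f1 and the second on f2, which is why the optimization target has to be chosen with the clinical cost of a missed high-risk patient in mind.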

## Abstract

What are the main findings?

- eXtreme Gradient Boosting f1 optimization avoids losing several patients due to misdiagnosis.
- Genetic data enhance the model’s power for mortality prediction.

What are the implications of the main findings?

- Integrating genetics in ML enables a more personalized medical approach.

**Background/Objectives:** COVID-19 has had a tremendous impact, causing a massive number of deaths worldwide. Inadequate health facilities led to resource shortages and exhausted frontline workers, who had to manage many patients in a short time with no tools to prioritize those at high risk. This study aimed to uncover the architecture of this complex disease and to improve the management of hospitalized patients, preventing severe outcomes.

**Methods:** We performed a retrospective multicenter study aimed at refining the best predictive model for COVID-19 mortality, integrating 19 genetic and 13 clinical features. We trained three machine learning (ML) models (GBM, XGB and RF) on a dataset of 532 hospitalized Italian COVID-19 patients, drawn from the 605 recruited during the first wave of the pandemic, when vaccines were not available.

**Results:** All the models achieved high values for the accuracy, AUROC, f1, f2 and PR-AUC metrics. XGB f1 optimization performed best, yielding fewer false positives (N_f1 = 26 versus N_f2 = 27, N_PR-AUC = 29) and, above all, fewer false negatives (N_f1 = 63 versus N_f2 = 69, N_PR-AUC = 69), which was the main goal. We then examined feature importance to understand which features contribute to the model’s decision: age was the main driver of mortality prediction, followed by ventilation. The remaining importance was equally distributed between genetic (HLA-DRA rs3135363, PPARGC1A rs192678, CRP rs2808635, ABO rs657152) and other clinical features, demonstrating that genetic data did not confound, but rather enhanced, the power of the model.

**Conclusions:** Our results suggest that integrating genetic and clinical data into ML models is crucial for identifying high-risk cases within the vast disease heterogeneity, enabling the P4-medicine approach to improve patient outcomes and support the healthcare system.
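The abstract describes feeding genetic and clinical features to the same model. A minimal sketch of one common way to build such a combined input row follows, assuming additive coding (0, 1 or 2 copies of an effect allele per SNP). The abstract does not state which encoding the authors used, so the coding scheme, the helper names `allele_count` and `build_row`, the placeholder SNP names and every value here are illustrative assumptions.

```python
from typing import Dict


def allele_count(genotype: str, effect_allele: str) -> int:
    """Additive coding: copies (0, 1 or 2) of the effect allele in a genotype."""
    if len(genotype) != 2:
        raise ValueError(f"expected a two-letter genotype, got {genotype!r}")
    target = effect_allele.upper()
    return sum(1 for allele in genotype.upper() if allele == target)


def build_row(
    clinical: Dict[str, float],
    genotypes: Dict[str, str],
    effect_alleles: Dict[str, str],
) -> Dict[str, float]:
    """Merge clinical values and allele counts into one flat feature row."""
    row = dict(clinical)  # copy so the caller's dict is not modified
    for snp, allele in effect_alleles.items():
        row[snp] = float(allele_count(genotypes[snp], allele))
    return row


# Hypothetical patient: placeholder SNP names and alleles, not study data.
clinical = {"age": 72.0, "ventilation": 1.0}
genotypes = {"snp_a": "AG", "snp_b": "CC"}
effect_alleles = {"snp_a": "A", "snp_b": "T"}

row = build_row(clinical, genotypes, effect_alleles)
print(row)
```

Rows built this way can be stacked into a single table, so a tree-based model sees genetic and clinical columns side by side and its feature-importance ranking can compare them directly.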

## Linked entities

- **Genes:** HLA-DRA (major histocompatibility complex, class II, DR alpha) [NCBI Gene 3122], PPARGC1A (PPARG coactivator 1 alpha) [NCBI Gene 10891], CRP (C-reactive protein) [NCBI Gene 1401], ABO (ABO, alpha 1-3-N-acetylgalactosaminyltransferase and alpha 1-3-galactosyltransferase) [NCBI Gene 28]
- **Diseases:** COVID-19 (MONDO:0100096)

## Full-text entities

- **Genes:** PPARGC1A (PPARG coactivator 1 alpha) [NCBI Gene 10891] {aka LEM6, PGC-1(alpha), PGC-1alpha, PGC-1v, PGC1, PGC1A}, OAS3 (2'-5'-oligoadenylate synthetase 3) [NCBI Gene 4940] {aka p100, p100OAS}, PCSK5 (proprotein convertase subtilisin/kexin type 5) [NCBI Gene 5125] {aka PC5, PC6, PC6A, SPC6}, CFH (complement factor H) [NCBI Gene 3075] {aka AHUS1, AMBP1, ARMD4, ARMS1, CFHL3, FH}, APOE (apolipoprotein E) [NCBI Gene 348] {aka AD2, APO-E, ApoE4, LDLCQ5, LPG}, CFTR (CF transmembrane conductance regulator) [NCBI Gene 1080] {aka ABC35, ABCC7, CF, CFTR/MRP, MRP7, TNR-CFTR}, ACE2 (angiotensin converting enzyme 2) [NCBI Gene 59272] {aka ACEH}, REN (renin) [NCBI Gene 5972] {aka ADTKD4, HNFJ2, RTD}, TP53 (tumor protein p53) [NCBI Gene 7157] {aka BCC7, BMFS5, LFS1, P53, TRP53}, KRT6B (keratin 6B) [NCBI Gene 3854] {aka CK-6B, CK6B, K6B, KRTL1, PC2, PC4}, ITIH2 (inter-alpha-trypsin inhibitor heavy chain 2) [NCBI Gene 3698] {aka H2P, ITI-HC2, SHAP}, LZTFL1 (leucine zipper transcription factor like 1) [NCBI Gene 54585] {aka BBS17}, ABO (ABO, alpha 1-3-N-acetylgalactosaminyltransferase and alpha 1-3-galactosyltransferase) [NCBI Gene 28] {aka A3GALNT, A3GALT1, GTA, GTB, NAGAT}, HLA-DRA (major histocompatibility complex, class II, DR alpha) [NCBI Gene 3122] {aka HLA-DRA1}, PGR (progesterone receptor) [NCBI Gene 5241] {aka NR3C3, PR}, TPD52 (tumor protein D52) [NCBI Gene 7163] {aka D52, N8L, PC-1, PrLZ, hD52}, CBX8 (chromobox 8) [NCBI Gene 57332] {aka PC3, RC1}, HLA-C (major histocompatibility complex, class I, C) [NCBI Gene 3107] {aka D6S204, HLA-JY3, HLAC, HLC-C, MHC, PSORS1}, UCP1 (uncoupling protein 1) [NCBI Gene 7350] {aka SLC25A7, UCP}, CD4 (CD4 molecule) [NCBI Gene 920] {aka CD4mut, IMD79, Leu-3, OKT4D, T4}, HLA-A (major histocompatibility complex, class I, A) [NCBI Gene 3105] {aka HLAA}, PCSK9 (proprotein convertase subtilisin/kexin type 9) [NCBI Gene 255738] {aka FH3, FHCL3, HCHOLA3, LDLCQ1, NARC-1, NARC1}, HLA-DPB1 (major histocompatibility complex, class II, DP beta 1) [NCBI Gene 3115] {aka DPB1, HLA-DP, HLA-DP1B, HLA-DPB}, PC (pyruvate carboxylase) [NCBI Gene 5091] {aka PCB}, IL6 (interleukin 6) [NCBI Gene 3569] {aka BSF-2, BSF2, CDF, HGF, HSF, IFN-beta-2}, CRP (C-reactive protein) [NCBI Gene 1401] {aka PTX1}, CCR2 (C-C motif chemokine receptor 2) [NCBI Gene 729230] {aka CC-CKR-2, CCR-2, CCR2A, CCR2B, CD192, CKR2}, PCSK7 (proprotein convertase subtilisin/kexin type 7) [NCBI Gene 9159] {aka LPC, PC7, PC8, SPC7}
- **Diseases:** hypercoagulability (MESH:D019851), ML (MESH:D007859), COPD (MESH:D029424), multi-organ injury (MESH:D009102), respiratory compromise (MESH:D012131), arteriopathy (MESH:D020212), ventilation (MESH:D053717), MVD (MESH:D008379), respiratory disease (MESH:D012140), inflammation (MESH:D007249), Disease-X (MESH:D004194), injury to (MESH:D014947), lung injury (MESH:D055370), hepatitis B (MESH:D006509), neoplasm (MESH:D009369), diabetes (MESH:D003920), endothelial dysfunction (MESH:D014652), RSV (MESH:D018357), dementia (MESH:D003704), heart failure (MESH:D006333), bacterial (MESH:D001424), infectious disease (MESH:D003141), long COVID (MESH:D000094024), viral infections (MESH:D014777), death (MESH:D003643), Hypertension (MESH:D006973), thrombotic (MESH:D013927), GBM (MESH:D000141), hepatopathy (MESH:D020754), ischemic stroke (MESH:D002544), COVID (MESH:D000086382), infection (MESH:D007239), cardiovascular and pulmonary disease (MESH:D002318)
- **Chemicals:** H2O (MESH:D014867), oxygen (MESH:D010100), aldosterone (MESH:D000450), EDTA (MESH:D004492), glucose (MESH:D005947), FC (MESH:C095424)
- **Species:** Severe acute respiratory syndrome coronavirus 2 (no rank) [taxon 2697049], Homo sapiens (human, species) [taxon 9606]
- **Mutations:** rs113993960, rs8192678, rs113993959, C>A, rs2808635, T>C, rs34041956, rs77010898, rs9277355, rs9277356, rs2499, rs74597325, rs1042169, rs7412, rs1801178, rs192678, rs1061170, rs2227306, G>C, T>G, rs80034486, rs74767530, rs3135363, rs1042522, rs429358, rs76713772, rs657152, rs1800592, rs35044562, rs10735079, rs121908799, rs1800795, rs876538, rs2228145, rs2285666

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12939118/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12939118/full.md

## References

103 references — full list in the complete paper: https://tomesphere.com/paper/PMC12939118/full.md

---
Source: https://tomesphere.com/paper/PMC12939118