# An interactive atlas of genomic, proteomic, and metabolomic biomarkers promotes the potential of proteins to predict complex diseases

**Authors:** Mikael Benson, Martin Smelik, Xinxiu Li, Joseph Loscalzo, Oleg Sysoev, Firoj Mahmud, Dina Mansour Aly, Yelin Zhao

PMC · DOI: 10.21203/rs.3.rs-3921099/v1 · 2024-03-05

## TL;DR

Using UK Biobank data from 500,000 individuals, this study finds that proteins predict complex diseases better than metabolites, which in turn outperform genetic variants.

## Contribution

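The study systematically compares 90 million genetic variants, 1,453 proteins, and 325 metabolites as predictors of complex diseases, shows that as few as five proteins per disease give strong predictive performance, and provides an interactive atlas (macd.shinyapps.io/ShinyApp/) for exploring these biomarkers.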

## Key findings

- Proteins were the most predictive biomarker type on holdout test sets, followed by metabolites and then genetic variants.
- Five proteins per disease gave a median (min-max) area under the receiver operating characteristic curve of 0.79 (0.65–0.86) for incidence and 0.84 (0.70–0.91) for prevalence.
- An interactive atlas was created to explore genomic, proteomic, and metabolomic biomarkers for complex diseases.

## Abstract

Multiomics analyses have identified multiple potential biomarkers of the incidence and prevalence of complex diseases. However, it is not known which type of biomarker is optimal for clinical purposes. Here, we make a systematic comparison of 90 million genetic variants, 1,453 proteins, and 325 metabolites from 500,000 individuals with complex diseases from the UK Biobank. A machine learning pipeline consisting of data cleaning, data imputation, feature selection, and model training using cross-validation and comparison of the results on holdout test sets showed that proteins were most predictive, followed by metabolites, and genetic variants. Only five proteins per disease resulted in median (min-max) areas under the receiver operating characteristic curves for incidence of 0.79 (0.65–0.86) and 0.84 (0.70–0.91) for prevalence. In summary, our work suggests the potential of predicting complex diseases based on a limited number of proteins. We provide an interactive atlas (macd.shinyapps.io/ShinyApp/) to find genomic, proteomic, or metabolomic biomarkers for different complex diseases.
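
The pipeline stages named in the abstract (imputation, feature selection, cross-validated training, and evaluation on a holdout set by area under the ROC curve) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the data are synthetic, the helper names (`make_data`, `fit_and_score`, `select_top_k`, and so on) are hypothetical, median imputation and univariate AUC ranking stand in for whatever methods the paper actually used, and the "model" is just a signed sum of standardized features.

```python
import random
import statistics


def auc(scores, labels):
    """Rank-based AUC (Mann-Whitney U); tied scores share an average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # 1-based average rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def fit_medians(rows):
    """Per-column medians of the observed (non-None) training values."""
    return [statistics.median(r[c] for r in rows if r[c] is not None)
            for c in range(len(rows[0]))]


def impute(rows, medians):
    """Replace missing values with medians learned on the training split."""
    return [[medians[c] if v is None else v for c, v in enumerate(r)] for r in rows]


def select_top_k(rows, labels, n_keep=5):
    """Keep the n_keep features whose univariate AUC is furthest from 0.5."""
    ranked = []
    for c in range(len(rows[0])):
        a = auc([r[c] for r in rows], labels)
        ranked.append((abs(a - 0.5), c, 1 if a >= 0.5 else -1))
    ranked.sort(reverse=True)
    return [(c, sign) for _, c, sign in ranked[:n_keep]]


def fit_and_score(train_X, train_y, test_X, n_keep=5):
    """Fit every step on the training split only, then score the test split."""
    medians = fit_medians(train_X)
    tr, te = impute(train_X, medians), impute(test_X, medians)
    chosen = select_top_k(tr, train_y, n_keep)
    stats = {c: (statistics.mean([r[c] for r in tr]),
                 statistics.pstdev([r[c] for r in tr]) or 1.0) for c, _ in chosen}
    # Toy model (an assumption): signed sum of standardized selected features.
    return [sum(s * (r[c] - stats[c][0]) / stats[c][1] for c, s in chosen) for r in te]


def cross_val_auc(X, y, n_keep=5, folds=5):
    """Mean AUC over interleaved folds of the training data."""
    aucs = []
    for f in range(folds):
        held = set(range(f, len(X), folds))
        tr_X = [x for i, x in enumerate(X) if i not in held]
        tr_y = [v for i, v in enumerate(y) if i not in held]
        te_X = [x for i, x in enumerate(X) if i in held]
        te_y = [v for i, v in enumerate(y) if i in held]
        aucs.append(auc(fit_and_score(tr_X, tr_y, te_X, n_keep), te_y))
    return statistics.mean(aucs)


def make_data(n=400, n_features=30, n_informative=5, missing=0.05, seed=0):
    """Synthetic cohort: the first n_informative features are shifted in cases."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        case = rng.random() < 0.3
        row = []
        for c in range(n_features):
            value = rng.gauss(1.0 if case and c < n_informative else 0.0, 1.0)
            row.append(None if rng.random() < missing else value)
        X.append(row)
        y.append(int(case))
    return X, y


X, y = make_data()
train_X, train_y, test_X, test_y = X[:300], y[:300], X[300:], y[300:]
cv_auc = cross_val_auc(train_X, train_y)
holdout_auc = auc(fit_and_score(train_X, train_y, test_X), test_y)
print(f"cross-validated AUC: {cv_auc:.2f}, holdout AUC: {holdout_auc:.2f}")
```

The design point the sketch tries to make is that imputation medians, feature selection, and standardization are all fitted inside `fit_and_score` on the training split alone, so neither the cross-validation folds nor the holdout set leak into the choice of the five features.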

## Full-text entities

- **Genes:** KLK4 (kallikrein related peptidase 4) [NCBI Gene 9622] {aka AI2A1, ARM1, EMSP, EMSP1, KLK-L1, PRSS17}, HIF1A (hypoxia inducible factor 1 subunit alpha) [NCBI Gene 3091] {aka HIF-1-alpha, HIF-1A, HIF-1alpha, HIF1, HIF1-ALPHA, MOP1}, HAVCR1 (hepatitis A virus cellular receptor 1) [NCBI Gene 26762] {aka CD365, HAVCR, HAVCR-1, KIM-1, KIM1, TIM}, WFDC2 (WAP four-disulfide core domain 2) [NCBI Gene 10406] {aka BENP, EDDM4, HE4, WAP5, dJ461P17.6}, GDF15 (growth differentiation factor 15) [NCBI Gene 9518] {aka GDF-15, HG, MIC-1, MIC1, NAG-1, PDF}, PLAUR (plasminogen activator, urokinase receptor) [NCBI Gene 5329] {aka CD87, U-PAR, UPAR, URKR}, MMP12 (matrix metallopeptidase 12) [NCBI Gene 4321] {aka HME, ME, MME, MMP-12}, TNFRSF10B (TNF receptor superfamily member 10b) [NCBI Gene 8795] {aka CD262, DR5, KILLER, KILLER/DR5, TRAIL-R2, TRAILR2}, CXCL17 (C-X-C motif chemokine ligand 17) [NCBI Gene 284340] {aka DMC, Dcip1, UNQ473, VCC-1, VCC1}
- **Diseases:** peripheral arterial occlusive disease (MESH:C564658), SLE (MESH:D008180), UC (MESH:D003093), T2D (MESH:D003924), hypoxic (MESH:D002534), RA (MESH:D001172), CD (MESH:D003424), stroke (MESH:D020521), obesity (MESH:D009765), inflammation (MESH:D007249), myocardial infarction (MESH:D009203), COPD (MESH:D029424), vascular dementia (MESH:D015140), immunological, cardiovascular or metabolic diseases (MESH:D002318), injury (MESH:D014947), PSO (MESH:D011565), ASVD (MESH:D050197), diabetes (MESH:D003920), Chronic diseases (MESH:D002908)
- **Chemicals:** lipoprotein lipids (-), NA (MESH:D012964), amino acids (MESH:D000596), fatty acid (MESH:D005227), ketone bodies (MESH:D007657)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

2 figures with captions in the complete paper: https://tomesphere.com/paper/PMC10942575/full.md

---
Source: https://tomesphere.com/paper/PMC10942575