# Multi-omics and machine learning identify novel biomarkers and therapeutic targets of COVID-19

**Authors:** Yumei Zhou, Pengbei Fan, Haiyun Zhang, Shuai Han, Minghua Bai, Ji Wang, Qi Wang

PMC · DOI: 10.3389/fimmu.2025.1671936 · Frontiers in Immunology · 2025-10-02

## TL;DR

This study uses multi-omics and machine learning to identify new biomarkers and potential treatments for COVID-19, focusing on immune responses and CD8+ T cells.

## Contribution

The study introduces novel biomarkers (BTD, CFL1, PIGR, SERPINA3) linked to CD8+ T cell responses in COVID-19, supported by both transcriptomic and proteomic evidence.

## Key findings

- BTD, CFL1, PIGR, and SERPINA3 are strongly associated with CD8+ T cell abundance in COVID-19 patients.
- These genes effectively distinguish between COVID-19 patients and healthy individuals based on ROC curve analysis.
- Molecular docking analysis suggests these biomarkers may serve as therapeutic targets.
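The ROC finding above rests on the area under the ROC curve (AUC). The summary does not report AUC values or name the software used. The sketch below is an illustrative pure-Python version that relies on the standard Mann-Whitney interpretation: the AUC equals the probability that a randomly chosen patient scores higher than a randomly chosen control, with ties counted as one half. The function name `roc_auc`, the 1/0 label encoding, and the toy values are hypothetical and are not study data.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC via the Mann-Whitney U statistic.

    labels: 1 = COVID-19 case, 0 = healthy control (hypothetical encoding).
    Returns P(case score > control score), counting ties as 0.5.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for h in controls:
            if c > h:
                wins += 1.0   # case correctly ranked above control
            elif c == h:
                wins += 0.5   # tie counts as half a win
    return wins / (len(cases) * len(controls))


if __name__ == "__main__":
    # Made-up expression values for one candidate marker (not study data).
    expr = [2.1, 3.4, 2.9, 4.0, 1.2, 1.8, 2.5, 0.9]
    group = [1, 1, 1, 1, 0, 0, 0, 0]
    # 15 of the 16 case/control pairs are correctly ordered.
    print(roc_auc(expr, group))  # prints 0.9375
```

The pairwise loop costs O(n·m), which is fine for illustration. For cohorts as large as the one described here (358 cases, 265 controls), a rank-based formula or a library routine would be the usual choice.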

## Abstract

COVID-19 has caused over 7 million deaths worldwide since its onset in 2019, and the virus remains a significant health threat. Identifying sensitive and specific biomarkers, along with elucidating immune-mediated mechanisms, is essential for improving the diagnosis, treatment, and prevention of COVID-19. This study aimed to predict key molecular markers of COVID-19 using an established multi-omics framework combined with machine learning models.

We conducted an integrated analysis of single-cell RNA sequencing (scRNA-seq), bulk RNA sequencing, and proteomics data to identify critical biomarkers associated with COVID-19. The multi-omics approach enabled the characterization of gene expression dynamics and alterations in immune cell subsets in COVID-19 patients. Machine learning techniques and molecular docking analyses were employed to identify biomarkers and therapeutic targets within the disease’s pathophysiological network.

Principal component analysis effectively grouped samples based on clinical characteristics. Using random forest and SVM-RFE (support vector machine recursive feature elimination) models, we identified clinical indicators capable of accurately distinguishing COVID-19 patients from controls. Transcriptomic analysis, including scRNA-seq, highlighted the pivotal role of CD8+ T cells, and weighted gene co-expression network analysis (WGCNA) identified related module genes. Proteomic analysis, integrated with machine learning, revealed 36 differentially expressed proteins (DEPs). Further investigation identified several genes associated with monocyte proportions. Correlation analysis showed that BTD, CFL1, PIGR, and SERPINA3 were strongly linked to CD8+ T cell abundance in COVID-19 patients. Receiver operating characteristic (ROC) curve analysis demonstrated that these genes could effectively distinguish between COVID-19 patients and healthy individuals. Concordant findings at both the transcriptomic and proteomic levels support BTD, CFL1, PIGR, and SERPINA3 as potential auxiliary diagnostic markers. Finally, AlphaFold-based molecular docking analysis suggested that these biomarkers may also serve as candidate therapeutic targets.
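The correlation step links each candidate gene's expression to CD8+ T cell abundance across samples. The summary does not say which coefficient was used. The sketch below assumes Spearman's rank correlation, computed as the Pearson correlation of average ranks so that ties are handled. It is a minimal stdlib illustration rather than the authors' pipeline. The function names and the toy values are hypothetical.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with order[i].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant input")
    return cov / (vx * vy) ** 0.5


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Hypothetical per-sample values: expression of one candidate gene and the
    # estimated CD8+ T cell fraction in the same samples (not study data).
    gene_expr = [5.0, 3.2, 4.1, 6.3, 2.2]
    cd8_fraction = [0.30, 0.18, 0.22, 0.35, 0.10]
    print(spearman(gene_expr, cd8_fraction))  # prints 1.0 (perfectly monotone)
```

A rank-based coefficient is a common choice when immune-cell fractions are skewed or only monotonically related to expression. Whether the study used Spearman, Pearson, or another measure would need to be checked in the full text.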

Preliminary findings indicate that BTD, CFL1, PIGR, and SERPINA3 are vital molecular biomarkers related to CD8+ T cells, providing new insights into the molecular mechanisms and long-term prevention of COVID-19.

Overview of the study design. Participants were divided into a control group (n=265) and a COVID-19 group (n=358). Principal component analysis was used to group samples according to clinical characteristics, and selected features were analyzed with random forest and SVM-RFE models. By integrating scRNA-seq and RNA-seq data with an analysis of the peripheral plasma proteome, we applied machine learning models to identify and predict potential biomarkers associated with CD8+ T cell responses in COVID-19 infection. ROC curve analysis was used to evaluate the clinical diagnostic efficacy of inflammatory factors across groups, and box plots show the plasma levels of inflammatory factors in the different populations.
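SVM-RFE, one of the two feature-selection models named above, repeatedly trains a linear SVM and discards the feature with the smallest squared weight. The summary does not give the kernel, software, or parameters the authors used. The sketch below is a self-contained approximation under stated assumptions. It fits a linear SVM by full-batch subgradient descent on the regularized hinge loss, and the values `lam`, `epochs`, and `lr` are arbitrary choices. The function names and the three-feature toy data (`f0`, `f1`, `f2`) are hypothetical.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def train_linear_svm(
    X: Matrix, y: Sequence[int], lam: float = 0.01, epochs: int = 200, lr: float = 0.1
) -> Tuple[List[float], float]:
    """Fit a linear SVM (labels +1/-1) by full-batch subgradient descent on
    lam/2 * ||w||^2 + mean_i max(0, 1 - y_i * (w . x_i + b))."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for t in range(1, epochs + 1):
        grad_w = [lam * wj for wj in w]  # gradient of the L2 penalty
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # hinge loss is active for this sample
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        step = lr / t ** 0.5  # decaying step size
        w = [wj - step * g for wj, g in zip(w, grad_w)]
        b -= step * grad_b
    return w, b


def svm_rfe(
    X: Matrix, y: Sequence[int], names: Sequence[str], n_keep: int = 1
) -> Tuple[List[str], List[str]]:
    """Recursive feature elimination: retrain the linear SVM on the surviving
    features and drop the one with the smallest squared weight, until n_keep
    remain. Returns (kept names, eliminated names in removal order)."""
    active = list(range(len(names)))
    eliminated: List[str] = []
    while len(active) > n_keep:
        X_active = [[row[j] for j in active] for row in X]
        w, _ = train_linear_svm(X_active, y)
        weakest = min(range(len(active)), key=lambda k: w[k] ** 2)
        eliminated.append(names[active.pop(weakest)])
    return [names[j] for j in active], eliminated


if __name__ == "__main__":
    # Toy data (not from the study): "f0" separates the classes,
    # "f1" and "f2" are small noise features.
    X = [
        [2.0, 0.1, -0.3], [1.5, -0.2, 0.4], [1.8, 0.3, 0.1], [2.2, -0.1, -0.2],
        [-2.0, 0.2, 0.3], [-1.7, -0.3, -0.1], [-1.9, 0.1, 0.2], [-2.1, -0.2, -0.4],
    ]
    y = [1, 1, 1, 1, -1, -1, -1, -1]
    kept, dropped = svm_rfe(X, y, ["f0", "f1", "f2"])
    print("kept:", kept, "eliminated:", dropped)
```

Removing one feature per round is the simplest variant. With many candidate genes, implementations often drop a fraction of features per round and choose how many to keep by cross-validation.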

The overview figure depicts the analysis of clinical features in the COVID-19 and control groups, with machine learning used for feature extraction, followed by single-cell transcriptome sequencing, transcriptome analysis, proteomics analysis, and machine learning for biomarker identification. Verification steps involve comparative metrics and molecular imagery, and the data are visualized through charts, scatter plots, and network diagrams.

## Linked entities

- **Genes:** BTD (biotinidase) [NCBI Gene 686], CFL1 (cofilin 1) [NCBI Gene 1072], PIGR (polymeric immunoglobulin receptor) [NCBI Gene 5284], SERPINA3 (serpin family A member 3) [NCBI Gene 12]
- **Diseases:** COVID-19 (MONDO:0100096)

## Full-text entities

- **Genes:** BTD (biotinidase) [NCBI Gene 686], PIGR (polymeric immunoglobulin receptor) [NCBI Gene 5284], SERPINA3 (serpin family A member 3) [NCBI Gene 12] {aka AACT, ACT, GIG24, GIG25}, CD8A (CD8 subunit alpha) [NCBI Gene 925] {aka CD8, CD8alpha, IMD116, Leu2, p32}, CFL1 (cofilin 1) [NCBI Gene 1072] {aka CFL, HEL-S-15, cofilin}
- **Diseases:** deaths (MESH:D003643), COVID-19 (MESH:D000086382)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12528157/full.md

## Figures

7 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12528157/full.md

## References

41 references — full list in the complete paper: https://tomesphere.com/paper/PMC12528157/full.md

---
Source: https://tomesphere.com/paper/PMC12528157