# Detection for New Biomarkers of Tuberculosis Infection Activity Using Machine Learning Methods

**Authors:** Anna An. Starshinova, Adilya Sabirova, Olesya Koroteeva, Igor Kudryavtsev, Artem Rubinstein, Arthur Aquino, Andrey S. Trulioff, Ekaterina Belyaeva, Anastasia Kulpina, Raul A. Sharipov, Ravil K. Tukfatullin, Nikolay Y. Nikolenko, Anton Mikhalev, Andrey A. Savchenko, Alexandr Borisov, Dmitry Kudlay

PMC · DOI: 10.3390/diseases14020066 · 2026-02-11

## TL;DR

This paper reviews how machine learning and omics data can improve the detection of active tuberculosis by identifying new biomarkers that distinguish it from latent infection.

## Contribution

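Unlike previous narrative reviews, the paper systematically compares data-generating platforms, modelling strategies, validation approaches, and sources of heterogeneity across studies of LTBI/ATB discrimination. It also identifies key translational barriers in TB biomarker research: cohort homogeneity, platform dependency, and limited external validation.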

## Key findings

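- ML-driven analyses of transcriptomic and proteomic data consistently outperform conventional immunological tests in sensitivity, specificity, and clinical applicability when distinguishing LTBI from ATB.
- Multimodal integration, which adds soluble mediators, immunological readouts, and imaging-derived features to single-modality signatures, further improves diagnostic accuracy and robustness.
- These advances support the development of concise qRT-PCR-based biomarker panels suitable for routine clinical use.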
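Comparisons between diagnostic tests of this kind are usually reported as sensitivity and specificity. The following is a minimal sketch of how those two numbers are computed for an ATB-versus-LTBI classifier. The labels and predictions are hypothetical and chosen only for illustration; they are not data from the paper.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels: 1 = ATB, 0 = LTBI."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # ATB correctly flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # ATB missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # LTBI correctly cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # LTBI wrongly flagged
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical reference labels and test calls (illustrative only).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]

sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

Sensitivity is the fraction of ATB cases the test detects, and specificity is the fraction of LTBI cases it correctly does not flag as active disease.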

## Abstract

**Background/Objectives:** Latent tuberculosis infection (LTBI) represents a critical reservoir for subsequent development of active tuberculosis (ATB) and poses significant challenges for early diagnosis and disease prevention. Traditional immunological assays, such as interferon-gamma release assays (IGRAs), are limited in their ability to reliably distinguish LTBI from ATB. Recent advances in high-throughput omics technologies and machine learning (ML) approaches offer new opportunities for precise, biomarker-based differential diagnostics.

**Methods:** Transcriptomic and proteomic profiling of host immune responses has revealed reproducible gene and protein signatures associated with LTBI and ATB. The integration of ML techniques (including feature selection, dimensionality reduction, multimodal learning, and explainable AI) facilitates the construction of robust diagnostic models. Single-modality signatures, derived from RNA-seq, microarrays, or proteomic assays, are complemented by multimodal approaches that incorporate soluble mediators, immunological readouts, and imaging-derived features. Deep learning frameworks, such as convolutional neural networks and transformer-based architectures, enhance the extraction of complex molecular and structural patterns from high-dimensional datasets.

**Results:** ML-driven analyses of transcriptomic and proteomic data consistently outperform conventional immunological tests in terms of sensitivity, specificity, and clinical applicability. Multimodal integration further improves diagnostic accuracy and robustness. These advances support the translational development of concise, quantitative reverse transcription PCR (qRT-PCR)-based biomarker panels suitable for routine clinical application, enabling early and reliable differentiation between LTBI and ATB. Overall, the combination of high-throughput omics and AI-based analytical frameworks provides a promising pathway for enhancing global tuberculosis diagnostics.

**Conclusions:** This review provides a structured and critical synthesis of transcriptomic and proteomic biomarker research for LTBI and ATB discrimination, with a particular emphasis on machine learning-based analytical frameworks. Unlike previous narrative reviews, we systematically compare data-generating platforms, modelling strategies, validation approaches, and sources of heterogeneity across studies. We further identify key translational barriers, including cohort homogeneity, platform dependency, and limited external validation, and propose directions for future research aimed at improving clinical applicability.
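The abstract names feature selection and external validation as central to building a concise biomarker panel. The sketch below illustrates the general idea on synthetic data, and it is not the authors' method or any specific pipeline from the reviewed studies. Every name and parameter is an assumption made for illustration: `simulate_cohort`, `select_genes`, the planted gene indices, and the effect size. Genes are ranked by the difference in class means on a training cohort, a small signed signature is kept, and the resulting score is evaluated on a second, held-out cohort.

```python
import random
import statistics


def simulate_cohort(n_per_class, n_genes, informative, shift, rng):
    """Synthetic expression matrix (rows = samples); label 1 = ATB, 0 = LTBI."""
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
            if label == 1:
                for g in informative:  # planted ATB-associated genes
                    row[g] += shift
            X.append(row)
            y.append(label)
    return X, y


def select_genes(X, y, k):
    """Keep the k genes with the largest absolute ATB-minus-LTBI mean difference."""
    diffs = []
    for g in range(len(X[0])):
        atb = [row[g] for row, lab in zip(X, y) if lab == 1]
        ltbi = [row[g] for row, lab in zip(X, y) if lab == 0]
        diffs.append((statistics.mean(atb) - statistics.mean(ltbi), g))
    top = sorted(diffs, key=lambda d: abs(d[0]), reverse=True)[:k]
    return [(g, 1.0 if d > 0 else -1.0) for d, g in top]


def score(row, signature):
    """Signed sum of the signature genes: higher means more ATB-like."""
    return sum(sign * row[g] for g, sign in signature)


def fit_threshold(X, y, signature):
    """Cut-off placed midway between the mean scores of the two classes."""
    atb = [score(r, signature) for r, lab in zip(X, y) if lab == 1]
    ltbi = [score(r, signature) for r, lab in zip(X, y) if lab == 0]
    return (statistics.mean(atb) + statistics.mean(ltbi)) / 2


rng = random.Random(0)
informative = [2, 7, 11]  # hypothetical gene indices, not real biomarkers

X_train, y_train = simulate_cohort(30, 20, informative, 3.0, rng)
X_ext, y_ext = simulate_cohort(30, 20, informative, 3.0, rng)  # held-out cohort

signature = select_genes(X_train, y_train, k=3)
threshold = fit_threshold(X_train, y_train, signature)

pred = [1 if score(r, signature) > threshold else 0 for r in X_ext]
accuracy = sum(p == t for p, t in zip(pred, y_ext)) / len(y_ext)
print(sorted(g for g, _ in signature), round(accuracy, 2))
```

Here the held-out cohort is drawn from the same distribution as the training cohort, so the sketch shows only the mechanics of validation. Real external cohorts differ in platform and population, which is where the barriers named in the conclusions (platform dependency, cohort homogeneity, limited external validation) come into play.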

## Linked entities

- **Diseases:** tuberculosis (MONDO:0018076), latent tuberculosis infection (MONDO:0040753), active tuberculosis (MONDO:0018076)

## Full-text entities

- **Genes:** IL17A (interleukin 17A) [NCBI Gene 3605] {aka CTLA-8, CTLA8, IL-17, IL-17A, IL17, ILA17}, CXCL10 (C-X-C motif chemokine ligand 10) [NCBI Gene 3627] {aka C7, IFI10, INP10, IP-10, SCYB10, crg-2}, SLC26A8 (solute carrier family 26 member 8) [NCBI Gene 116369] {aka AZON, SPGF3, TAT1}, CRP (C-reactive protein) [NCBI Gene 1401] {aka PTX1}, CCL1 (C-C motif chemokine ligand 1) [NCBI Gene 6346] {aka I-309, P500, SCYA1, SISe, TCA3}, SIRT2 (sirtuin 2) [NCBI Gene 22933] {aka SIR2, SIR2L, SIR2L2}, PC (pyruvate carboxylase) [NCBI Gene 5091] {aka PCB}, EBF3 (EBF transcription factor 3) [NCBI Gene 253738] {aka COE3, EBF-3, HADDS, O/E-2, OE-2}, PDE6B (phosphodiesterase 6B) [NCBI Gene 5158] {aka CSNB3, CSNBAD2, GMP-PDEbeta, PDEB, RP40, rd1}, INTS13 (integrator complex subunit 13) [NCBI Gene 55726] {aka ASUN, C12orf11, GCT1, Mat89Bb, NET48, SPATA30}, SDR39U1 (short chain dehydrogenase/reductase family 39U member 1) [NCBI Gene 56948] {aka C14orf124, HCDI}, GBP5 (guanylate binding protein 5) [NCBI Gene 115362] {aka GBP-5}, PTPRC (protein tyrosine phosphatase receptor type C) [NCBI Gene 5788] {aka B220, CD45, CD45R, GP180, IMD105, L-CA}, TUBB6 (tubulin beta 6 class V) [NCBI Gene 84617] {aka FPVEPD, HsT1601, TUBB-5}, TNF (tumor necrosis factor) [NCBI Gene 7124] {aka DIF, IMD127, TNF-alpha, TNFA, TNFSF2, TNLG1F}, TRMT2A (tRNA methyltransferase 2A) [NCBI Gene 27037] {aka HTF9C}, ADA2 (adenosine deaminase 2) [NCBI Gene 51816] {aka ADGF, CECR1, IDGFL, PAN, SNEDS, VAIHS}, TLR6 (toll like receptor 6) [NCBI Gene 10333] {aka CD286}, CCR4 (C-C motif chemokine receptor 4) [NCBI Gene 1233] {aka CC-CKR-4, CD194, CKR4, CMKBR4, ChemR13, HGCN:14099}, CD8A (CD8 subunit alpha) [NCBI Gene 925] {aka CD8, CD8alpha, IMD116, Leu2, p32}, CD38 (CD38 molecule) [NCBI Gene 952] {aka ADPRC 1, ADPRC1, cADPR1}, CCL2 (C-C motif chemokine ligand 2) [NCBI Gene 6347] {aka GDCF-2, HC11, HSMCR30, MCAF, MCP-1, MCP1}, LIF (LIF interleukin 6 family cytokine) [NCBI Gene 3976] {aka CDF, DIA, HILDA, MLPLI}, TNFRSF10C (TNF receptor superfamily member 10c) [NCBI Gene 8794] {aka CD263, DCR1, DCR1-TNFR, LIT, TRAIL-R3, TRAILR3}, GBP1 (guanylate binding protein 1) [NCBI Gene 2633] {aka hGBP1}, VAMP5 (vesicle associated membrane protein 5) [NCBI Gene 10791], NEMF (nuclear export mediator factor) [NCBI Gene 9147] {aka IDDSAPN, NY-CO-1, RQC2, SDCCAG1}, ATP10A (ATPase phospholipid transporting 10A (putative)) [NCBI Gene 57194] {aka ATP10C, ATPVA, ATPVC}, ANKRD22 (ankyrin repeat domain 22) [NCBI Gene 118932], SERPINC1 (serpin family C member 1) [NCBI Gene 462] {aka AT3, AT3D, ATIII, ATIII-R2, ATIII-T1, ATIII-T2}, CD69 (CD69 molecule) [NCBI Gene 969] {aka AIM, BL-AC/P26, CLEC2C, EA1, GP32/28, MLR-3}, CXCR3 (C-X-C motif chemokine receptor 3) [NCBI Gene 2833] {aka CD182, CD183, CKR-L2, CMKAR3, GPR9, IP10-R}, GBP2 (guanylate binding protein 2) [NCBI Gene 2634], TNFRSF4 (TNF receptor superfamily member 4) [NCBI Gene 7293] {aka ACT35, CD134, IMD16, OX40, TXGP1L}, CSF1 (colony stimulating factor 1) [NCBI Gene 1435] {aka CSF-1, MCSF, PG-M-CSF}, BATF2 (basic leucine zipper ATF-like transcription factor 2) [NCBI Gene 116071] {aka SARI}, ORM1 (orosomucoid 1) [NCBI Gene 5004] {aka A1AG1, AGP-A, AGP1, HEL-S-153w, ORM}, CD4 (CD4 molecule) [NCBI Gene 920] {aka CD4mut, IMD79, Leu-3, OKT4D, T4}, FCGR1BP (Fc gamma receptor Ib, pseudogene) [NCBI Gene 2210] {aka CD64b, FCG1, FCGR1, FCGR1B, FcRI, FcgammaRIa}, CD27 (CD27 molecule) [NCBI Gene 939] {aka S152, S152. LPFS2, T14, TNFRSF7, Tp55}, A2ML1 (alpha-2-macroglobulin like 1) [NCBI Gene 144568] {aka CPAMD9, OMS, p170}, IL2 (interleukin 2) [NCBI Gene 3558] {aka IL-2, TCGF, lymphokine}, PGM5 (phosphoglucomutase 5) [NCBI Gene 5239] {aka PGMRP}, ID3 (inhibitor of DNA binding 3) [NCBI Gene 3399] {aka HEIR-1, bHLHb25}, P2RY14 (purinergic receptor P2Y14) [NCBI Gene 9934] {aka BPR105, GPR105, P2Y14}, PLAU (plasminogen activator, urokinase) [NCBI Gene 5328] {aka ATF, BDPLT5, QPD, UPA, URK, u-PA}, UBE2L6 (ubiquitin conjugating enzyme E2 L6) [NCBI Gene 9246] {aka RIG-B, UBCH8}, IL2RA (interleukin 2 receptor subunit alpha) [NCBI Gene 3559] {aka CD25, IDDM10, IL2R, IMD41, TCGFR, p55}, IFNG (interferon gamma) [NCBI Gene 3458] {aka IFG, IFI, IMD69}, CD40LG (CD40 ligand) [NCBI Gene 959] {aka CD154, CD40L, HIGM1, IGM, IMD3, T-BAM}, CXCL9 (C-X-C motif chemokine ligand 9) [NCBI Gene 4283] {aka CMK, Humig, MIG, SCYB9, crg-10}, IFITM3 (interferon induced transmembrane protein 3) [NCBI Gene 10410] {aka 1-8U, DSPA2b, IP15}, DHX29 (DExH-box helicase 29) [NCBI Gene 54505] {aka DDX29}, CDH1 (cadherin 1) [NCBI Gene 999] {aka Arc-1, BCDS1, CD324, CDHE, ECAD, LCAM}, CCR6 (C-C motif chemokine receptor 6) [NCBI Gene 1235] {aka BN-1, C-C CKR-6, CC-CKR-6, CCR-6, CD196, CKR-L3}, KITLG (KIT ligand) [NCBI Gene 4254] {aka DCUA, DFNA69, FPH2, FPHH, KL-1, Kitl}, VEGFA (vascular endothelial growth factor A) [NCBI Gene 7422] {aka L-VEGF, MVCD1, VEGF, VPF}, KLRB1 (killer cell lectin like receptor B1) [NCBI Gene 3820] {aka CD161, CLEC5B, NKR, NKR-P1, NKR-P1A, NKRP1A}, GDNF (glial cell derived neurotrophic factor) [NCBI Gene 2668] {aka ATF, ATF1, ATF2, HFB1-GDNF, HSCR3}, IFNA1 (interferon alpha 1) [NCBI Gene 3439] {aka IFL, IFN, IFN-ALPHA, IFN-alphaD, IFNA13, IFNA@}
- **Diseases:** injury to (MESH:D014947), disease (MESH:D004194), Pulmonary Infections (MESH:D012141), active (OMIM:612348), Inflammatory Conditions (MESH:D007249), granulomatous (MESH:D013968), influenza (MESH:D007251), respiratory diseases (MESH:D012140), viral infections (MESH:D014777), Non-tuberculous mycobacterial (NTM) infections (MESH:D009165), tuberculous lymphadenitis (MESH:D014388), pulmonary TB (MESH:D014397), infected (MESH:D007239), lung cancer (MESH:D008175), sarcoidosis (MESH:D012507), COVID (MESH:D000086382), inflammatory lung diseases (MESH:D008171), immune disturbances (MESH:D007154), pulmonary and extrapulmonary TB disease (MESH:D000092225), LTBI (MESH:D055985), respiratory conditions (MESH:D012131), granulomatous diseases (MESH:D006105), autoimmunity (MESH:D001327), COPD (MESH:D029424), interstitial lung diseases (MESH:D017563), ML (MESH:D007859), inflammatory lung conditions (MESH:D016726), immune-mediated pneumonitis (MESH:D011014), NTM diseases (MESH:D014376), cutaneous TB (MESH:D014382), RD (MESH:D000077733), HIV co-infection (MESH:D015658), immune dysregulation (OMIM:614878), post-COVID (MESH:D000094024), chronic (MESH:D002908), pulmonary sarcoidosis (MESH:D017565), infectious (MESH:D003141)
- **Chemicals:** CFP-10 (-), paraffin (MESH:D010232), lipid (MESH:D008055)
- **Species:** Bacillus sp. CG (species) [taxon 1196795], Mycobacterium tuberculosis (species) [taxon 1773], Mycobacterium tuberculosis variant bovis BCG (no rank) [taxon 33892], Homo sapiens (human, species) [taxon 9606], Severe acute respiratory syndrome coronavirus 2 (no rank) [taxon 2697049], Human immunodeficiency virus 1 (no rank) [taxon 11676]

## Figures

2 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12939407/full.md

---
Source: https://tomesphere.com/paper/PMC12939407