# Machine-Learning-Derived, Mechanistically Informed Transcriptomic Signature to Diagnose Active Tuberculosis and Guide Host-Directed Therapy

**Authors:** Asif Hassan Syed, Nashwan Alromema, Hatem A. Almazarqi, Jasrah Irfan, Shakeel Ahmad, Altyeb A. Taha, Alhuseen Omar Alsayed

PMC · DOI: 10.3390/diagnostics16050693 · Diagnostics · 2026-02-26

## TL;DR

This paper introduces a four-gene transcriptomic signature (TAP2, SORT1, WARS, ANKRD22) that diagnoses active tuberculosis and maps to host pathways that are candidate targets for host-directed therapy.

## Contribution

A four-gene transcriptomic signature that offers both diagnostic value and mechanistic insight for active tuberculosis.

## Key findings

- The four-gene signature (TAP2, SORT1, WARS, ANKRD22) is selectively upregulated in active TB, and a stacking classifier built on it achieved strong diagnostic performance (ROC-AUC = 0.991, 95% CI: 0.983–0.997).
- The signature maps to key host pathways involved in active TB, including antigen presentation and interferon response.
- The signature was validated in an independent cohort (GSE19444) and aligns with WHO performance targets for TB triage.
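
For readers less familiar with the metric, ROC-AUC is the probability that a randomly chosen positive sample receives a higher classifier score than a randomly chosen negative one. The following is a minimal Python sketch of that definition with a percentile bootstrap interval. The function names (`roc_auc`, `bootstrap_auc_ci`), the toy labels and scores, and the choice of a percentile bootstrap are all assumptions made for illustration; this summary does not state how the authors computed their interval, and the binary framing is a simplification of their three-group setting.

```python
import random


def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (assumed method, see lead-in)."""
    roc_auc(labels, scores)  # raises early if only one class is present
    rng = random.Random(seed)
    n = len(labels)
    aucs = []
    while len(aucs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # a resample with a single class has no defined AUC
        aucs.append(roc_auc(ys, [scores[i] for i in idx]))
    aucs.sort()
    lower = aucs[int((alpha / 2) * n_boot)]
    upper = aucs[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Toy data, NOT from the paper: 1 = active TB, 0 = not active TB.
toy_labels = [0, 0, 0, 0, 1, 1, 1, 1]
toy_scores = [0.10, 0.30, 0.35, 0.60, 0.40, 0.70, 0.80, 0.90]

auc = roc_auc(toy_labels, toy_scores)
ci_low, ci_high = bootstrap_auc_ci(toy_labels, toy_scores)
print(f"AUC = {auc:.3f}, 95% CI: {ci_low:.3f} to {ci_high:.3f}")
```

The pairwise loop is quadratic in sample count, which is acceptable for a sketch. A library routine such as scikit-learn's `roc_auc_score` would be the usual choice on real cohorts.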

## Abstract

Background/Objectives: Differentiating active tuberculosis (TB) from latent TB infection (LTBI) remains an important diagnostic problem. Furthermore, current biomarkers offer minimal insight into disease pathogenesis to direct treatment. This motivated us to design a dual-purpose biomarker signature through a multicohort transcriptomic analysis and a stringent machine learning pipeline. Methods: Active TB, latent TB, and healthy control samples were analyzed with a rigorous filter (ANOVA, p < 0.001), followed by feature selection using Boruta-XGBoost and LASSO regression. This identified a compact four-gene signature (TAP2, SORT1, WARS, and ANKRD22) that was selectively and highly upregulated in the active TB clinical state (p < 0.001). An ensemble stacking classifier based on this signature (Random Forest and XGBoost) achieved very high diagnostic performance (ROC-AUC = 0.991 (95% CI: 0.983–0.997)) in stratifying infection phases, which was confirmed in an independent cohort (GSE19444). Results: Importantly, functional pathway analysis showed that each gene maps to a core dysregulated host pathway in active TB: antigen presentation (TAP2), lipid trafficking (SORT1), interferon response (WARS), and inflammasome signaling (ANKRD22). The signature therefore has a dual advantage: (1) a highly specific, non-sputum transcriptional diagnostic for active TB, and (2) a mechanistic map of key host pathways that identifies targets for intervention. Conclusions: The signature thus serves two purposes: a biomarker panel aligned with WHO performance targets for TB triage and a mechanistic framework for therapy, offering a practical route from transcriptomic discovery to clinical action against TB.
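
The first filtering step named in the abstract, a per-gene one-way ANOVA across the three clinical groups, can be illustrated with a short Python sketch. It computes only the F statistic (between-group variance over within-group variance) and ranks genes by it. The function name `anova_f`, the gene labels `GENE_A` and `GENE_B`, and all expression values are hypothetical and not taken from the paper. Converting F to a p-value for a threshold such as p < 0.001 requires the F distribution, which a routine like `scipy.stats.f_oneway` would provide; that step is left out here to keep the example dependency-free.

```python
from statistics import fmean


def anova_f(groups):
    """One-way ANOVA F statistic for a list of groups of numeric values."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least two groups and more samples than groups")
    group_means = [fmean(g) for g in groups]
    grand_mean = fmean([x for g in groups for x in g])
    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Variation of individual samples around their own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


# Hypothetical expression values per gene, grouped as
# [active TB, latent TB, healthy control]. Not real data.
toy_expression = {
    "GENE_A": [[8.9, 9.2, 9.0], [5.1, 5.0, 5.3], [5.0, 5.2, 4.9]],
    "GENE_B": [[6.0, 6.4, 5.8], [6.1, 5.9, 6.3], [6.2, 6.0, 6.1]],
}

# Rank genes from most to least group-discriminating by F statistic.
ranked = sorted(((gene, anova_f(groups)) for gene, groups in toy_expression.items()),
                key=lambda item: item[1], reverse=True)
for gene, f_stat in ranked:
    print(f"{gene}: F = {f_stat:.2f}")
```

In this toy setting `GENE_A`, which is elevated only in the first group, ranks well above `GENE_B`, mirroring the idea of retaining genes whose expression differs strongly between clinical states before the later feature-selection steps.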

## Linked entities

- **Genes:** TAP2 (transporter 2, ATP binding cassette subfamily B member) [NCBI Gene 6891], SORT1 (sortilin 1) [NCBI Gene 6272], WARS1 (tryptophanyl-tRNA synthetase 1) [NCBI Gene 7453], ANKRD22 (ankyrin repeat domain 22) [NCBI Gene 118932]
- **Diseases:** tuberculosis (MONDO:0018076), active tuberculosis (MONDO:0018076)

## Full-text entities

- **Genes:** TAP2 (transporter 2, ATP binding cassette subfamily B member) [NCBI Gene 6891] {aka ABC18, ABCB3, APT2, D6S217E, MHC1D2, PSF-2}, SORT1 (sortilin 1) [NCBI Gene 6272] {aka Gp95, LDLCQ6, NT3, NTR3}, ANKRD22 (ankyrin repeat domain 22) [NCBI Gene 118932], WARS1 (tryptophanyl-tRNA synthetase 1) [NCBI Gene 7453] {aka GAMMA-2, HMN9, HMND9, IFI53, IFP53, NEDMSBA}
- **Diseases:** Active Tuberculosis (MESH:D014376), infection (MESH:D007239), LTBI (MESH:D055985), latent (MESH:D000085343)
- **Chemicals:** lipid (MESH:D008055)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12985208/full.md

## Figures

7 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12985208/full.md

## References

78 references — full list in the complete paper: https://tomesphere.com/paper/PMC12985208/full.md

---
Source: https://tomesphere.com/paper/PMC12985208