# A machine learning model and molecular clusters of epigenetic chromatin regulators in tuberculosis based on bioinformatics and clinical samples

**Authors:** Huawei He, Liuying Wei, Lanwei Nong, Beibei Gong, Chaoyan Xu, Qingdong Zhu

PMC · DOI: 10.1038/s41598-025-25858-9 · 2025-11-25

## TL;DR

This study uses machine learning on public expression datasets, plus clinical blood samples, to test whether chromatin regulators can help diagnose tuberculosis and distinguish its molecular subtypes.

## Contribution

The study develops an XGBoost model built on a five-gene signature for TB subtyping and identifies IFIT3 as a potential biomarker.

## Key findings

- 15 differentially expressed chromatin regulators were identified and used to classify TB patients into two molecular clusters.
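- The XGBoost model outperformed the three other algorithms tested in distinguishing the two molecular clusters (AUC = 0.965).
- IFIT3 expression was significantly elevated in the blood of patients with pulmonary TB or tuberculous meningitis compared to healthy controls, supporting its potential as a TB biomarker.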

## Abstract

The role of chromatin regulators (CRs) in mediating epigenetic changes during tuberculosis (TB) infection remains poorly understood. This study aimed to determine the efficacy of CRs in diagnosing TB and characterizing its heterogeneity. The GSE83456 dataset was analyzed to identify differentially expressed CRs (DE-CRs) and immune cell infiltration in patients with TB. Consensus clustering was used to classify patients with TB based on DE-CR expression patterns. The optimal machine learning model was selected from four algorithms (Random Forest (RF), Support Vector Machine (SVM), Generalized Linear Model (GLM), and eXtreme Gradient Boosting (XGB)) to differentiate between the molecular clusters. Validation was performed using an external dataset (GSE152532). Blood samples were collected from healthy individuals and patients with pulmonary TB (PTB) or tuberculous meningitis (TBM). Analysis identified 15 DE-CRs, which were used to stratify patients with TB into two distinct molecular clusters exhibiting divergent immune microenvironment characteristics. The XGB model exhibited superior performance in distinguishing these clusters (area under the receiver operating characteristic curve = 0.965). From this model, a five-gene signature (DHRS9, HIST1H2BK, C16orf74, SLC30A1, and GBP1) was identified. This signature effectively predicted TB subtypes and was significantly associated with active TB (ATB) in an external validation set. Clinically, IFIT3 expression was validated as being significantly elevated in the blood of patients with TB (including PTB and TBM) compared to healthy controls, thereby confirming its potential role as a pan-TB biomarker. Our study revealed that CRs are closely associated with immune infiltration and heterogeneity in TB. We developed a robust XGBoost model based on a five-gene signature for accurate TB subtyping and disease-status assessment. Elevated IFIT3 expression underscores the value of CRs as novel biomarkers for TB diagnosis.

The online version contains supplementary material available at 10.1038/s41598-025-25858-9.
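The abstract ranks the four candidate models by area under the receiver operating characteristic curve (AUC). As a minimal sketch of what that metric measures, the snippet below computes AUC as the fraction of positive/negative sample pairs that a model scores in the correct order, with ties counted as half. The function name `roc_auc` and the toy labels and scores are assumptions made for illustration; they are not the authors' code or data.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via the pairwise-ranking (Mann-Whitney U) identity.

    labels: 1 for the positive class (e.g. one cluster), 0 for the other.
    scores: model scores; a higher score means "more likely positive".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")

    # Count positive/negative pairs ranked correctly; a tie counts as half.
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, NOT taken from the paper.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)
print(f"toy AUC = {toy_auc:.3f}")
```

Under this reading, the reported AUC of 0.965 means that a randomly chosen patient from one cluster receives a higher model score than a randomly chosen patient from the other cluster about 96.5% of the time. The pairwise loop is quadratic in sample count, which is fine for illustration; a sorted-rank implementation would scale better.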

## Linked entities

- **Genes:** DHRS9 (dehydrogenase/reductase 9) [NCBI Gene 10170], H2BC12 (H2B clustered histone 12) [NCBI Gene 85236], CLMB (calcimembrin) [NCBI Gene 404550], SLC30A1 (solute carrier family 30 member 1) [NCBI Gene 7779], GBP1 (guanylate binding protein 1) [NCBI Gene 2633], IFIT3 (interferon induced protein with tetratricopeptide repeats 3) [NCBI Gene 3437]
- **Diseases:** tuberculosis (MONDO:0018076), pulmonary TB (MONDO:0006052), tuberculous meningitis (MONDO:0006042), active TB (MONDO:0100481)

## Full-text entities

- **Genes:** IFIT3 (interferon induced protein with tetratricopeptide repeats 3) [NCBI Gene 3437] {aka CIG-49, GARG-49, IFI60, IFIT4, IRG2, ISG60}, H2BC12 (H2B clustered histone 12) [NCBI Gene 85236] {aka H2B/S, H2BFAiii, H2BFT, H2BK, HIST1H2BK}, GBP1 (guanylate binding protein 1) [NCBI Gene 2633] {aka hGBP1}, DHRS9 (dehydrogenase/reductase 9) [NCBI Gene 10170] {aka 3-alpha-HSD, 3ALPHA-HSD, RDH-TBE, RDH15, RDHL, RDHTBE}, SLC30A1 (solute carrier family 30 member 1) [NCBI Gene 7779] {aka ZNT1, ZRC1}, CLMB (calcimembrin) [NCBI Gene 404550] {aka ASRA, C16orf74, MICT1}
- **Diseases:** PTB (MESH:D014397), active (OMIM:612348), ATB (MESH:D014376), TBM (MESH:D014390)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

11 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12647805/full.md

---
Source: https://tomesphere.com/paper/PMC12647805