Metagenomic and metatranscriptomic profiling of bronchoalveolar lavage fluid identifies microbial and host biomarkers of drug-resistant tuberculosis
Haiqing Zhang, Limao Zhang, Bin Yang, Chunjing Gao, Huimei Liu, Yunshi Zhang, Xiaoyou Chen

TL;DR
This study uses metagenomic and transcriptomic data from lung fluid to identify microbial and immune markers that distinguish drug-resistant from drug-susceptible tuberculosis.
Contribution
The study introduces a dual-omics approach to uncover DR-TB-specific microbial and host signatures, including an 8-gene classifier for accurate discrimination.
Findings
DR-TB airways show distinct microbial diversity with enriched Streptococcus spp. and specific phages.
DR-TB is associated with 494 differentially expressed genes linked to metabolic and immune pathways.
An 8-gene host signature accurately differentiates DR-TB from DS-TB with an AUC of 0.837.
Abstract
Drug-resistant tuberculosis (DR-TB) undermines global TB control, yet how resistant Mycobacterium tuberculosis strains interact with the lung microbiome, phage communities, and local host immunity remains poorly defined. In a prospective cohort of 130 pulmonary TB patients (49 DR-TB, 81 drug-susceptible TB [DS-TB] patients), bronchoalveolar lavage fluid (BALF) was subjected to paired metagenomic and transcriptomic profiling. Microbial and bacteriophage community structures were assessed by diversity metrics and differential abundance testing, whereas host responses were characterized by gene expression, pathway enrichment, and immune cell deconvolution. A Random Forest model was trained to evaluate the diagnostic potential of host transcriptional signatures. DR-TB airways presented distinct microbial beta diversity, with enrichment of Streptococcus spp. and streptococcal-targeting…
Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.
Click any figure to enlarge with its caption.
Figure 1
Figure 2
Figure 3
Figure 4
Figure 5
Figure 6Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsTuberculosis Research and Epidemiology · Diagnosis and treatment of tuberculosis · Mycobacterium research and diagnosis
Introduction
1
Tuberculosis (TB), caused by Mycobacterium tuberculosis (Mtb), remains a leading cause of morbidity and mortality worldwide. The rise of drug-resistant TB (DR-TB) poses a critical threat to global TB control, undermining treatment efficacy and worsening clinical outcomes. While adaptive immunity in TB has been extensively studied, early host–pathogen interactions at the respiratory mucosal surface and their modulation by the local microbiome are increasingly recognized as key determinants of infection dynamics (Lange et al., 2019; Walker et al., 2022). Antibiotic therapy itself induces profound and persistent dysbiosis, particularly in barrier sites such as the lung, which may compromise immune homeostasis and increase susceptibility to reinfection.
The lower respiratory tract (LRT), once considered sterile, is now known to harbor a low-biomass but immunologically relevant microbial community (bacteriome, virome) distinct from that of the upper airway (Hong et al., 2016). These commensals dynamically interact with airway epithelial cells (AECs) and local immune populations, including γδ T cells, NK cells, and MAIT cells, thereby shaping the early containment of Mtb and bridging to adaptive immunity (Gupta et al., 2018; Howard and Khader, 2020). Importantly, drug-resistance–conferring mutations in Mtb (e.g., rpoB mutations) alter cell wall lipid composition and bacterial physiology, potentially influencing interactions with both the microbiome and host immune pathways. Emerging evidence links TB susceptibility and progression to perturbations in the gut–lung axis and the pulmonary microbiome (Gupta et al., 2018). Indeed, recent studies have demonstrated that integrating host transcriptional profiling with respiratory metagenomics can significantly increase the diagnostic accuracy for lower respiratory tract infections (Mick et al., 2023; Zou et al., 2025b). However, a critical knowledge gap persists: the interplay between drug-resistant Mtb strains, the lung microbiome-including bacteriophage and local immune-metabolic responses at the airway interface remains largely unexplored (Lin et al., 2021; Valdez-Palomares et al., 2021; Fonseca et al., 2025).
To address this gap, we systematically collected bronchoalveolar lavage fluid (BALF) from a prospective cohort of pulmonary TB patients and applied a dual-omics strategy integrating metagenomic profiling of microbial and phage communities with host transcriptomic analysis. This approach enabled us to identify DR-TB–specific microbial and phage signatures, identify host immune–metabolic pathways associated with resistance, and evaluate the diagnostic potential of host transcriptional biomarkers. By directly interrogating the distal airway niche, our study provides novel insights into the ecological and immunological basis of DR-TB pathogenesis and highlights potential avenues for host- and microbiome-directed interventions.
Materials and methods
2
Study design, patient cohort, and ethics statement
2.1
From December 2024 to May 2025, we enrolled 130 patients with confirmed pulmonary tuberculosis (TB) in a prospective observational cohort in Jiangsu Province. A definitive diagnosis required bacteriological or molecular confirmation (Figure 1): (1) Acid-fast bacilli (AFB) smear positivity in sputum, bronchoalveolar lavage fluid (BALF), or pleural fluid; (2) positive culture of Mtb complex; or (3) molecular testing (e.g., GeneXpert MTB/RIF) detecting Mtb DNA. Crucially, to ensure accurate stratification into DR-TB and drug-sensitive TB (DS-TB) groups, only patients with definitive phenotypic drug susceptibility testing (DST) results or molecular confirmation of drug resistance were included in the final cohort. Patients diagnosed solely on clinical grounds without microbiological evidence of drug susceptibility profiles were excluded from this study.
Study design integrating phenotypic drug resistance confirmation with dual-omics profiling of BALF. Flowchart depicting study procedure: (upper panel) Pulmonary tuberculosis patients were stratified into drug-resistant TB (DR-TB) or drug-sensitive TB (DS-TB) groups through a triangulated phenotyping strategy that combined culture-based antimicrobial susceptibility testing, Xpert MTB/RIF Ultra, and nucleic acid mass spectrometry. According to WHO criteria, isolates resistant to ≥1 first- or second-line drug were classified as DR-TB, whereas isolates susceptible to all tested drugs were defined as DS-TB. (lower panel) From the same bronchoalveolar lavage fluid (BALF) specimens, paired DNA and RNA were extracted for metagenomic and metatranscriptomic library preparation. High-throughput sequencing and standardized bioinformatics pipelines enabled integrative characterization of the airway microbiome (including bacteriophages) and host transcriptional responses within the same anatomical niche.
Confirmed Mtb isolates were subjected to DST via commercial microplate dilution kits in 7H9 broth, and their sensitivities to 11 anti-TB drugs (ATDs) at both low and high concentrations were tested. Kits were provided by Autobio Diagnostics (2018–2019) and Encode Medical Engineering (2020 onward). Testing followed the manufacturer’s protocols and CLSI guidelines. The inoculum size was standardized at 200 µL/well. Cultures were checked for contamination after 7 days, with final readings at 12–13 days if uncontaminated. The drugs tested included four first-line agents-isoniazid (0.20, 0.80 μg/ml), rifampicin (4.0, 8.0 μg/ml), streptomycin (4.0, 8.0 μg/ml), and ethambutol (2.5, 5.0 μg/ml)-and seven second-line agents: levofloxacin (2.0, 8.0 μg/ml), moxifloxacin (0.5, 2.0 μg/ml), para-aminosalicylic acid (PAS; 2.0, 8.0 μg/ml; system updated Oct 2019), rifabutin (0.75, 3.0 μg/ml), amikacin (1.0, 4.0 μg/ml), kanamycin (2.5, 10.0 μg/ml), and capreomycin (2.5, 10.0 μg/ml). The susceptible H37Rv strain (ATCC 27294) served as a control, with interlaboratory testing for quality assurance.
The GeneXpert MTB/RIF assay detected Mtb and rifampicin resistance: samples were mixed 1:2 with Xpert reagent, incubated at room temperature for 15 min, and processed on the GeneXpert system. Failed or indeterminate samples were retested or excluded.
For PCR/MALDI-TOF MS, nucleic acid extraction was performed under BSL-2/3 conditions. Colonies from solid or liquid cultures (0.5 McFarland) were suspended in PrepMan Ultra, vortexed, heat-killed (90-100°C, 30 min), centrifuged, and supernatants stored. Extracts were diluted 1:10 before testing. A multiplex PCR amplified 24 target regions in 10 genes, followed by cleanup and dephosphorylation. A second multiplex PCR enabled single-base extension with detection probes. Extension products were resin-cleaned (Chip Prep Module) and spotted onto matrix-loaded chips for MALDI-TOF MS analysis (MassARRAY system).
Patients were stratified into drug-resistant TB (DR-TB; n = 49, resistant to ≥1 first- or second-line drug) and drug-sensitive TB (DS-TB; n = 81, pan-susceptible) exclusively based on rigorous phenotypic classification. All data originated from archived clinical materials at Xuzhou Infectious Disease Hospital under a no-contact protocol approved by the Institutional Review Board (IRB Approval No.: XCYPJ-2024112603-01). Written informed consent was obtained for sampling and secondary research use of specimens.
DNA/RNA extraction, library construction, and sequencing
2.2
For metagenomic DNA sequencing (mNGS-DNA), 1 mL of BALF was subjected to host nucleic acid depletion with 1 U of benzonase (Sigma) and 0.5% Tween-20 (Sigma) and incubated at 37°C for 5 min. Six hundred microliters of the mixture was transferred to tubes containing 500 µL of ceramic beads for bead beating (Minilys Homogener, Tiangen, China). Nucleic acids from 400 µL pretreated samples were extracted via a QIAamp UCP Pathogen Mini Kit (Qiagen, Germany) and eluted in 60 µL of buffer. For DNA quantification, a Qubit dsDNA high sensitivity assay (Invitrogen, USA) was used.
For metatranscriptomic RNA sequencing (mNGS-RNA), 1 mL of BALF was centrifuged at 12,000 rpm for 10 min. Pellets were lysed in TRIzol LS (Thermo Fisher Scientific, USA), followed by RNA extraction via a Direct-zol RNA Miniprep Kit (Zymo Research, USA).
Libraries were prepared from 30 µL of DNA via the Nextera DNA Flex Kit (Illumina, USA) and from 10 µL of RNA via the Ovation Trio RNA-Seq Kit (NuGEN, USA). Library concentrations were measured via the Qubit dsDNA HS assay, and quality was assessed via an Agilent 2100 Bioanalyzer with a high-sensitivity DNA kit. Sequencing was performed on an MGI G99 platform (single-end, 75 bp). The raw sequence data reported in this paper have been deposited in the Genome Sequence Archive (Genomics, Proteomics & Bioinformatics 2025) (Zhang et al., 2025) in National Genomics Data Center (Nucleic Acids Res 2025), China National Center for Bioinformation/Beijing Institute of Genomics, Chinese Academy of Sciences (GSA: CRA034880) that are publicly accessible at https://ngdc.cncb.ac.cn/gsa.
Microbial annotation, community structure comparison, and differential taxon analysis
2.3
Microbial composition analysis followed a validated mNGS pipeline (Diao et al., 2023; Han et al., 2025; Lin et al., 2025; Sun et al., 2025; You et al., 2025): Briefly, Trimmomatic was used to remove low-quality and duplicate reads as well as adapter sequences (Bolger et al., 2014); Reads were mapped to the human reference genome (hg38) using SNAP v1.0beta to exclude host-derived sequences (Miller et al., 2019); and ribosomal RNA sequences were filtered with SortMeRNA v4.3.7. Taxonomic profiles were generated using Kraken2 v2.0.7 and Bracken v2.5 with default reference databases (https://benlangmead.github.io/aws-indexes/k2) (Diao et al., 2023; Ho et al., 2023). To correct for sequencing depth, microbial reads were normalized as reads per million (RPM). Potential contaminants were identified by comparing taxon abundance between bronchoscope background controls and negative controls (PBS or sterile water). Taxa with median relative abundance >50% higher in background controls or with mean relative abundance >0.1% in negative controls were excluded as contaminants.
Phage annotation was performed by BLAST alignment (word size = 18, e-value = 0.0005, culling limit = 1) against a curated phage database (CPD) comprising 26,159 representative genomes (Haddock et al., 2023). Phage contig quality was further assessed, and microbial/phage abundances were expressed as relative abundances (Haddock et al., 2023).
α-diversity indices (Shannon, Simpson, Chao1, ACE) for both DNA- and RNA-based profiles were calculated using the vegan R package (Ren et al., 2021). β-diversity was assessed using Bray-Curtis dissimilarity and compared between groups by PERMANOVA (adonis, 999 permutations, vegan package). Principal coordinates analysis (PCoA) was used for visualization of community structure. Differentially abundant taxa and bacteriophages were identified using linear discriminant analysis effect size (LEfSe), with LDA score > 2 and FDR-adjusted p < 0.05 considered significant. HUMAnN2 software (Truong et al., 2015; Beghini et al., 2021), which based on metagenomic data, was applied for analyzing the abundance of microbial pathways, and then displayed by Kyoto Encyclopedia of Genes and Genomes (KEGG) (Caspi et al., 2020).
Gene expression, transposable element expression, and cell-type composition analysis
2.4
Host transcriptome data were aligned to the human reference genome (hg38) using HISAT2 with default settings, and gene-level quantification was performed with FeatureCounts (Subread package v2.0.0 (http://subread.sourceforge.net/)) (Liao et al., 2014; Kim et al., 2019). Trimmed reads were mapped using STAR (predefined parameters). Normalization and batch effect correction were applied prior to differential expression analysis with DESeq2 (Benjamini–Hochberg FDR ≤ 0.05; |log_2_ fold change| ≥ 1) (Love et al., 2014). Differentially expressed genes (DEGs) were subjected to gene set enrichment analysis (GSEA) via the REACTOME, Kyoto Encyclopedia of Genes and Genomes (KEGG), and Gene Ontology (GO) databases (Subramanian et al., 2005; Kanehisa et al., 2017; Gillespie et al., 2022). Cell-type composition was inferred from bulk transcriptomes using CIBERSORT with the LM22 signature matrix (1,000 permutations) (Steen et al., 2020). Nonparametric comparisons of continuous variables (latent variables and cell fractions) between groups were performed with the Mann-Whitney U test, and multiple testing was corrected by the Benjamini-Hochberg method (adjusted p < 0.05 considered significant).
Model construction for drug-resistant tuberculosis classification
2.5
An ensemble machine learning framework was developed to distinguish drug-resistant TB (DR-TB) from drug-sensitive TB (DS-TB) using BALF RNA sequencing-derived host transcriptomic signatures. Differential expression was first assessed using DESeq2 (median-of-ratios normalization; variance-stabilizing transformation) (Segata et al., 2011), and genes with p < 0.05 were retained as candidate biomarkers; batch effects were corrected using ComBat-seq. To account for possible confounders, we conducted multivariate regression analysis using clinical data of patients (Supplementary Table S1) as covariates. Genes with a significance level of p<0.05 in this adjusted model were retained for constructing the classifier. Subsequently, a Random Forest (RF) classifier (ranger package) was then trained after sequential forward feature selection with 1,000 bootstrap iterations, and to address class imbalance, the majority class was undersampled in each iteration. Hyperparameters were optimized by nested stratified 10-fold cross-validation with mtry (candidate features per split) tuned across [√n, n/3], ntree (number of trees) varied from 500-2,000, and minimal node size determined by minimizing Gini impurity. Model performance was evaluated on an independent 30% hold-out set and reported as area under the ROC curve (AUC), sensitivity, specificity, and F1-score, with 95% confidence intervals estimated via bootstrapping, and calibration was assessed by the Brier score. Model interpretability was achieved using SHapley Additive exPlanations (SHAP; shap Python library v0.41.0 via reticulate), where global feature importance was summarized as mean absolute SHAP values, sample-level contributions were visualized with force plots, and dependence plots illustrated the relationship between gene expression values (color-coded, high = yellow and low = purple) and their contribution to predictions. All analyses were performed in R v4.1.0 with DESeq2 v1.32.0, mlr3 v0.13.0, ranger v0.13.1, and reticulate v1.22.
Contamination control
2.6
Negative controls (PBS or sterile water) were processed in parallel with BALF specimens. In addition, sterile cotton swabs moistened with sterile deionized water were used to wipe the surfaces of the centrifuge and biosafety cabinet, generating a background microorganism profile for our laboratory. Potential contaminant taxa were identified by prevalence filtering (decontam v1.20.0) and excluded.
Results
3
Microbial composition and diversity of DR-TB vs. DS-TB
3.1
We profiled 130 patients (Supplementary Table S1, Supplementary Table S2) with active pulmonary tuberculosis (TB) confirmed by both clinical and microbiological criteria. Based on drug susceptibility testing (DST), the cohort included 81 cases of drug-susceptible TB (DS-TB) and 49 cases of drug-resistant TB (DR-TB). Within the DR-TB group, 33 patients had isoniazid monoresistance (INH-MR), 12 had rifampicin monoresistance (RFP-MR), 26 met criteria for multidrug-resistant TB (MDR-TB), and 39 were classified as polyresistant TB (PR-TB).
Metagenomic DNA data generated a median of 16.7 million clean reads (IQR 13.6-20.5 million) per patient, with microbial reads accounting for 10.55% (IQR 4.45-33.72%, Supplementary Table S3) after host filtering. Metagenomic profiling of BALF microbiota, including bacteria, eukaryotes, and viruses, revealed a trend toward higher alpha diversity in DR-TB compared with DS-TB, although the difference did not reach statistical significance (Figure 2A; p > 0.05). In contrast, beta diversity analysis based on Bray-Curtis dissimilarity demonstrated significant compositional separation between groups (Figure 2B; PERMANOVA, p < 0.05). LEfSe identified taxa discriminating the two phenotypes. As expected, Mtb was significantly enriched in DR-TB samples (log10 LDA score = 4.63, p = 0.017; Figure 2C), accompanied by increased abundance of Prevotella melaninogenica and Streptococcus spp. In contrast, Human alphaherpesvirus 1, Fusobacterium pseudoperiodonticum, and Leptotrichia spp. were more abundant in DS-TB samples (Figure 2C, Supplementary Figures S1A, B). Correlation analysis further revealed a broadly negative association between Mtb abundance and other commensal taxa (Supplementary Figure S1C; p < 0.05), suggesting competitive exclusion or ecological disruption in the DR-TB airway niche.
*Comparative microbiome profiles in drug-resistant versus drug-sensitive TB. (A) Alpha diversity metrics (Shannon index, Simpson index, richness, evenness, Chao1 and ACE) comparing drug-resistant (red) and drug-sensitive (blue) groups. (B) Beta diversity analysis based on Bray–Curtis dissimilarity, visualized by principal coordinates analysis (PCoA). Group separation was statistically supported by PERMANOVA (R2 = 0.018, p = 0.013). (C) LEfSe identifying discriminative taxa enriched in either group. Only taxa with LDA score > 2 and p-value < 0.05 are displayed. (D) Differentially abundances microbial functional pathways. Features highlighted in red demonstrate denote functions significantly enriched in drug-resistant samples, several of which are associated with Mycobacterium tuberculosis metabolic adaptation and host interaction. *p < 0.05; **p < 0.01; **p < 0.001; N.S., not significant.
Functional profiling of metabolic pathways demonstrated distinct differences between DR-TB and DS-TB isolates (Figure 2D). DR-TB strains exhibited significantly higher abundance of enzymes critical for intracellular survival and virulence, including L-asparaginase (asparagine utilization), a starch-binding outer membrane protein (nutrient uptake and membrane integrity), and UDP-3-O-[3-hydroxymyristoyl] N-acetylglucosamine deacetylase (lipid A biosynthesis). Conversely, DS-TB strains were enriched in enzymes implicated in host-pathogen interactions and cell wall remodeling, such as hexosaminidase and dipeptidyl-peptidase 4 (Figure 2D).
Bacteriophage composition and diversity of DR-TB vs. DS-TB
3.2
Bacteriophage profiling revealed Bacteriophage sp. as the dominant taxon across all samples, followed by Mycolicibacterium phage J1 and Escherichia phage Lambda evo17 (Figure 3A). Hierarchical clustering based on relative abundance demonstrated distinct community patterns segregating DR-TB (blue) and DS-TB (red) groups. DR-TB samples were enriched in multiple streptococcal-associated phages, including Streptococcus satellite phage variants (e.g., phi-Ssu5SJ28rum, Javan440, IPP35) and Erysipelothrix phage phi1805, whereas DS-TB samples exhibited higher abundance of Vibrio phage JSF6 and Acinetobacter phage TAC1 (Figure 3A). Principal coordinates analysis (PCoA) of Bray-Curtis dissimilarity suggested partial separation along PCoA1 (29% variance) and PCoA2 (15% variance), although PERMANOVA did not reach statistical significance (R² = 0.0098, p = 0.205; Figure 3B). Notably, alpha diversity was significantly higher in DR-TB compared with DS-TB, as indicated by increased ACE and Chao1 indices (p < 0.05; Supplementary Figure S2).
Comparative bacteriophage profiles in drug-resistant versus drug-sensitive TB. Comparative analysis of bacteriophages features between drug-resistant and drug-sensitive isolates. (A) Bar plot display the proportional distribution of major phage taxa across individual BALF samples. (B) Beta diversity (PCoA) between drug resistant (red) and drug sensitive (blue) (PERMANOVA: R2 = 0.0098, p = 0.205). (C) LEfSe highlighting phage taxa enriched in either group. Only taxa with LDA score > 2 and p-value < 0.05 are shown. Circle size denotes relative abundance.
LEfSe analysis further confirmed discriminative phage taxa between groups (Figure 3C). DR-TB samples were characterized by enrichment of diverse Streptococcus phages (e.g., phi-Ssu5SJ28rum, phi-SsuHCJ3rum, Javan440, Javan345, Javan336, and phage 20617), satellite phages (Javan437, Javan360, Javan309, Javan305), and Erysipelothrix phage phi1805, all with high positive log10 LDA scores. In contrast, Vibrio phage JSF6 and Acinetobacter phage TAC1 were preferentially enriched in DS-TB samples (Figure 3C). These distinct phage community signatures indicate phenotype-specific airway viromes and suggest potential contributions to microbiome dysbiosis, horizontal gene transfer, or modulation of bacterial fitness in the DR-TB niche (Figures 3A-C).
Host transcription profiles: identification of differentially expressed genes
3.3
RNA sequencing (Supplementary Table S4) identified 494 DEGs between DR-TB and DS-TB samples (|log_2_FC| ≥ 1, p < 0.05), including 39 upregulated and 455 downregulated in DR-TB (Figure 4A). Representative genes are shown in Figures 4C-H, where ARHGEF5 (Rho Guanine Nucleotide Exchange Factor 5), IGFBP6 (insulin like growth factor binding protein 6), PTGES3L (Prostaglandin E Synthase 3-Like), and BDP1 (BDP1 general transcription factor IIIB subunit) were significantly upregulated, whereas ENPP1 (ectonucleotide pyrophosphatase/phosphodiesterase 1), RANBP17 (RAN Binding Protein 17) and so on were downregulated in DR-TB relative to DS-TB (all p < 0.05). Pathway analyses revealed coherent transcriptional reprogramming: Reactome analysis indicated enrichment of collagen biosynthesis and extracellular matrix remodeling pathways (adjusted p < 0.05; Figure 4B); GO analysis showed downregulation of processes linked to transmembrane transport and ion channel activity (adjusted p < 0.05; Supplementary Figure S2A); and KEGG analysis highlighted significant alterations in “Neuroactive ligand-receptor interaction” (adjusted p < 0.01) and the “Hippo signaling pathway” (adjusted p < 0.05; Supplementary Figure S2B). GSEA further demonstrated upregulation of oxidative phosphorylation (OxPhos)-related pathways-including ATP synthesis–coupled electron transport, respiratory chain complex assembly, and mitochondrial respirasome function (adjusted p < 0.05; Supplementary Figure S2C)-indicating enhanced mitochondrial energy metabolism in DR-TB, accompanied by suppressed signaling related to ion transporters and channel activity.
*Differential host transcriptomic profiles in drug-resistant versus drug-sensitive TB. Comparative transcriptomic analysis between drug-resistant (red) and drug-sensitive (blue) conditions. (A) Volcano plot of differentially expressed genes (DEGs). Genes with |log2 fold change| ≥1 and adjusted p-value < 0.05 were considered significant. (B) Reactome pathway enrichment analysis of DEGs, highlighting pathways related to extracellular matrix remodeling and collagen biosynthesis. Pathways with adjusted p-value < 0.05 were retained. (C–H) Expression distributions of representative DEGs. ELFN1 was downregulated in drug-resistant TB, while GAL3ST1, IGFBP6, INSC, ANKRD1, and PPP1R3C were upregulated. These genes formed part of the diagnostic signature identified in downstream machine-learning analyses. *p < 0.05; *p < 0.01; N.S., not significant.
Computational immune deconvolution (CIBERSORT) of BALF transcriptomes revealed broadly comparable immune cell composition between DR-TB and DS-TB patients (Figure 5A). Neutrophils constituted the largest fraction (~34.5% DR-TB vs. ~34.4% DS-TB), followed by M0 macrophages (16.0% vs. 12.9%), CD4^+^ resting memory T cells (9.0% vs. 10.4%), M1 macrophages (~5.6% vs. ~5.3%), and activated CD4^+^ memory T cells (~5.9% vs. ~6.9%). No significant differences were observed in monocytes, M0 macrophages, or regulatory T cells (p > 0.05; Figures 5B–D). In contrast, γδ T cells were markedly reduced in DR-TB compared with DS-TB (0.14% vs. 0.74%, p = 0.0059; Figure 5E), highlighting a selective deficiency in this innate-like T cell subset that may contribute to impaired early immune containment of drug-resistant infection.
Comparative immune cell composition in drug-resistant vs. versus drug-sensitive conditions. (A) Relative proportions of innate and adaptive immune cell subsets inferred from bulk transcriptomes by CIBERSORT. Circle size denotes the estimated fraction of each cell type, and color intensity reflects mean abundance within each group. (B–E) Distribution of selected immune cell subsets showing group-level differences: macrophages M1 (B), monocytes (C), regulatory T cells (Tregs) (D), and γδ T cells (E). Drug-resistant samples are denoted in red and drug-sensitive samples in blue. Among these subsets, γδ T cells were significantly reduced in the drug-resistant group (p = 0.0059), whereas other populations showed no significant differences.
Diagnostic model for DR-TB discrimination
3.4
To evaluate the diagnostic potential of host transcriptional signatures, we constructed a Random Forest classifier based on BALF-derived DEGs. An optimal 8-gene panel including ARHGEF5, PTGES3L, GAL3ST1 (Galactose-3-O-Sulfotransferase 1), RANBP17 (RAN Binding Protein 17), ACTA2_AS1 (ACTA2 Antisense RNA 1), CBY3 (Chibby Family Member 3), MAMSTR (MEF2 Activating Motif and SAP Domain Containing Transcriptional Regulator), and LOC102031319 (Uncharacterized Locus 102031319)) discriminated DR-TB from DS-TB with high accuracy, achieving an AUC of 0.837 (95% CI: 0.815-0.86; Figure 6A). Model interpretability analysis using SHAP values ranked gene contributions to classification (Figure 6B). ARHGEF5 exerted the strongest influence (mean |SHAP| = 0.111; 25.37% contribution), followed by PTGES3L (0.069; 15.76%) and GAL3ST1 (0.069; 15.65%). Notably, higher expression of ARHGEF5, PTGES3L and GAL3ST1 (purple points) was consistently associated with accurate DR-TB prediction, whereas lower expression (yellow points) weakened predictive performance.
Diagnostic performance and feature importance of predictive model. (A) ROC curve of the Random Forest classifier distinguishing drug-resistant (DR-TB) from drug-sensitive TB (DS-TB) using the eight-gene transcriptional signature. The model achieved an AUC of 0.837 (95% CI: 0.815–0.86) under stratified cross-validation. Shaded area indicates the 95% confidence interval. (B) Feature importance ranked by SHAP values. Gray bars represent the average contribution of each gene, and scattered points indicate individual sample-level SHAP values, with color denoting feature expression (yellow: high, purple: low). ARHGEF5, PTGES3L, GAL3ST1, and RANBP17 contributed most strongly to model discrimination, consistent with their differential expression between groups.
Discussion
4
To the best of our knowledge, this study represents the first comprehensive dual-omics analysis of BALF from patients with DR-TB and DS-TB. By simultaneously characterizing the lung microbiome-including bacteria, eukaryotes, viruses, and bacteriophages-via metagenomics, and the host immune-metabolic response via transcriptomics within the same anatomical compartment, we uncovered previously unrecognized pathogen-host dynamics in DR-TB. The principal innovations of this work include: (i) identification of DR-TB-specific microbial signatures, notably enrichment of Streptococcus species and streptococcal-targeting phages (e.g., Javan variants, phi-Ssu5SJ28rum), implicating antibiotic-driven dysbiosis and phage-mediated interactions in DR-TB pathogenesis; (ii) demonstration that host transcriptional profiles in BALF can discriminate DR-TB from DS-TB, marked by upregulated oxidative phosphorylation (OxPhos), suppression of ion channel and transporter activity, and depletion of γδ T cells; and (iii) development of an 8-gene BALF-based diagnostic classifier (including ARHGEF5, PTGES3L, GAL3ST1, RANBP17, ACTA2_AS1), which achieved an AUC of 0.837, highlighting the utility of host-derived biomarkers beyond conventional bacterial genotyping approaches. Together, these findings establish the distal airway as a critical compartment for DR-TB pathobiology and suggest new opportunities for host- and microbiome-directed interventions.
The microbial ecology of DR-TB displayed distinct adaptations relative to DS-TB. While alpha diversity did not differ significantly between groups, beta diversity was clearly distinct, consistent with niche specialization of resistant Mtb strains (Valdez-Palomares et al., 2021; Du et al., 2024). DR-TB samples exhibited consistent enrichment of Streptococcus species, a genus previously linked to TB treatment failure and recurrence (Wu et al., 2013; Shahzad et al., 2024). Enrichment of Streptococcus has also been reported in other respiratory conditions, such as community-acquired pneumonia (Yang et al., 2024b) and influenza infection (Zou et al., 2025a), where it often correlates with acute inflammation or secondary bacterial coinfection. However, unlike these acute settings, the expansion of Streptococcus in our DR-TB cohort likely reflects antibiotic-induced dysbiosis under chronic treatment pressure. This specific niche may not only represent a marker of drug resistance but could also facilitate M. tuberculosis persistence through metabolic cross-feeding or biofilm formation, distinct from the immune-stimulatory role observed in acute viral-bacterial coinfections (Zou et al., 2025a). This enrichment creates a permissive environment for streptococcal expansion and potentially facilitating Mtb persistence via interbacterial interactions. Importantly, phage communities were skewed toward Streptococcus-targeting phages in DR-TB, whereas DS-TB samples harbored more diverse phages such as Vibrio phage JSF6 and Acinetobacter phage TAC1. Given the role of phages in horizontal gene transfer and bacterial fitness modulation, such imbalances may reinforce the ecological advantages of resistant strains. Recent studies underscore the therapeutic potential of phages in mycobacterial infections, with engineered or humanized phages reducing bacterial burden in animal models (Yang et al., 2024a) and compassionate use successfully treating M. abscessus infections in humans (Azimi et al., 2019; Dedrick et al., 2019). Our findings extend this paradigm by linking phage composition to DR-TB specifically, suggesting the airway phageome as both a biomarker and a therapeutic target.
Host transcriptional responses revealed profound rewiring of metabolic and immune pathways in DR-TB. Upregulation of OxPhos-related genes indicates enhanced mitochondrial activity, in contrast to the glycolytic reprogramming typically adopted by macrophages during early Mtb infection to fuel antimicrobial responses (Cook et al., 2017; Shi et al., 2019). This persistent OxPhos dominance may reflect metabolic inflexibility that supports bacterial persistence. In parallel, suppression of ion channel and transporter activity-particularly calcium signaling via TRPV4 (Naik et al., 2020; Mitini-Nkhoma et al., 2021)-may compromise phagosome-lysosome fusion and autophagy, essential for intracellular bacterial control. We also observed enrichment of extracellular matrix (ECM) remodeling pathways, consistent with prior reports that excessive collagen deposition shapes granuloma architecture and promotes fibrosis (Marshall et al., 1996; Al Shammari et al., 2015) (Ladel et al., 1995). Together, these alterations suggest that DR-TB is characterized by both metabolic maladaptation and tissue remodeling that reinforce pathogen survival and tissue pathology.
Immunologically, the most striking finding was the depletion of γδ T cells in DR-TB. γδ T cells are pivotal in recognizing Mtb phosphoantigens independently of MHC presentation, producing IFN-γ and TNF-α to activate macrophages, and directly lysing infected cells (D’Souza et al., 1997; Hu et al., 2023). Their reduction in DR-TB aligns with impaired early containment and poorer treatment outcomes, consistent with reports that adoptive transfer of γδ T cells improves responses in MDR-TB patients (Liang et al., 2021). This highlights γδ T cell augmentation as a rational host-directed therapeutic strategy. Among the diagnostic gene signature, ARHGEF5 strongly linked to immunity through regulation of dendritic cell migration via RhoA activation, crucial for innate immune responses (Wang et al., 2009). PTGES3L may integrate oxidative DNA damage signals with immune gene activation, potentially modulating macrophage phagocytosis and inflammation (de Vries et al., 2021). GAL3ST1 promotes immune evasion in cancers through HIF1-sulfatide axes and correlates with immune cell infiltration, suggesting anti-infection potential (Díaz-Alvarez and Ortega, 2017). RANBP17 plays a key role in anti-viral defense via nucleocytoplasmic transport (Mandic et al., 2022). ACTA2-AS1 potentially regulates immune responses in infections, as pathogenic mycoplasmas alter similar lncRNAs for immune modulation and chromatin functions (Ramos et al., 2023). Their inclusion suggests that immune dysregulation in DR-TB extends beyond innate lymphocytes to broader interferon and antigen-presenting pathways, reinforcing the potential of immune-focused biomarkers.
From a translational perspective, the development of an 8-gene BALF-derived signature achieving an AUC of 0.837 represents a promising step toward host-based diagnostics. Unlike conventional genotypic resistance testing, which depends on sputum quality and prior drug exposure, host-response signatures may provide complementary information on treatment-resistant disease. However, external validation in independent cohorts with standardized clinical outcomes will be essential before clinical deployment. Integration with rapid sputum-based assays could further enhance diagnostic performance (Hamada et al., 2022; Parwati et al., 2022; Fonseca et al., 2025).
This study has several limitations. First, the diagnostic classifier was derived computationally from a single-center cohort. Although we implemented a rigorous undersampling strategy to address the class imbalance (49 DR-TB vs. 81 DS-TB) and minimize bias, the lack of an independent external validation cohort remains a major limitation. Consequently, the reported diagnostic performance may reflect cohort-specific features, and the model’s generalizability to populations with different DR-TB prevalence rates warrants further verification in multi-center prospective studies. Second, the cross-sectional design precludes causal inference between microbiome alterations, host responses, and the emergence of drug resistance. Longitudinal sampling before and after treatment initiation would clarify temporal dynamics. Third, while BALF provides a direct window into the airway niche, it may not fully capture systemic immune responses or the gut-lung axis, both of which are increasingly recognized in TB (Alvarado-Peña et al., 2023; Enjeti et al., 2023). Future work should integrate multi-compartmental sampling and functional assays (e.g., γδ T cell functional rescue, phage transduction experiments) to dissect causal mechanisms.
In summary, DR-TB is characterized by synergistic microbial and host alterations, including streptococcal enrichment and phage dysbiosis, metabolic inflexibility with OxPhos dominance, suppression of ion transport pathways, and γδ T cell depletion. Together with an emerging BALF-based host transcriptional signature, these findings provide a foundation for microbiome- and host-directed diagnostics and interventions in tuberculosis drug resistance.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1Al Shammari B. Shiomi T. Tezera L. Bielecka M. K. Workman V. Sathyamoorthy T. . (2015). The extracellular matrix regulates granuloma necrosis in tuberculosis. J. Infect. Dis. 212, 463–473. doi: 10.1093/infdis/jiv 076, PMID: 25676469 PMC 4539912 · doi ↗ · pubmed ↗
- 2Alvarado-Peña N. Galeana-Cadena D. Gómez-García I. A. Mainero X. S. Silva-Herzog E. (2023). The microbiome and the gut-lung axis in tuberculosis: interplay in the course of disease and treatment. Front. Microbiol. 14. doi: 10.3389/fmicb.2023.1237998, PMID: 38029121 PMC 10643882 · doi ↗ · pubmed ↗
- 3Azimi T. Mosadegh M. Nasiri M. J. Sabour S. Karimaei S. Nasser A. (2019). Phage therapy as a renewed therapeutic approach to mycobacterial infections: a comprehensive review. Infect. Drug Resist. 12, 2943–2959. doi: 10.2147/idr.S 218638, PMID: 31571947 PMC 6756577 · doi ↗ · pubmed ↗
- 4Beghini F. Mc Iver L. J. Blanco-Míguez A. Dubois L. Asnicar F. Maharjan S. . (2021). Integrating taxonomic, functional, and strain-level profiling of diverse microbial communities with bio Bakery 3. Elife 10, e 65088. doi: 10.7554/e Life.65088, PMID: 33944776 PMC 8096432 · doi ↗ · pubmed ↗
- 5Bolger A. M. Lohse M. Usadel B. (2014). Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114–2120. doi: 10.1093/bioinformatics/btu 170, PMID: 24695404 PMC 4103590 · doi ↗ · pubmed ↗
- 6Caspi R. Billington R. Keseler I. M. Kothari A. Krummenacker M. Midford P. E. . (2020). The Meta Cyc database of metabolic pathways and enzymes - a 2019 update. Nucleic Acids Res. 48, D 445–d 453. doi: 10.1093/nar/gkz 862, PMID: 31586394 PMC 6943030 · doi ↗ · pubmed ↗
- 7Cook G. M. Hards K. Dunn E. Heikal A. Nakatani Y. Greening C. . (2017). Oxidative phosphorylation as a target space for tuberculosis: success, caution, and future directions. Microbiol. Spectr. 5, 1–22. doi: 10.1128/microbiolspec.TBTB 2-0014-2016, PMID: 28597820 PMC 5480969 · doi ↗ · pubmed ↗
- 8D’Souza C. D. Cooper A. M. Frank A. A. Mazzaccaro R. J. Bloom B. R. Orme I. M. (1997). An anti-inflammatory role for gamma delta T lymphocytes in acquired immunity to Mycobacterium tuberculosis. J. Immunol. 158, 1217–1221. doi: 10.4049/jimmunol.158.3.1217, PMID: 9013962 · doi ↗ · pubmed ↗
