# Validation and interpretation of machine-learning models for rapid identification of active tuberculosis infection using routine laboratory indicators

**Authors:** Zhan-Zhong Liu, Quan Yuan, Yu-Dong Zhang, Xue-Di Zhang, Jian Liu, Jia-Wei Yan, Kang-Peng Du, Hui-Jin Chen, Liang Wang

PMC · DOI: 10.3389/fcimb.2025.1718614 · Frontiers in Cellular and Infection Microbiology · 2025-12-18

## TL;DR

This study trains machine-learning models on routine blood test results to identify active tuberculosis infection. The best model, XGBoost, reached 97.49% accuracy in the internal cohort and 93.67% in an external cohort, offering a low-cost and accessible diagnostic approach.

## Contribution

A high-performing and interpretable XGBoost model for rapid tuberculosis detection using routine laboratory data.

## Key findings

- XGBoost achieved 97.49% accuracy in the internal cohort and 93.67% accuracy in the external cohort.
- Key predictors included hypoalbuminemia, lipid metabolism suppression, altered platelet activity, and lymphocyte reduction.
- The model provides a non-invasive and cost-effective approach for diagnosing active tuberculosis.

## Abstract

Diagnosis of active Mycobacterium tuberculosis (Mtb) infection relies on clinical symptoms, imaging, and molecular testing, but these methods are often costly and slow. Consequently, there is an urgent need for a rapid and accessible diagnostic approach that can support early detection and reduce ongoing tuberculosis transmission.

A discovery cohort of 3,829 individuals and an external validation cohort of 405 individuals were included. Six supervised machine learning models were trained using routine laboratory data, and model interpretability was assessed with SHapley Additive exPlanations (SHAP).
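SHAP attributes a model's prediction to its input features using Shapley values. As a rough illustration of the idea only, the sketch below computes exact Shapley values by brute force for a toy scoring function. The function `toy_risk_score`, its coefficients, and the feature values (`ALB`, `HDL_C`, `LYM`) are hypothetical and are not taken from the paper. SHAP implementations typically use faster algorithms for tree models than this exhaustive enumeration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, instance, baseline):
    """Exact Shapley values for one instance.

    Features outside a coalition are set to their baseline value.
    The cost grows exponentially with the number of features.
    """
    names = list(instance)
    n = len(names)
    values = {}
    for target in names:
        others = [f for f in names if f != target]
        total = 0.0
        for size in range(n):
            # Shapley weight for a coalition of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                without = {
                    f: (instance[f] if f in coalition else baseline[f])
                    for f in names
                }
                with_target = dict(without)
                with_target[target] = instance[target]
                total += weight * (predict(with_target) - predict(without))
        values[target] = total
    return values


def toy_risk_score(x):
    """Hypothetical score: NOT the paper's model, coefficients are made up."""
    score = 0.04 * (40.0 - x["ALB"])      # lower albumin raises the score
    score += 0.30 * (1.3 - x["HDL_C"])    # lower HDL-C raises the score
    score += 0.20 * (1.8 - x["LYM"])      # lower lymphocyte count raises the score
    if x["ALB"] < 35 and x["LYM"] < 1.0:  # a simple interaction term
        score += 0.25
    return score


# Hypothetical reference values and a hypothetical patient
baseline = {"ALB": 40.0, "HDL_C": 1.3, "LYM": 1.8}
patient = {"ALB": 30.0, "HDL_C": 0.9, "LYM": 0.8}

phi = shapley_values(toy_risk_score, patient, baseline)
for name, value in phi.items():
    print(f"{name}: {value:+.3f}")
```

The attributions sum to the difference between the patient's score and the baseline score, and the interaction term is split equally between the two features that trigger it.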

Among the six models, XGBoost demonstrated the best diagnostic performance in the internal cohort (accuracy 97.49%; sensitivity 97.56%; specificity 97.42%) and maintained strong performance in the external cohort (accuracy 93.67%; sensitivity 91.56%; specificity 91.13%). SHAP analysis indicated that key predictors reflected characteristic host-response patterns, including inflammation-related hypoalbuminemia, lipid metabolism suppression (HDL-C and LDL-C), altered platelet activity (MPV), and lymphocyte reduction (LYM).
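For readers less familiar with the three reported metrics, the following minimal sketch shows how accuracy, sensitivity, and specificity are derived from a binary confusion matrix. The function name `diagnostic_metrics` and the counts are hypothetical placeholders; the paper's actual confusion matrices are not given in this summary.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Compute standard diagnostic metrics from confusion-matrix counts.

    tp: active TB cases correctly flagged    fn: active TB cases missed
    tn: controls correctly cleared           fp: controls wrongly flagged
    """
    return {
        "sensitivity": tp / (tp + fn),                 # true positive rate
        "specificity": tn / (tn + fp),                 # true negative rate
        "accuracy": (tp + tn) / (tp + fn + tn + fp),   # overall agreement
    }


# Hypothetical counts for illustration only
metrics = diagnostic_metrics(tp=92, fn=8, tn=190, fp=10)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```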

The study presents a high-performing and interpretable machine learning model capable of accurately identifying active Mtb infection using routine blood tests. This low-cost and non-invasive approach has strong potential for application in resource-limited and high-burden settings.

**Graphical abstract:** Construction of the optimized XGBoost model. Left: the study cohort of 3,829 individuals (healthy individuals versus active Mtb patients), with whole blood and biochemical tests as data sources. Center: the data feed into decision trees (Tree 1, Tree 2, ..., Tree n). Right: the independent cohort of 405 individuals, with accuracy 93.67%, sensitivity 91.56%, and specificity 91.13%.

## Linked entities

- **Diseases:** tuberculosis (MONDO:0018076)
- **Species:** Mycobacterium tuberculosis (taxon 1773)

## Full-text entities

- **Diseases:** Mtb infection (MESH:D014376), inflammation (MESH:D007249), hypoalbuminemia (MESH:D034141)
- **Chemicals:** LDL-C (-), lipid (MESH:D008055)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12756366/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12756366/full.md

## References

54 references — full list in the complete paper: https://tomesphere.com/paper/PMC12756366/full.md

---
Source: https://tomesphere.com/paper/PMC12756366