# Urine volatile organic compounds profiling via GC-IMS combined with machine learning: a powerful diagnostic and pathogen differentiation tool for urinary tract infections

**Authors:** Xin Zheng, Xiaohang Sun, Wenjing Du, Shoulin Sun, Dongge Chen, Wen Cheng, Xuewei Zhuang, Yanli Zhang

PMC · DOI: 10.3389/fcimb.2026.1745468 · Frontiers in Cellular and Infection Microbiology · 2026-02-11

## TL;DR

This study shows that combining urine volatile compound analysis with machine learning can diagnose urinary tract infections with high accuracy (AUC 0.914) and moderately distinguish Gram-positive from Gram-negative infections (AUC 0.800).

## Contribution

The novel contribution is combining GC-IMS VOC profiling with clinical data and machine learning to achieve high UTI diagnostic accuracy and pathogen differentiation.

## Key findings

- A Random Forest model integrating VOCs and clinical features achieved an AUC of 0.914 for UTI diagnosis.
- Acetic acid and benzaldehyde were identified as strong independent predictors of UTI.
- VOC profiles enabled moderate discrimination between Gram-positive and Gram-negative infections (AUC 0.800).

## Abstract

The diagnostic delay associated with standard urine culture necessitates rapid, accurate alternatives for urinary tract infection (UTI) management. Volatile organic compounds (VOCs) emitted by microbes represent a promising source of metabolic biomarkers for infection diagnosis.

This study aimed to develop and validate a diagnostic model for UTI by integrating urine VOC profiles, obtained via gas chromatography-ion mobility spectrometry (GC-IMS), with clinical features using machine learning.

We conducted a prospective cohort study of 258 adults with suspected UTI. Clean-catch midstream urine samples were collected for clinical urinalysis, culture (reference standard), and GC-IMS-based VOC analysis. VOCs and clinical data were used to train and test machine learning models (Logistic Regression, Random Forest, Support Vector Machine). Model performance was assessed by area under the receiver operating characteristic curve (AUC), sensitivity, specificity, and decision curve analysis.
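As a rough illustration of two of the evaluation measures named above, the sketch below computes AUC as a rank statistic and the net benefit used in decision curve analysis, $\text{NB} = \frac{TP}{n} - \frac{FP}{n}\cdot\frac{p_t}{1-p_t}$. The function names `auc_rank` and `net_benefit` and the toy scores are hypothetical and invented for illustration; they are not the study's code or data.

```python
def auc_rank(scores, labels):
    """AUC as the probability that a random positive outscores a random
    negative, with ties counted as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def net_benefit(scores, labels, threshold):
    """Net benefit at probability threshold pt: TP/n - FP/n * pt / (1 - pt)."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(labels)
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1.0 - threshold)


# Toy predicted probabilities and culture labels (invented, not study data).
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
toy_labels = [1, 1, 0, 1, 1, 0, 0, 0]

print(f"AUC = {auc_rank(toy_scores, toy_labels):.3f}")
print(f"Net benefit at pt=0.5 = {net_benefit(toy_scores, toy_labels, 0.5):.3f}")
```

A decision curve repeats the net benefit calculation across a range of thresholds and compares the model against the treat-all and treat-none strategies.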

Among 258 enrolled patients, 152 (58.9%) were culture-positive. We identified 11 differentially expressed VOCs between infected and non-infected groups, with acetic acid, benzaldehyde, and furan being the most significant (Bonferroni-adjusted p < 0.05). A Random Forest model integrating both VOCs and clinical features performed best, with an AUC of 0.914, accuracy of 82.1% (95% CI: 71.8-89.8%), sensitivity of 87.0%, specificity of 75.0%, and F1-score of 0.851, compared with AUCs of 0.831 for the clinical-only model and 0.850 for the VOC-only model. Multivariate analysis confirmed acetic acid (OR 3.27) and benzaldehyde (OR 4.95) as strong independent predictors of UTI. Furthermore, VOC profiles allowed moderate discrimination between Gram-positive and Gram-negative bacterial infections (AUC 0.800) and exhibited pathogen-specific patterns.
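To show how the reported accuracy, sensitivity, specificity, and F1-score relate to one another, the sketch below derives all four from a single confusion matrix. This summary does not give the test-set size or cell counts; the counts used here (40 true positives, 6 false negatives, 24 true negatives, 8 false positives) are an assumption, chosen only because they happen to reproduce the reported figures after rounding. The helper name `classification_metrics` is hypothetical.

```python
def classification_metrics(tp, fn, tn, fp):
    """Derive standard binary-classification metrics from confusion-matrix counts."""
    total = tp + fn + tn + fp
    return {
        "sensitivity": tp / (tp + fn),        # true positive rate
        "specificity": tn / (tn + fp),        # true negative rate
        "accuracy": (tp + tn) / total,
        "f1": 2 * tp / (2 * tp + fp + fn),    # harmonic mean of precision and recall
    }


# ASSUMED counts, not taken from the paper: one confusion matrix that is
# consistent with the metrics reported in the abstract.
m = classification_metrics(tp=40, fn=6, tn=24, fp=8)

print(f"sensitivity = {m['sensitivity']:.1%}")
print(f"specificity = {m['specificity']:.1%}")
print(f"accuracy    = {m['accuracy']:.1%}")
print(f"F1-score    = {m['f1']:.3f}")
```

Under these assumed counts, the four metrics round to 87.0%, 75.0%, 82.1%, and 0.851, matching the abstract. Other count combinations may also be compatible, so this is a consistency check and not a reconstruction of the study's data.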

Integrating urine VOC profiles obtained by GC-IMS with routine clinical parameters using machine learning achieves high diagnostic accuracy for UTI and shows potential for rapid pathogen differentiation. This strategy could improve UTI diagnostics, enabling faster, more precise antibiotic therapy.

## Linked entities

- **Chemicals:** acetic acid (PubChem CID 176), benzaldehyde (PubChem CID 240), furan (PubChem CID 8029)
- **Diseases:** urinary tract infection (MONDO:0005247)

## Full-text entities

- **Genes:** CRP (C-reactive protein) [NCBI Gene 1401] {aka PTX1}, ESBL [NCBI Gene 13906541], UROD (uroporphyrinogen decarboxylase) [NCBI Gene 7389] {aka PCT, UPD}
- **Diseases:** hypertension (MESH:D006973), infected (MESH:D007239), UTI (MESH:D014552), bacterial infections (MESH:D001424), diabetic ketosis (MESH:D016883), Gram-negative rod infection (MESH:D016905), HIV infection (MESH:D015658), infectious diseases (MESH:D003141), sepsis (MESH:D018805), inflammation (MESH:D007249), respiratory infections (MESH:D012141), suprapubic pain (MESH:D010146), diabetes (MESH:D003920), bacteriuria (MESH:D001437), pneumonia (MESH:D011014), fever (MESH:D005334), dysuria (MESH:D053159)
- **Chemicals:** furan (MESH:C039281), Acid3 (-), nitrite (MESH:D009573), toluene (MESH:D014050), acetone (MESH:D000096), ATP (MESH:D000255), alcohol (MESH:D000438), acetate (MESH:D000085), short-chain fatty acid (MESH:D005232), acrylate (MESH:C036658), succinate (MESH:D019802), pyruvate (MESH:D019289), aromatic amino acid (MESH:D024322), nitrogen (MESH:D009584), Benzaldehyde (MESH:C032175), Cyclohexanone (MESH:C036468), 2-methyl-1-propanol (MESH:C040507), ketone (MESH:D007659), phenol (MESH:D019800), Propanoic acid (MESH:C029658), acetyl-CoA (MESH:D000105), 4-methyl-2-pentanone (MESH:C005458), Acetic acid (MESH:D019342), VOC (MESH:D055549)
- **Species:** Staphylococcus aureus (species) [taxon 1280], Proteus mirabilis (species) [taxon 584], Homo sapiens (human, species) [taxon 9606], Escherichia coli (E. coli, species) [taxon 562], Pseudomonas aeruginosa (species) [taxon 287], Klebsiella pneumoniae (species) [taxon 573], Enterococcus (genus) [taxon 1350]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12932616/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12932616/full.md

## References

41 references — full list in the complete paper: https://tomesphere.com/paper/PMC12932616/full.md

---
Source: https://tomesphere.com/paper/PMC12932616