# Retrospective cohort study of Helicobacter pylori infection and risk stratification using 6-year UBT data

**Authors:** Yan Chen, Miaojuan Wang, Jianfeng Wang

PMC · DOI: 10.3389/fpubh.2025.1563841 · Frontiers in Public Health · 2025-05-26

## TL;DR

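This retrospective cohort study of 3,409 health-examination participants (2016 to 2021) uses machine learning on metabolic markers and longitudinal UBT data to predict Helicobacter pylori infection risk, and identifies a 4.0% high-risk subgroup as a target for early intervention.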

## Contribution

The paper proposes a machine learning-based risk prediction model for H. pylori infection, built on 6-year longitudinal UBT results and routine metabolic markers.

## Key findings

- Chronic H. pylori infection is associated with elevated HbA1c, LDL-C, and WBC levels, and with lower albumin.
- XGBoost outperformed logistic regression and random forest, achieving an AUC of 0.6809 and 81.13% accuracy in predicting infection risk.
- A 4.0% high-risk subgroup was identified, suggesting potential for early intervention strategies.

## Abstract

Helicobacter pylori (H. pylori) infection is a major global health concern, linked to gastric cancer and metabolic disorders. Despite its widespread prevalence, accurate risk stratification remains challenging. This study aims to develop a machine learning (ML)-based risk prediction model using 6-year longitudinal Urea Breath Test (UBT) data to identify metabolic alterations associated with chronic H. pylori infection.

A retrospective cohort study was conducted using health examination data from 3,409 individuals between 2016 and 2021. Participants were stratified into H. pylori-positive and negative groups based on longitudinal UBT results. Key metabolic markers, including HbA1c, LDL-C, BMI, and WBC, were analyzed. Three predictive models—logistic regression, random forest, and XGBoost—were compared to assess their predictive performance.
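The model comparison described above rests on two metrics, AUC and accuracy. The following is a minimal sketch of how such a comparison could be scored, not the authors' pipeline: the labels, the scores, and the names `model_a` and `model_b` are hypothetical toy values, and the 0.5 decision threshold is an assumption that the abstract does not state.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive case gets the higher score; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(labels, scores, threshold=0.5):
    """Share of cases classified correctly when scores >= threshold
    are predicted positive (the threshold is an assumed default)."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(1 for y, p in zip(labels, preds) if y == p) / len(labels)


# Hypothetical toy data: 1 = H. pylori-positive, 0 = negative.
labels = [0, 0, 0, 0, 1, 0, 1, 0, 0, 1]
model_scores = {
    "model_a": [0.1, 0.2, 0.3, 0.4, 0.35, 0.15, 0.6, 0.25, 0.05, 0.7],
    "model_b": [0.2, 0.1, 0.5, 0.3, 0.4, 0.6, 0.45, 0.15, 0.25, 0.35],
}

# Score every candidate model with both metrics, then rank by AUC.
results = {
    name: {"auc": roc_auc(labels, s), "accuracy": accuracy(labels, s)}
    for name, s in model_scores.items()
}
best = max(results, key=lambda name: results[name]["auc"])

for name, metrics in results.items():
    print(f"{name}: AUC={metrics['auc']:.4f}, accuracy={metrics['accuracy']:.2%}")
print("best by AUC:", best)
```

AUC depends only on how the scores rank positives against negatives, whereas accuracy depends on the chosen threshold and on class balance, so the two metrics can rank models differently.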

Among the cohort, 20.5% exhibited chronic H. pylori infection. Infected individuals had significantly higher HbA1c (+1.2%, p < 0.01), LDL-C (+15 mg/dL, p < 0.05), and WBC levels, alongside lower albumin (−0.8 g/dL, p < 0.01). The XGBoost model outperformed others (AUC = 0.6809, Accuracy = 81.13%) in predicting infection risk. A subgroup of 4.0% was identified as high-risk, highlighting the potential for early intervention.
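The percentages above can be turned into approximate head counts, and the reported accuracy can be read alongside the class balance. The sketch below is a back-of-the-envelope calculation under two assumptions the abstract does not confirm: that the 4.0% high-risk share refers to the full cohort of 3,409, and that accuracy was measured on data with the same 20.5% prevalence as the cohort. The variable names are illustrative.

```python
cohort_size = 3409          # participants, from the abstract
chronic_rate = 0.205        # share with chronic H. pylori infection
high_risk_rate = 0.040      # share flagged as high-risk (assumed: of full cohort)
reported_accuracy = 0.8113  # XGBoost accuracy, from the abstract

# Approximate head counts implied by the reported percentages.
chronic_n = round(cohort_size * chronic_rate)
high_risk_n = round(cohort_size * high_risk_rate)

# Accuracy of a trivial classifier that labels everyone "negative",
# assuming the evaluation data share the cohort's prevalence.
baseline_accuracy = 1 - chronic_rate
margin = reported_accuracy - baseline_accuracy

print(f"chronic infection: ~{chronic_n} participants")
print(f"high-risk subgroup: ~{high_risk_n} participants")
print(f"majority-class baseline accuracy: {baseline_accuracy:.2%}")
print(f"reported accuracy minus baseline: {margin:.2%}")
```

When classes are imbalanced, accuracy is best read next to such a baseline and together with AUC, which does not depend on a single threshold.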

This study underscores the interplay between chronic H. pylori infection and metabolic dysfunction, offering new perspectives on risk prediction using machine learning. The XGBoost model demonstrated reliable performance in stratifying infection risk based on accessible clinical markers. Its integration into routine screening protocols could enhance early detection and personalized intervention strategies. Further studies should validate these findings across broader populations and incorporate additional risk factors.

## Linked entities

- **Diseases:** gastric cancer (MONDO:0001056)
- **Species:** Helicobacter pylori (taxon 210)

## Full-text entities

- **Diseases:** metabolic disorders (MESH:D008659), infection (MESH:D007239), gastric cancer (MESH:D013274)
- **Chemicals:** LDL-C (no identifier), Urea (MESH:D014508)
- **Species:** Helicobacter pylori (taxon 210)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12146378/full.md

## Figures

7 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12146378/full.md

## References

25 references — full list in the complete paper: https://tomesphere.com/paper/PMC12146378/full.md

---
Source: https://tomesphere.com/paper/PMC12146378