# Machine learning models using serum gastric biomarkers for the non-invasive prediction of atrophic gastritis: a comparative study

**Authors:** Dong Li, Haitao Yu, Baihan Jin, Dongfang Dong, Lingxue Cheng, Wenzhu Dong

PMC · DOI: 10.3389/fmed.2026.1757004 · Frontiers in Medicine · 2026-02-18

## TL;DR

This study compares machine learning models to predict atrophic gastritis using blood markers, finding that simple models perform best for ruling out the condition.

## Contribution

The study compares eight machine learning models built on four routine predictors (PGI, the PGI/PGII ratio, age, and anti-H. pylori antibody status) for predicting chronic atrophic gastritis, and identifies the models best suited to triaging patients to endoscopy.

## Key findings

- Elastic Net and Logistic Regression showed the highest AUCs (0.823 and 0.810) on the independent test set, with high sensitivity (0.923) and negative predictive value (>0.95) for ruling out chronic atrophic gastritis (CAG).
- Anti-H. pylori antibody positivity was associated with approximately four-fold higher odds of CAG.
- Simple linear models outperformed complex tree-based models in robustness and clinical utility.

## Abstract

The early, non-invasive detection of chronic atrophic gastritis (CAG), a precancerous lesion, remains a clinical challenge. While serological biomarkers are promising alternatives to endoscopy for screening, their predictive accuracy using conventional methods is suboptimal. This study aimed to identify key predictors of CAG and to comparatively develop multiple machine learning (ML) models, evaluating whether ML offers a definitive advantage and identifying a reliable model for triaging patients to endoscopy.

In this retrospective diagnostic study (conducted from January to October 2020), 222 subjects (CAG prevalence: 30.6%) were randomly split, with stratification, into a training set (80%) and an independent test set (20%). Feature selection was performed exclusively on the training set using multivariable logistic regression, which identified four independent predictors: PGI, the PGI/PGII ratio, age, and anti-H. pylori antibody status. Using these predictors, eight models (including Logistic Regression as the baseline, Elastic Net, Support Vector Machine, Neural Network, and tree-based ensembles) were trained and optimized via 5-fold cross-validation. Model performance was evaluated on the held-out test set using discrimination (AUC, sensitivity, specificity), calibration (Brier score), and clinical utility (Decision Curve Analysis).
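The split and two of the evaluation metrics described above can be illustrated with a minimal, standard-library-only sketch. It is not the authors' code: the summary does not state which software they used, the function names (`stratified_split`, `auc`, `brier_score`) are hypothetical, the cohort is a synthetic list of labels sized to match the reported 222 subjects and roughly 30.6% prevalence, and the four scores at the end are toy values used only to exercise the metrics.

```python
import random


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx), sampling each class separately
    so both sets keep roughly the same class proportions."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


def auc(y_true, y_score):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive case receives the higher score; ties count as 0.5."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# Synthetic labels only (assumption): 68 positives out of 222 is about 30.6%.
labels = [1] * 68 + [0] * 154
train_idx, test_idx = stratified_split(labels)
test_prevalence = sum(labels[i] for i in test_idx) / len(test_idx)

# Toy predictions (assumption), just to show the metric calls.
toy_truth = [1, 1, 0, 0]
toy_prob = [0.9, 0.4, 0.5, 0.1]
toy_auc = auc(toy_truth, toy_prob)
toy_brier = brier_score(toy_truth, toy_prob)
```

In practice, established implementations such as scikit-learn's `train_test_split(..., stratify=y)`, `roc_auc_score`, and `brier_score_loss` would normally be preferred; the hand-written versions are shown only to make the definitions explicit.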

Multivariable analysis identified the four predictors, with anti-H. pylori antibody positivity associated with approximately four-fold higher odds of CAG. On the independent test set, the Elastic Net (AUC = 0.823) and Logistic Regression (AUC = 0.810) models demonstrated the highest and most robust discriminative performance, showing excellent sensitivity (0.923) and negative predictive value (>0.95) for ruling out CAG. Statistical comparison confirmed that their AUCs were significantly higher than those of the severely overfitted tree-based models (e.g., Random Forest), but not significantly different from other complex models like Support Vector Machine. Decision Curve Analysis confirmed the superior net clinical benefit of the Elastic Net and Logistic Regression models across a wide range of decision thresholds.
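To make the rule-out metrics and the quantity behind Decision Curve Analysis concrete, the sketch below computes sensitivity, specificity, negative predictive value, and net benefit from confusion-matrix counts. The counts are hypothetical and are not the paper's confusion matrix: they were chosen only so that sensitivity equals 12/13, which rounds to the reported 0.923. The net benefit formula is the standard one, TP/n minus FP/n weighted by pt/(1 - pt), where pt is the decision threshold; the threshold of 0.2 and the function names are likewise assumptions for illustration.

```python
def rule_out_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, negative predictive value)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    npv = tn / (tn + fn)
    return sensitivity, specificity, npv


def net_benefit(tp, fp, n, threshold):
    """Net benefit of acting on the model at threshold probability pt:
    TP/n - FP/n * pt / (1 - pt)."""
    return tp / n - fp / n * threshold / (1 - threshold)


def net_benefit_treat_all(prevalence, threshold):
    """Net benefit of referring every patient, regardless of the model."""
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Hypothetical test-set counts (assumption, not taken from the paper).
tp, fn, tn, fp = 12, 1, 20, 11
n = tp + fn + tn + fp

sensitivity, specificity, npv = rule_out_metrics(tp, fn, tn, fp)

# Hypothetical decision threshold (assumption); the counts above are treated
# as the classification obtained at this threshold.
pt = 0.2
nb_model = net_benefit(tp, fp, n, pt)
nb_all = net_benefit_treat_all((tp + fn) / n, pt)
```

A full decision curve repeats this calculation over a range of thresholds, re-deriving the counts at each one, and compares the model's curve with the "refer everyone" and "refer no one" (net benefit 0) strategies.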

Simple, interpretable linear models (Elastic Net and Logistic Regression) based on four routine clinical parameters provide a robust tool for the non-invasive identification of CAG in a clinical population referred for endoscopic evaluation. They show particular strength in ruling out disease, supporting their potential role as a triage tool. In this setting, they demonstrated more consistent performance than more complex machine learning algorithms. External validation in broader populations is warranted to confirm generalizability before clinical implementation.

## Linked entities

- **Diseases:** atrophic gastritis (MONDO:0006665), chronic atrophic gastritis (MONDO:0006665)

## Full-text entities

- **Genes:** NFKB1 (nuclear factor kappa B subunit 1) [NCBI Gene 4790] {aka CVID12, EBP-1, KBF1, NF-kB, NF-kB1, NF-kappa-B1}, PGC (progastricsin) [NCBI Gene 5225] {aka PEPC, PGII}, PBX2 (PBX homeobox 2) [NCBI Gene 5089] {aka G17, HOX12, PBX2MHC}, BGN (biglycan) [NCBI Gene 633] {aka DSPG1, MRLS, PG-S1, PGI, SEMDX, SLRR1A}
- **Diseases:** systemic diseases (MESH:D034721), GC (MESH:D013274), atrophy (MESH:D001284), cancer (MESH:D009369), H. pylori infection (MESH:D016481), AG (MESH:D005757), chronic gastric inflammation (MESH:D007249), chronic (MESH:D002908), precancerous lesion (MESH:D011230)
- **Chemicals:** Alcian blue (MESH:D000423), paraffin (MESH:D010232), gold (MESH:D006046), formalin (MESH:D005557), eosin (MESH:D004801), hematoxylin (MESH:D006416), HE (-)
- **Species:** Homo sapiens (human, species) [taxon 9606], Helicobacter pylori (species) [taxon 210]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12956625/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12956625/full.md

## References

25 references — full list in the complete paper: https://tomesphere.com/paper/PMC12956625/full.md

---
Source: https://tomesphere.com/paper/PMC12956625