# AI-powered three-category Helicobacter pylori diagnosis via magnetic controlled capsule endoscopy: a multicenter validation of a vision-language model

**Authors:** Xi Sun, Jing Liu, Lili Wu, Xiao Chen, Xiaona Ma, Fei Teng, Ting Zhang, Hui Su, Xin Fan, Jiaxin Li, Shiping Xu, Peng Jin, Hongmei Jiao

PMC · DOI: 10.3389/fmicb.2025.1687021 · 2025-10-13

## TL;DR

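MC-CLIP, a vision-language model, classifies Helicobacter pylori infection status (current, past, or non-infection) from magnetically controlled capsule endoscopy images and detects current and past infection with higher sensitivity than endoscopists.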

## Contribution

MC-CLIP is a vision-language model that enables fully automated three-category H. pylori diagnosis with high accuracy and sensitivity.

## Key findings

- MC-CLIP achieved 89.6% accuracy in internal validation and 86.6% in external validation for H. pylori diagnosis.
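- Sensitivity was 91.4% for current infection and 83.7% for past infection, exceeding senior endoscopists (84.3% and 71.4%, respectively) and junior endoscopists (74.3% for current infection).
- Specificity exceeded 90% in all three categories, and the model excelled at identifying subtle mucosal changes after eradication therapy, reducing the misclassification of past infections as non-infections.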

## Abstract

Accurate classification of Helicobacter pylori (H. pylori) infection status is critical for gastric cancer risk stratification. Current methods based on traditional convolutional neural networks (CNNs) are limited by their reliance on fragmented single-image analysis and operator-dependent selection variability, impairing diagnostic reliability.

To overcome these limitations, we developed MC-CLIP, a vision-language foundation model for the fully automated, three-categorical diagnosis of H. pylori infection using magnetically controlled capsule endoscopy (MCCE). The model was first pretrained on a large-scale dataset of 2,427,475 MCCE image-text pairs derived from 123,543 examinations. It was subsequently fine-tuned on 40,695 expertly annotated images from 864 patients. MC-CLIP autonomously selects 30 representative images per case for end-to-end classification. Its performance was rigorously evaluated on multicenter internal (n = 220) and external (n = 208) validation cohorts.
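
The abstract does not say how the 30 representative images are chosen or how per-image outputs are combined into a case-level diagnosis. Purely to illustrate the general shape of such a pipeline, the sketch below assumes a CLIP-style setup: each frame embedding is compared with one text embedding per category, the 30 most confident frames are kept, and their probabilities are averaged. The function names, the confidence-based selection rule, the temperature value, and the random embeddings are all hypothetical stand-ins, not the authors' method.

```python
import numpy as np

# Category names follow the abstract; everything else here is an assumption.
CLASSES = ["non-infection", "current infection", "past infection"]

def l2_normalize(x):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def frame_probabilities(frame_emb, class_emb, temperature=0.07):
    """Softmax over image-text cosine similarities, one row per frame."""
    logits = l2_normalize(frame_emb) @ l2_normalize(class_emb).T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)

def classify_case(frame_emb, class_emb, k=30):
    """Keep the k most confident frames (assumed rule) and average their probabilities."""
    probs = frame_probabilities(frame_emb, class_emb)
    confidence = probs.max(axis=1)
    top = np.argsort(confidence)[::-1][:k]
    case_probs = probs[top].mean(axis=0)
    return CLASSES[int(case_probs.argmax())], top, case_probs

# Random vectors stand in for real image and text encoder outputs.
rng = np.random.default_rng(0)
frame_emb = rng.normal(size=(500, 64))   # 500 frames from one hypothetical examination
class_emb = rng.normal(size=(3, 64))     # one text-prompt embedding per category
label, selected, case_probs = classify_case(frame_emb, class_emb)
print(label, len(selected), case_probs.round(3))
```

Averaging over a fixed number of automatically chosen frames is one simple way to remove the operator-dependent image selection that the abstract identifies as a weakness of single-image CNN approaches.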

On the internal and external validation cohorts, MC-CLIP achieved overall accuracies of 89.6% (95% CI: 85.5–93.6%) and 86.6% (80.8–90.3%), respectively. The model demonstrated particularly high sensitivity in detecting H. pylori infection: 91.4% for current infection and 83.7% for past infection. This performance significantly surpassed that of both senior endoscopists (84.3% and 71.4%, respectively) and junior endoscopists (74.3% for current infection). MC-CLIP also maintained high specificity (>90% across all categories) and excelled at identifying subtle mucosal changes following eradication therapy, thereby reducing the misclassification of past infections as non-infections.
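
For readers who want to reproduce this style of reporting, the sketch below shows how per-category sensitivity and specificity can be derived one-vs-rest from a three-class confusion matrix, along with a 95% confidence interval for overall accuracy. The confusion matrix is an invented toy example, not data from the study, and the Wilson score interval is an assumption, since the abstract does not state which interval method was used.

```python
import math
import numpy as np

def per_class_sensitivity_specificity(cm):
    """cm[i, j] = number of cases with true class i predicted as class j."""
    cm = np.asarray(cm, dtype=float)
    total = cm.sum()
    results = {}
    for k in range(cm.shape[0]):
        tp = cm[k, k]
        fn = cm[k, :].sum() - tp   # class k cases predicted as something else
        fp = cm[:, k].sum() - tp   # other cases predicted as class k
        tn = total - tp - fn - fp
        results[k] = (tp / (tp + fn), tn / (tn + fp))
    return results

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives about 95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half

# Toy confusion matrix (hypothetical). Rows are the true class, columns the prediction,
# in the order: non-infection, current infection, past infection.
cm = np.array([[40, 2, 3],
               [1, 27, 2],
               [4, 1, 20]])

stats = per_class_sensitivity_specificity(cm)
correct, n = int(np.trace(cm)), int(cm.sum())
ci = wilson_interval(correct, n)
print(stats, correct / n, ci)
```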

By integrating multimodal image-text data and performing end-to-end analysis, MC-CLIP effectively addresses the fundamental limitations of CNN-based approaches. The model shows strong potential for enhancing the accuracy and reliability of MCCE-based gastric cancer screening programs.

## Linked entities

- **Diseases:** gastric cancer (MONDO:0001056)

## Full-text entities

- **Diseases:** infection (MESH:D007239), gastric cancer (MESH:D013274), H. pylori infection (MESH:D016481)
- **Chemicals:** MC (MESH:C061001)
- **Species:** Homo sapiens (human, species) [taxon 9606], Helicobacter pylori (species) [taxon 210]

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12554734/full.md

---
Source: https://tomesphere.com/paper/PMC12554734