VIGIL: Vision-Language Guided Multiple Instance Learning Framework for Ulcerative Colitis Histological Healing Prediction

Zhengxuan Qiu; Bo Peng; Xiaoying Tang; Jiankun Wang; Qin Guo

arXiv:2505.09656 · q-bio.QM · May 16, 2025

TL;DR

VIGIL is a novel vision-language guided multiple instance learning framework that improves ulcerative colitis histological healing prediction by integrating endoscopic images and diagnostic reports, reducing annotation needs and enhancing accuracy.

Contribution

VIGIL introduces a dual-branch MIL model with image-text alignment and multi-modal fusion, pioneering the combination of vision and language guidance in UC histological prediction.

Findings

01 Achieves 92.69% accuracy and 94.79% AUC on a clinical dataset

02 Outperforms existing state-of-the-art methods

03 Reduces annotation burden while improving prediction reliability

Abstract

Objective: Ulcerative colitis (UC), characterized by chronic inflammation with alternating remission-relapse cycles, requires precise histological healing (HH) evaluation to improve clinical outcomes. To overcome the limitations of annotation-intensive deep learning methods and suboptimal multi-instance learning (MIL) in HH prediction, we propose VIGIL, the first vision-language guided MIL framework integrating white light endoscopy (WLE) and endocytoscopy (EC). Methods: VIGIL begins with a dual-branch MIL module KS-MIL based on top-K typical frames selection and similarity metric adaptive learning to learn relationships among frame features effectively. By integrating the diagnostic report text and specially designed multi-level alignment and supervision between image-text pairs, VIGIL establishes joint image-text guidance during training to capture richer disease-related semantic…
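To make the idea of top-K frame selection in a MIL setting concrete, here is a minimal generic sketch. It is not the authors' KS-MIL module: the function names `top_k_frames` and `pool_bag`, the use of precomputed per-frame scores, and the mean pooling step are all assumptions chosen for illustration, and the abstract does not say how VIGIL scores or aggregates frames.

```python
from typing import List, Sequence, Tuple


def top_k_frames(scores: Sequence[float], k: int) -> List[int]:
    """Return the indices of the k highest-scoring frames, best first.

    Hypothetical helper: `scores` stands in for whatever per-frame
    relevance measure a real model would learn.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    # sorted() is stable, so tied scores keep their original frame order.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


def pool_bag(
    features: Sequence[Sequence[float]],
    scores: Sequence[float],
    k: int,
) -> Tuple[List[int], List[float]]:
    """Mean-pool the feature vectors of the top-k frames into one bag vector.

    Mean pooling is an assumption made for this sketch, not the
    aggregation described in the paper.
    """
    if not features:
        raise ValueError("the bag must contain at least one frame")
    if len(features) != len(scores):
        raise ValueError("features and scores must have the same length")
    idx = top_k_frames(scores, k)
    dim = len(features[0])
    pooled = [sum(features[i][d] for i in idx) / len(idx) for d in range(dim)]
    return idx, pooled


# A bag of four frames with 2-dimensional features and made-up scores.
frame_features = [[1.0, 0.0], [2.0, 2.0], [3.0, 1.0], [4.0, 4.0]]
frame_scores = [0.1, 0.9, 0.4, 0.7]
selected, bag_vector = pool_bag(frame_features, frame_scores, k=2)
print(selected, bag_vector)
```

In this toy bag, frames 1 and 3 carry the highest scores, so only their features contribute to the bag-level vector that a downstream classifier would see.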

Taxonomy

Topics: COVID-19 diagnosis using AI · Image Processing Techniques and Applications