VIGIL: Vision-Language Guided Multiple Instance Learning Framework for Ulcerative Colitis Histological Healing Prediction
Zhengxuan Qiu, Bo Peng, Xiaoying Tang, Jiankun Wang, Qin Guo

TL;DR
VIGIL is a novel vision-language guided multiple instance learning framework that improves ulcerative colitis histological healing prediction by integrating endoscopic images and diagnostic reports, reducing annotation needs and enhancing accuracy.
Contribution
VIGIL introduces a dual-branch MIL model with image-text alignment and multi-modal fusion, pioneering the combination of vision and language guidance in UC histological prediction.
Findings
Achieves 92.69% accuracy and 94.79% AUC on a clinical dataset
Outperforms existing state-of-the-art methods
Reduces annotation burden while improving prediction reliability
Abstract
Objective: Ulcerative colitis (UC), characterized by chronic inflammation with alternating remission-relapse cycles, requires precise histological healing (HH) evaluation to improve clinical outcomes. To overcome the limitations of annotation-intensive deep learning methods and suboptimal multiple instance learning (MIL) in HH prediction, we propose VIGIL, the first vision-language guided MIL framework integrating white light endoscopy (WLE) and endocytoscopy (EC). Methods: VIGIL begins with a dual-branch MIL module, KS-MIL, based on top-K typical frame selection and similarity metric adaptive learning to learn relationships among frame features effectively. By integrating the diagnostic report text and specially designed multi-level alignment and supervision between image-text pairs, VIGIL establishes joint image-text guidance during training to capture richer disease-related semantic…
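To make the idea of top-K frame selection with similarity-based pooling more concrete, the following is a minimal sketch in plain Python. It is not the authors' KS-MIL module: the abstract does not specify how frames are scored, which similarity metric is learned, or how the two branches interact. Here the per-frame scores are assumed to be given, cosine similarity stands in for the learned metric, and the function names (`cosine`, `top_k_indices`, `aggregate_bag`) are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def top_k_indices(scores, k):
    """Indices of the k highest-scoring frames, best first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]

def aggregate_bag(frames, scores, k):
    """Pool a bag of frame features into a single bag-level feature.

    frames: list of feature vectors, one per endoscopic frame
    scores: per-frame relevance scores (assumed given; hypothetical)
    k:      number of "typical" frames to keep (must be >= 1)
    """
    if k < 1 or not frames:
        raise ValueError("need at least one frame and k >= 1")
    selected = top_k_indices(scores, k)
    anchor = frames[selected[0]]  # highest-scoring frame
    # Weight each selected frame by a softmax over its similarity to the anchor.
    # Cosine similarity is a stand-in for a learned similarity metric.
    sims = [cosine(frames[i], anchor) for i in selected]
    exps = [math.exp(s) for s in sims]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(anchor)
    pooled = [
        sum(w * frames[i][d] for w, i in zip(weights, selected))
        for d in range(dim)
    ]
    return selected, weights, pooled

# Toy bag of four 2-D frame features with made-up scores.
frames = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.5, 0.5]]
scores = [0.9, 0.8, 0.1, 0.4]
selected, weights, pooled = aggregate_bag(frames, scores, k=2)
```

In this toy bag the two highest-scoring frames are kept and the rest are ignored, which is the sense in which only bag-level (video-level) labels are needed rather than per-frame annotations. A trained model would learn both the scoring and the similarity function instead of fixing them as done here.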
Taxonomy
Topics: COVID-19 diagnosis using AI · Image Processing Techniques and Applications
