VISTA: Validation-Guided Integration of Spatial and Temporal Foundation Models with Anatomical Decoding for Rare-Pathology VCE Event Detection
Bo-Cheng Qiu, Yu-Fan Lin, Yu-Zhe Pien, Chia-Ming Lee, Fu-En Yang, Yu-Chiang Frank Wang, Chih-Chung Hsu

TL;DR
This paper introduces VISTA, a framework for detecting rare pathological events in video capsule endoscopy (VCE). It fuses two foundation-model backbones, EndoFM-LV and DINOv3 ViT-L/16, and decodes events under anatomical constraints, targeting event-level rather than frame-level accuracy.
Contribution
The paper proposes a new validation-guided, multi-backbone framework with anatomical decoding for improved rare-pathology event detection in noisy, long videos.
Findings
Achieved a temporal mAP@0.5 of 0.3530 on the test set.
Validation-guided fusion improves event detection accuracy.
Anatomical constraints enhance temporal event stability.
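
Event-level metrics of this kind are commonly computed by matching predicted segments to ground-truth segments through temporal intersection-over-union (IoU). The following is a minimal sketch of that matching step, not the challenge's official scorer. It assumes events are inclusive `(start, end)` frame ranges; the function names `temporal_iou` and `is_match` and the default threshold are illustrative.

```python
def temporal_iou(pred, gt):
    """IoU of two inclusive frame ranges given as (start, end) tuples."""
    pred_start, pred_end = pred
    gt_start, gt_end = gt
    # Overlap length in frames; non-positive means the ranges are disjoint.
    inter = min(pred_end, gt_end) - max(pred_start, gt_start) + 1
    if inter <= 0:
        return 0.0
    union = (pred_end - pred_start + 1) + (gt_end - gt_start + 1) - inter
    return inter / union


def is_match(pred, gt, threshold=0.5):
    """Assumed matching rule: a prediction counts if its IoU reaches the threshold."""
    return temporal_iou(pred, gt) >= threshold
```

Under this rule a prediction that covers a true event but runs far past its boundaries can still be rejected, which is one reason frame accuracy alone is a poor proxy for event-level scores.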
Abstract
Capsule endoscopy event detection is challenging because diagnostically relevant findings are sparse, visually heterogeneous, and embedded in long, noisy video streams, while evaluation is performed at the event level rather than by frame accuracy alone. We therefore formulate the RARE-VISION task as a metric-aligned event detection problem instead of a purely frame-wise classification task. Our framework combines two complementary backbones, EndoFM-LV for local temporal context and DINOv3 ViT-L/16 for strong frame-level visual semantics, followed by a Diverse Head Ensemble, Validation-Guided Hierarchical Fusion, and Anatomy-Aware Temporal Event Decoding. The fusion stage uses validation-derived class-wise model weighting, backbone weighting, and probability calibration, while the decoding stage applies temporal smoothing, anatomical constraints, threshold refinement, and per-label…
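
The fusion and decoding stages named in the abstract can be illustrated with a simplified sketch for a single label. This is not the authors' implementation: the convex two-model weighting, the moving-average smoother, the fixed threshold, the minimum event length, and all function names are assumptions chosen for illustration. Probability calibration and the anatomical constraints are omitted.

```python
def fuse(probs_a, probs_b, weight_a):
    """Convex combination of two models' per-frame probabilities for one label.

    weight_a stands in for a validation-derived class-wise weight (assumed).
    """
    return [weight_a * a + (1.0 - weight_a) * b for a, b in zip(probs_a, probs_b)]


def smooth(probs, window=3):
    """Centered moving average; the window shrinks at the sequence edges."""
    half = window // 2
    out = []
    for i in range(len(probs)):
        lo, hi = max(0, i - half), min(len(probs), i + half + 1)
        out.append(sum(probs[lo:hi]) / (hi - lo))
    return out


def decode_events(probs, threshold=0.5, min_len=2):
    """Turn per-frame probabilities into inclusive (start, end) events.

    Runs of frames at or above the threshold become events; runs shorter
    than min_len frames are discarded as noise.
    """
    events, start = [], None
    for i, p in enumerate(probs):
        if p >= threshold and start is None:
            start = i  # a new run begins
        elif p < threshold and start is not None:
            if i - start >= min_len:
                events.append((start, i - 1))
            start = None
    # Close a run that extends to the final frame.
    if start is not None and len(probs) - start >= min_len:
        events.append((start, len(probs) - 1))
    return events
```

A typical call chain would be `decode_events(smooth(fuse(p_a, p_b, w)))`, with `w` and the threshold tuned per label on a validation split.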
Taxonomy
Topics: Colorectal Cancer Screening and Detection · Gastrointestinal Bleeding Diagnosis and Treatment · AI in cancer detection
