Multi-frame Collaboration for Effective Endoscopic Video Polyp Detection via Spatial-Temporal Feature Transformation
Lingyun Wu, Zhiqiang Hu, Yuanfeng Ji, Ping Luo, Shaoting Zhang

TL;DR
This paper introduces STFT, a multi-frame framework that enhances endoscopic polyp detection by aligning features and adaptively aggregating information across frames, significantly improving localization accuracy.
Contribution
The paper proposes a novel spatial-temporal feature transformation method that effectively addresses inter-frame variations and quality differences in endoscopic videos for better polyp detection.
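The idea of adaptively weighting neighboring frames by quality can be illustrated with a small NumPy sketch. This is not the authors' module. It assumes the neighboring-frame features are already spatially aligned. It uses per-channel cosine similarity to the target frame as a stand-in for frame quality and correlation, then converts those scores into softmax weights over frames. The function name `aggregate_frames` and the `temperature` parameter are hypothetical.

```python
import numpy as np


# Hypothetical helper for illustration; not the paper's implementation.
def aggregate_frames(target, supports, temperature=0.1, eps=1e-8):
    """Channel-wise adaptive fusion of a target frame with neighboring frames.

    target:   (C, H, W) feature map of the frame being detected.
    supports: (N, C, H, W) feature maps of neighboring frames, assumed to be
              spatially aligned to the target already.
    Returns (fused, weights): fused is (C, H, W); weights is (N + 1, C) and
    sums to 1 over the frame axis for every channel.
    """
    frames = np.concatenate([target[None], supports], axis=0)  # (N+1, C, H, W)
    flat = frames.reshape(frames.shape[0], frames.shape[1], -1)  # (N+1, C, H*W)

    # Per-channel cosine similarity between each frame and the target frame.
    norms = np.linalg.norm(flat, axis=2) + eps  # (N+1, C)
    dots = np.einsum("fcp,cp->fc", flat, flat[0])  # (N+1, C)
    sim = dots / (norms * norms[0])  # roughly in [-1, 1]; target row is ~1

    # Softmax over frames, per channel: less correlated frames get less weight.
    logits = sim / temperature
    logits -= logits.max(axis=0, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=0, keepdims=True)

    fused = np.einsum("fc,fchw->chw", weights, frames)
    return fused, weights


# Usage: one near-duplicate neighbor and one unrelated (e.g. artifact-corrupted) neighbor.
rng = np.random.default_rng(0)
target = rng.standard_normal((4, 8, 8))
good = target + 0.05 * rng.standard_normal((4, 8, 8))
bad = rng.standard_normal((4, 8, 8))
fused, w = aggregate_frames(target, np.stack([good, bad]))
```

In this sketch a lower `temperature` concentrates weight on the most target-like frames, while a higher one averages more uniformly. Scoring each channel separately lets a frame contribute where it agrees with the target and be suppressed where it does not.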
Findings
STFT improves the polyp localization F1-score by over 10% on the CVC-Clinic dataset.
STFT outperforms state-of-the-art video-based methods by up to 8%.
The method demonstrates superior stability and effectiveness in challenging endoscopic videos.
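The findings above are reported as localization F1-scores. The page does not state the matching protocol, so the sketch below assumes a simple rule: a predicted box is a true positive if its center falls inside a not-yet-matched ground-truth box, every other prediction is a false positive, and every unmatched ground-truth box is a false negative. The helper names `center_inside` and `localization_f1` are hypothetical.

```python
# Hypothetical helpers for illustration; the matching rule is an assumption.
Box = tuple[float, float, float, float]  # (x1, y1, x2, y2)


def center_inside(pred: Box, gt: Box) -> bool:
    """True if the center of `pred` lies inside `gt` (inclusive bounds)."""
    cx = (pred[0] + pred[2]) / 2
    cy = (pred[1] + pred[3]) / 2
    return gt[0] <= cx <= gt[2] and gt[1] <= cy <= gt[3]


def localization_f1(preds_per_frame, gts_per_frame):
    """Return (precision, recall, f1) over all frames of a video."""
    tp = fp = fn = 0
    for preds, gts in zip(preds_per_frame, gts_per_frame):
        matched = set()
        for p in preds:
            # First ground-truth box not yet matched whose area contains the prediction center.
            hit = next(
                (i for i, g in enumerate(gts) if i not in matched and center_inside(p, g)),
                None,
            )
            if hit is None:
                fp += 1
            else:
                matched.add(hit)
                tp += 1
        fn += len(gts) - len(matched)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Usage: four frames giving 1 TP, 2 FP and 2 FN in total.
preds = [[(20, 20, 40, 40)], [(60, 60, 80, 80)], [(5, 5, 15, 15)], []]
gts = [[(10, 10, 50, 50)], [(0, 0, 20, 20)], [], [(30, 30, 70, 70)]]
precision, recall, f1 = localization_f1(preds, gts)
```

Because F1 is the harmonic mean of precision and recall, it penalizes a detector that trades many false alarms for recall, which matters for frame-by-frame video screening.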
Abstract
Precise localization of polyps is crucial for early cancer screening in gastrointestinal endoscopy. Endoscopic videos provide richer contextual information than still images, but they also pose more challenges. Because the camera moves, rather than the common setting of a fixed camera and moving objects, the background varies significantly between frames. Severe internal artifacts (e.g., water flow in the human body, specular reflection from tissues) can make the quality of adjacent frames vary considerably. These factors hinder a video-based model from effectively aggregating features from neighboring frames to make better predictions. In this paper, we present Spatial-Temporal Feature Transformation (STFT), a multi-frame collaborative framework to address these issues. Spatially, STFT mitigates inter-frame variations in the camera-moving situation with feature alignment by proposal-guided…
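The abstract is truncated before it names the exact alignment operator, so the following is only a generic illustration of offset-based feature alignment, not STFT's proposal-guided mechanism. It warps a support-frame feature map toward the target frame by bilinearly sampling it at per-pixel offsets, which is the sampling step that deformable convolution builds on. Here the offsets are set by hand, whereas a learned model would predict them. The function names `bilinear_sample` and `align_with_offsets` are hypothetical.

```python
import numpy as np


# Hypothetical helpers for illustration; not the paper's alignment module.
def bilinear_sample(feat, ys, xs):
    """Sample feat (C, H, W) at float coordinates ys, xs (each (H, W)); zeros outside."""
    C, H, W = feat.shape
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    out = np.zeros((C,) + ys.shape)
    for dy in (0, 1):
        for dx in (0, 1):
            yi = y0 + dy
            xi = x0 + dx
            # Bilinear weight of this corner: 1 - distance along each axis.
            wy = 1.0 - np.abs(ys - yi)
            wx = 1.0 - np.abs(xs - xi)
            valid = (yi >= 0) & (yi < H) & (xi >= 0) & (xi < W)
            yc = np.clip(yi, 0, H - 1)
            xc = np.clip(xi, 0, W - 1)
            out += feat[:, yc, xc] * (wy * wx * valid)  # invalid corners contribute 0
    return out


def align_with_offsets(support, offsets):
    """Warp support (C, H, W) using per-pixel offsets (2, H, W) ordered as (dy, dx)."""
    _, H, W = support.shape
    grid_y, grid_x = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    return bilinear_sample(support, grid_y + offsets[0], grid_x + offsets[1])


# Usage: the support frame is the target shifted one pixel to the right (camera motion).
rng = np.random.default_rng(0)
target = rng.standard_normal((3, 5, 6))
support = np.roll(target, 1, axis=2)
offsets = np.zeros((2, 5, 6))
offsets[1] = 1.0  # dx = +1 undoes the shift
aligned = align_with_offsets(support, offsets)

# Fractional offsets interpolate between neighboring positions.
ramp = np.tile(np.arange(4.0), (1, 2, 1))  # shape (1, 2, 4), rows [0, 1, 2, 3]
half_offsets = np.zeros((2, 2, 4))
half_offsets[1] = 0.5
half = align_with_offsets(ramp, half_offsets)
```

With `dx = 1` the aligned map matches the target everywhere except the last column, which samples outside the support map and is zero-filled in this sketch.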
Taxonomy
Topics: Colorectal Cancer Screening and Detection · Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques
Methods: Feature Pyramid Network · 1x1 Convolution · Convolution · Non Maximum Suppression · FCOS
