TSdetector: Temporal-Spatial Self-correction Collaborative Learning for Colonoscopy Video Detection
Kaini Wang, Haolin Wang, Guang-Quan Zhou, Yangang Wang, Ling Yang, Yang Chen, Shuo Li

TL;DR
TSdetector introduces a novel temporal-spatial self-correction framework for improved polyp detection in colonoscopy videos, leveraging global temporal features and spatial relationship modeling to enhance accuracy and reliability.
Contribution
The paper presents a new detection model combining temporal consistency and spatial relationship learning, addressing intra-sequence heterogeneity and confidence discrepancies in colonoscopy videos.
Findings
Achieves the highest polyp detection rate on three datasets.
Outperforms existing state-of-the-art methods.
Effectively reduces redundant bounding boxes.
Abstract
CNN-based object detection models that strike a balance between performance and speed have been gradually used in polyp detection tasks. Nevertheless, accurately locating polyps within complex colonoscopy video scenes remains challenging since existing methods ignore two key issues: intra-sequence distribution heterogeneity and precision-confidence discrepancy. To address these challenges, we propose a novel Temporal-Spatial self-correction detector (TSdetector), which first integrates temporal-level consistency learning and spatial-level reliability learning to detect objects continuously. Technically, we first propose a global temporal-aware convolution, assembling the preceding information to dynamically guide the current convolution kernel to focus on global features between sequences. In addition, we designed a hierarchical queue integration mechanism to combine multi-temporal…
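To illustrate the idea of letting preceding frames guide the current convolution kernel, here is a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not give the formulation, so the 1x1 convolution, the global-average descriptor, the sigmoid gate, and all function names (`global_average`, `temporal_context`, `temporal_aware_conv1x1`) are assumptions chosen only to make the mechanism concrete.

```python
import math


def global_average(frame):
    """Average each channel over all pixels.

    frame: list of pixels, each pixel a list of C channel values.
    """
    channels = len(frame[0])
    return [sum(px[c] for px in frame) / len(frame) for c in range(channels)]


def temporal_context(previous_frames):
    """Mean of the per-frame channel descriptors over the preceding frames."""
    descriptors = [global_average(f) for f in previous_frames]
    channels = len(descriptors[0])
    return [sum(d[c] for d in descriptors) / len(descriptors)
            for c in range(channels)]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def temporal_aware_conv1x1(current_frame, previous_frames, kernel):
    """1x1 convolution whose weights are rescaled by preceding frames.

    kernel[o][c] is the weight from input channel c to output channel o.
    Assumed gating: each input channel is scaled by 2 * sigmoid(context),
    so an all-zero context leaves the static kernel unchanged.
    """
    gate = [2.0 * sigmoid(v) for v in temporal_context(previous_frames)]
    dynamic_kernel = [[w * g for w, g in zip(row, gate)] for row in kernel]
    return [[sum(w * x for w, x in zip(row, px)) for row in dynamic_kernel]
            for px in current_frame]


# Hypothetical toy input: two pixels, two channels, identity kernel.
current = [[1.0, 2.0], [3.0, 4.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
history = [[[10.0, 0.0]]]  # one preceding frame with strong channel-0 activity
output = temporal_aware_conv1x1(current, history, identity)
```

In this toy setup the strong channel-0 response in the preceding frame amplifies channel 0 of the current frame, while channel 1 passes through unchanged. A real detector would learn the mapping from temporal context to kernel weights rather than use a fixed gate.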
Taxonomy
Topics: Colorectal Cancer Screening and Detection
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Focus · Convolution
