SALI: Short-term Alignment and Long-term Interaction Network for Colonoscopy Video Polyp Segmentation
Qiang Hu, Zhenyu Yi, Ying Zhou, Fang Peng, Mei Liu, Qiang Li, and Zhiwei Wang

TL;DR
The SALI network enhances colonoscopy video polyp segmentation by combining short-term spatial alignment and long-term memory modules, improving robustness and accuracy in challenging low-quality frames.
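The long-term memory idea can be illustrated with a fixed-capacity, first-in-first-out bank of per-frame polyp representations. This is a hypothetical sketch. The class name `MemoryBank`, the capacity, pooling each frame to a D-dimensional vector, and the scaled dot-product attention used to read from the bank are all assumptions made for illustration. This summary does not say how SALI's long-term module actually reads from its memory.

```python
from collections import deque

import numpy as np


class MemoryBank:
    """Hypothetical FIFO memory of per-frame polyp representations (illustrative only)."""

    def __init__(self, capacity: int):
        # A deque with maxlen silently drops the oldest entry once it is full
        self._entries = deque(maxlen=capacity)

    def __len__(self) -> int:
        return len(self._entries)

    def write(self, feature) -> None:
        # Store a copy of one frame's D-dimensional representation
        self._entries.append(np.array(feature, dtype=float))

    def read(self, query):
        """Attention-weighted average of stored entries (assumed read rule)."""
        query = np.asarray(query, dtype=float)
        if not self._entries:
            # Nothing stored yet: return a neutral zero vector
            return np.zeros_like(query)
        memory = np.stack(list(self._entries))             # (N, D)
        scores = memory @ query / np.sqrt(query.shape[0])  # (N,)
        weights = np.exp(scores - scores.max())            # numerically stable softmax
        weights /= weights.sum()
        return weights @ memory                            # (D,)


if __name__ == "__main__":
    bank = MemoryBank(capacity=3)
    for t in range(5):
        bank.write(np.full(4, float(t)))  # frames 0..4; only 2, 3 and 4 are kept
    print(len(bank))  # 3
    print(bank.read(np.ones(4)))
```

A bounded FIFO keeps memory cost constant over long videos. The softmax read lets a degraded current frame borrow mostly from stored representations that resemble it.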
Contribution
This paper introduces the SALI network, which integrates a Short-term Alignment Module (SAM) and a Long-term Interaction Module (LIM) to make video polyp segmentation robust to spatial incoherence between adjacent frames and to runs of low-quality frames.
Findings
Outperforms current state-of-the-art methods on the SUN-SEG benchmark.
Improves the Dice coefficient by up to 4.1% on the test subsets.
Demonstrates robustness to spatial variations and to frames with low visual cues.
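For reference, the Dice coefficient measures overlap between a predicted mask P and a ground-truth mask G as 2|P ∩ G| / (|P| + |G|). The following is a minimal NumPy sketch that assumes binary masks of equal shape. The smoothing term `eps` is our own choice and not necessarily what the SUN-SEG evaluation toolkit uses.

```python
import numpy as np


def dice_coefficient(pred, gt, eps: float = 1e-7) -> float:
    """Dice = 2 * |pred AND gt| / (|pred| + |gt|) for binary masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    intersection = np.logical_and(pred, gt).sum()
    # eps keeps the ratio defined (and equal to 1) when both masks are empty
    return float((2.0 * intersection + eps) / (pred.sum() + gt.sum() + eps))


if __name__ == "__main__":
    gt = np.zeros((4, 4), dtype=bool)
    gt[1:3, 1:3] = True      # 4 foreground pixels
    pred = np.zeros((4, 4), dtype=bool)
    pred[1:3, 1] = True      # 2 pixels, both inside gt
    print(round(dice_coefficient(pred, gt), 4))  # 0.6667
```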
Abstract
Colonoscopy videos provide richer information for polyp segmentation in rectal cancer diagnosis. However, the endoscope's fast movement and close-up observation cause current methods to suffer from large spatial incoherence and continuous low-quality frames, which limits their segmentation accuracy. We therefore focus on robust video polyp segmentation by enhancing the consistency of adjacent-frame features and rebuilding a reliable polyp representation. To this end, we propose the SALI network, a hybrid of a Short-term Alignment Module (SAM) and a Long-term Interaction Module (LIM). The SAM learns spatially aligned features of adjacent frames via deformable convolution and then harmonizes them to capture a more stable short-term polyp representation. For low-quality frames, the LIM stores historical polyp representations in a long-term memory bank, and explores…
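The core operation behind the short-term alignment is deformable convolution. Each kernel tap samples the adjacent frame's feature map at an offset, possibly fractional, so features can be pulled into alignment with the current frame. Below is a minimal NumPy sketch of a DCNv1-style deformable convolution for a single feature map (stride 1, "same" zero padding, one offset group). It assumes the offsets are already given. In a learned alignment module they are typically predicted by an ordinary convolution over the current and adjacent features, a step omitted here. The function names and the interleaved (dy, dx) offset layout are illustrative choices, not SALI's code.

```python
import numpy as np


def bilinear_sample(feat, ys, xs):
    """Sample feat (C, H, W) at fractional coordinates ys, xs (same 2-D shape).

    Points that fall outside the map contribute zero (zero padding).
    """
    _, h, w = feat.shape
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    wy1, wx1 = ys - y0, xs - x0  # fractional distance from the top-left neighbour
    out = np.zeros((feat.shape[0],) + ys.shape)
    # Accumulate the four neighbouring pixels with bilinear weights
    for yy, wy in ((y0, 1.0 - wy1), (y0 + 1, wy1)):
        for xx, wx in ((x0, 1.0 - wx1), (x0 + 1, wx1)):
            valid = (yy >= 0) & (yy < h) & (xx >= 0) & (xx < w)
            vals = feat[:, np.clip(yy, 0, h - 1), np.clip(xx, 0, w - 1)]
            out += vals * (wy * wx * valid)
    return out


def deform_conv2d_np(feat, offsets, weight):
    """DCNv1-style deformable convolution, stride 1, 'same' zero padding.

    feat:    (C_in, H, W) features of the adjacent frame
    offsets: (2*K*K, H, W), interleaved (dy, dx) per kernel tap (assumed layout)
    weight:  (C_out, C_in, K, K) with odd K
    """
    c_out, _, k, _ = weight.shape
    _, h, w = feat.shape
    pad = k // 2
    ii, jj = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    out = np.zeros((c_out, h, w))
    for ki in range(k):
        for kj in range(k):
            tap = ki * k + kj
            # Regular grid position of this tap, shifted by its per-pixel offset
            ys = ii - pad + ki + offsets[2 * tap]
            xs = jj - pad + kj + offsets[2 * tap + 1]
            sampled = bilinear_sample(feat, ys, xs)  # (C_in, H, W)
            out += np.einsum("oc,chw->ohw", weight[:, :, ki, kj], sampled)
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    adjacent = rng.standard_normal((8, 16, 16))
    offsets = rng.uniform(-1.0, 1.0, size=(2 * 3 * 3, 16, 16))
    weight = rng.standard_normal((8, 8, 3, 3)) * 0.1
    aligned = deform_conv2d_np(adjacent, offsets, weight)
    print(aligned.shape)  # (8, 16, 16)
```

With all offsets set to zero, this reduces to an ordinary "same"-padded convolution, which makes a convenient sanity check. Nonzero offsets let each output pixel gather evidence from wherever the polyp has moved in the adjacent frame.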
Taxonomy
Topics: Colorectal Cancer Screening and Detection · Radiomics and Machine Learning in Medical Imaging
Methods: Focus · Deformable Convolution · Linear Layer · Segment Anything Model · Convolution · Transformer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Residual Connection · Label Smoothing
