AVPDN: Learning Motion-Robust and Scale-Adaptive Representations for Video-Based Polyp Detection

Zilin Chen; Shengnan Lu

arXiv:2508.03458·cs.CV·August 6, 2025

TL;DR

This paper introduces AVPDN, a deep learning framework for polyp detection in colonoscopy videos. Its two modules, AFIA and SACI, target the background noise caused by rapid camera movement and the variation in polyp scale, with the goal of reducing false positives.

Contribution

The paper presents the Adaptive Video Polyp Detection Network (AVPDN), which combines an Adaptive Feature Interaction and Augmentation (AFIA) module with a Scale-Aware Context Integration (SACI) module for multi-scale polyp detection in colonoscopy videos.

Findings

01. Achieves state-of-the-art performance on public benchmarks.

02. Demonstrates strong generalization across different datasets.

03. Effectively handles rapid camera movements and multi-scale features.

Abstract

Accurate detection of polyps is of critical importance for the early and intermediate stages of colorectal cancer diagnosis. Compared to static images, dynamic colonoscopy videos provide more comprehensive visual information, which can facilitate the development of effective treatment plans. However, unlike fixed-camera recordings, colonoscopy videos often exhibit rapid camera movement, introducing substantial background noise that disrupts the structural integrity of the scene and increases the risk of false positives. To address these challenges, we propose the Adaptive Video Polyp Detection Network (AVPDN), a robust framework for multi-scale polyp detection in colonoscopy videos. AVPDN incorporates two key components: the Adaptive Feature Interaction and Augmentation (AFIA) module and the Scale-Aware Context Integration (SACI) module. The AFIA module adopts a triple-branch…
