TL;DR
This paper introduces Focus-to-Perceive Representation Learning (FPRL), a hierarchical, cognition-inspired framework for endoscopic video analysis. It learns static lesion semantics within frames and their evolution across frames, targeting better representation learning when high-quality annotations are limited.
Contribution
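Inspired by clinical examination, FPRL is a hierarchical framework that explicitly distinguishes static (intra-frame) semantics from contextual (cross-frame) semantics and learns the two collaboratively. It reports stronger performance than existing methods across multiple endoscopic datasets.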
Findings
FPRL outperforms existing methods on 11 endoscopic datasets.
It captures both static lesion semantics within frames and their evolution across frames.
The code is publicly available at https://github.com/MLMIP/FPRL.
Abstract
Endoscopic video analysis is essential for early gastrointestinal screening but remains hindered by limited high-quality annotations. While self-supervised video pre-training shows promise, existing methods developed for natural videos prioritize dense spatio-temporal modeling and exhibit motion bias, overlooking the static, structured semantics critical to clinical decision-making. To address this challenge, we propose Focus-to-Perceive Representation Learning (FPRL), a cognition-inspired hierarchical framework that emulates clinical examination. FPRL first focuses on intra-frame lesion-centric regions to learn static semantics, and then perceives their evolution across frames to model contextual semantics. To achieve this, FPRL employs a hierarchical semantic modeling mechanism that explicitly distinguishes and collaboratively learns both types of semantics. Specifically, it begins by…
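The focus-then-perceive idea described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract is truncated before the method details, so everything below is an assumption made for illustration. In particular, the per-patch `saliency` scores, the top-k selection in `focus`, and the frame-difference pooling in `perceive` are hypothetical stand-ins for whatever mechanisms FPRL actually uses.

```python
from typing import List, Sequence

Vector = List[float]


def mean_pool(vectors: Sequence[Vector]) -> Vector:
    """Average a non-empty sequence of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def focus(patches: Sequence[Vector], saliency: Sequence[float], k: int) -> Vector:
    """Static step (hypothetical): pool the k patches with the highest saliency.

    The saliency scores stand in for a lesion-centric region selector;
    the real selection mechanism is not given in the abstract.
    """
    ranked = sorted(range(len(patches)), key=lambda i: saliency[i], reverse=True)
    return mean_pool([patches[i] for i in ranked[:k]])


def perceive(frame_embeddings: Sequence[Vector]) -> Vector:
    """Contextual step (hypothetical): pool frame-to-frame changes.

    Requires at least two frames so that one difference exists.
    """
    deltas = [
        [b - a for a, b in zip(prev, curr)]
        for prev, curr in zip(frame_embeddings, frame_embeddings[1:])
    ]
    return mean_pool(deltas)


# Toy clip: 3 frames, 3 patches per frame, 2-dimensional patch features.
video = [
    [[1.0, 0.0], [0.0, 0.0], [3.0, 2.0]],
    [[2.0, 1.0], [0.0, 0.0], [4.0, 3.0]],
    [[3.0, 2.0], [0.0, 0.0], [5.0, 4.0]],
]
# Made-up saliency: patches 0 and 2 are treated as lesion-centric in every frame.
saliency = [
    [0.9, 0.1, 0.8],
    [0.9, 0.1, 0.8],
    [0.9, 0.1, 0.8],
]

# First focus within each frame, then perceive change across frames.
static = [focus(p, s, k=2) for p, s in zip(video, saliency)]
context = perceive(static)
print(static)
print(context)
```

The sketch keeps the two levels separate on purpose: `static` holds one embedding per frame built only from the selected regions, and `context` is computed from those embeddings rather than from raw patches, mirroring the order "focus, then perceive" stated in the abstract.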
