Focus-to-Perceive Representation Learning: A Cognition-Inspired Hierarchical Framework for Endoscopic Video Analysis

Yuan Zhang; Sihao Dou; Kai Hu; Shuhua Deng; Chunhong Cao; Fen Xiao; Xieping Gao

arXiv:2603.25778·cs.CV·March 30, 2026

TL;DR

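This paper introduces Focus-to-Perceive Representation Learning (FPRL), a cognition-inspired hierarchical framework for self-supervised pre-training on endoscopic video. It first learns static, lesion-centric semantics within each frame and then models how they evolve across frames, targeting settings where high-quality annotations are limited.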

Contribution

FPRL is a hierarchical framework, modeled on clinical examination, that explicitly distinguishes static (intra-frame) semantics from contextual (cross-frame) semantics and learns the two collaboratively. It is reported to outperform existing methods on 11 endoscopic datasets.

Findings

01. FPRL outperforms existing methods on 11 endoscopic datasets.

02. It effectively captures static lesion semantics and their evolution.

03. The code is publicly available at https://github.com/MLMIP/FPRL.

Abstract

Endoscopic video analysis is essential for early gastrointestinal screening but remains hindered by limited high-quality annotations. While self-supervised video pre-training shows promise, existing methods developed for natural videos prioritize dense spatio-temporal modeling and exhibit motion bias, overlooking the static, structured semantics critical to clinical decision-making. To address this challenge, we propose Focus-to-Perceive Representation Learning (FPRL), a cognition-inspired hierarchical framework that emulates clinical examination. FPRL first focuses on intra-frame lesion-centric regions to learn static semantics, and then perceives their evolution across frames to model contextual semantics. To achieve this, FPRL employs a hierarchical semantic modeling mechanism that explicitly distinguishes and collaboratively learns both types of semantics. Specifically, it begins by…

Code & Models

Repositories

MLMIP/FPRL (GitHub): https://github.com/MLMIP/FPRL
