Foundation Model for Endoscopy Video Analysis via Large-scale Self-supervised Pre-train
Zhao Wang, Chang Liu, Shaoting Zhang, Qi Dou

TL;DR
This paper introduces Endo-FM, a large-scale self-supervised foundation model for endoscopic video analysis, leveraging extensive data and a video transformer to improve performance across multiple downstream tasks.
Contribution
It develops a novel self-supervised pre-training approach for endoscopic videos using a large dataset and a video transformer architecture, filling a gap in foundation models for this domain.
Findings
Outperforms state-of-the-art self-supervised methods on downstream tasks
Uses over 33,000 video clips with 5 million frames for pre-training
Achieves significant improvements in classification, segmentation, and detection metrics
Abstract
Foundation models have exhibited remarkable success in various applications, such as disease diagnosis and text report generation. To date, a foundation model for endoscopic video analysis is still lacking. In this paper, we propose Endo-FM, a foundation model specifically developed using massive endoscopic video data. First, we build a video transformer, which captures both local and global long-range dependencies across spatial and temporal dimensions. Second, we pre-train our transformer model using global and local views in a self-supervised manner, aiming to make it robust to spatial-temporal variations and discriminative across different scenes. To develop the foundation model, we construct a large-scale endoscopy video dataset by combining 9 publicly available datasets and a privately collected dataset from the Baoshan Branch of Renji Hospital in Shanghai, China. Our dataset overall…
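The abstract mentions pre-training on global and local views of a video. As a rough illustration only, the sketch below samples such views from a single clip using NumPy. The function names (`sample_view`, `make_views`), the view counts, frame counts, and crop sizes are all assumptions chosen for this example and are not taken from the paper.

```python
import numpy as np

def sample_view(video, num_frames, crop_size, rng):
    """Return a random spatio-temporal crop of shape (num_frames, crop_size, crop_size, C)."""
    t, h, w, _ = video.shape
    # Pick a random window of contiguous frames.
    start = rng.integers(0, t - num_frames + 1)
    frames = video[start:start + num_frames]
    # Pick one random spatial crop shared by every frame in the view.
    top = rng.integers(0, h - crop_size + 1)
    left = rng.integers(0, w - crop_size + 1)
    return frames[:, top:top + crop_size, left:left + crop_size]

def make_views(video, rng, n_global=2, n_local=4):
    # Assumed setup: global views cover more frames and a larger area than local views.
    global_views = [sample_view(video, num_frames=8, crop_size=96, rng=rng)
                    for _ in range(n_global)]
    local_views = [sample_view(video, num_frames=4, crop_size=48, rng=rng)
                   for _ in range(n_local)]
    return global_views, local_views

rng = np.random.default_rng(0)
# Stand-in for one endoscopy clip: 16 frames of 128x128 RGB (random values, not real data).
video = rng.random((16, 128, 128, 3), dtype=np.float32)
global_views, local_views = make_views(video, rng)
print(global_views[0].shape, local_views[0].shape)
```

In a self-supervised setup of this kind, a model would typically be trained to produce consistent representations across the views of the same clip. The training objective itself is not shown here, since the abstract excerpt above does not specify it.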
Taxonomy
Topics: Colorectal Cancer Screening and Detection
