Foundation Model for Endoscopy Video Analysis via Large-scale Self-supervised Pre-train

Zhao Wang; Chang Liu; Shaoting Zhang; Qi Dou

arXiv:2306.16741·cs.CV·January 10, 2024·1 cites

TL;DR

This paper introduces Endo-FM, a large-scale self-supervised foundation model for endoscopic video analysis, leveraging extensive data and a video transformer to improve performance across multiple downstream tasks.

Contribution

It develops a novel self-supervised pre-training approach for endoscopic videos using a large dataset and a video transformer architecture, filling a gap in foundation models for this domain.

Findings

01 Outperforms state-of-the-art self-supervised methods on downstream tasks

02 Uses over 33,000 video clips with up to 5 million frames for pre-training

03 Achieves significant improvements in classification, segmentation, and detection metrics

Abstract

Foundation models have exhibited remarkable success in various applications, such as disease diagnosis and text report generation. To date, a foundation model for endoscopic video analysis is still lacking. In this paper, we propose Endo-FM, a foundation model specifically developed using massive endoscopic video data. First, we build a video transformer, which captures both local and global long-range dependencies across spatial and temporal dimensions. Second, we pre-train our transformer model using global and local views in a self-supervised manner, aiming to make it robust to spatial-temporal variations and discriminative across different scenes. To develop the foundation model, we construct a large-scale endoscopy video dataset by combining 9 publicly available datasets and a privately collected dataset from the Baoshan Branch of Renji Hospital in Shanghai, China. Our dataset overall…
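The "global and local views" idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names, view counts, frame counts, and crop-scale ranges below are all assumptions chosen for illustration, and the sketch only produces frame indices and crop boxes rather than decoding video. It assumes that a global view covers more frames and a larger spatial region than a local view.

```python
import random
from dataclasses import dataclass


@dataclass
class View:
    frame_indices: list  # which frames of the clip this view keeps
    box: tuple           # (top, left, crop_height, crop_width)


def sample_view(num_frames, height, width, view_frames, scale_range, rng):
    """Sample one spatiotemporal view as frame indices plus a crop box."""
    if not 1 <= view_frames <= num_frames:
        raise ValueError("view_frames must be between 1 and num_frames")

    # Temporal part: pick a random stride, then a random start that fits.
    max_stride = max(1, (num_frames - 1) // max(1, view_frames - 1))
    stride = rng.randint(1, max_stride)
    span = (view_frames - 1) * stride
    start = rng.randint(0, num_frames - 1 - span)
    frame_indices = [start + i * stride for i in range(view_frames)]

    # Spatial part: crop a random fraction of each side of the frame.
    scale = rng.uniform(*scale_range)
    crop_h = max(1, int(height * scale))
    crop_w = max(1, int(width * scale))
    top = rng.randint(0, height - crop_h)
    left = rng.randint(0, width - crop_w)
    return View(frame_indices, (top, left, crop_h, crop_w))


def sample_views(num_frames, height, width, n_global=2, n_local=4, seed=0):
    """Return (global_views, local_views) for one clip.

    All defaults here are hypothetical: global views keep 8 frames and
    60-100% of each side, local views keep 4 frames and 20-50% of each side.
    """
    rng = random.Random(seed)
    global_views = [
        sample_view(num_frames, height, width, 8, (0.6, 1.0), rng)
        for _ in range(n_global)
    ]
    local_views = [
        sample_view(num_frames, height, width, 4, (0.2, 0.5), rng)
        for _ in range(n_local)
    ]
    return global_views, local_views
```

For example, `sample_views(32, 224, 224)` returns two global and four local view descriptions for a 32-frame clip at 224x224 resolution. In a self-supervised setup of this kind, a model would typically be trained so that its representations of the different views of the same clip agree; the training objective itself is outside the scope of this sketch.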

Code & Models

Repositories

med-air/endo-fm (PyTorch, official)

Taxonomy

Topics: Colorectal Cancer Screening and Detection