EndoMamba: An Efficient Foundation Model for Endoscopic Videos via Hierarchical Pre-training

Qingyao Tian; Huai Liao; Xinyan Huang; Bingyu Yang; Dongdong Lei; Sebastien Ourselin; Hongbin Liu

arXiv:2502.19090 · cs.CV · May 16, 2025

TL;DR

EndoMamba is a new foundation model for endoscopic videos that offers real-time inference and improved performance by combining hierarchical self-supervised pre-training with an efficient spatiotemporal backbone.

Contribution

The paper introduces the EndoMamba backbone, an efficient architecture optimized for real-time endoscopic video analysis, and a hierarchical self-supervised pre-training method that combines spatiotemporal reconstruction with general video knowledge.

Findings

01 Outperforms existing models on multiple endoscopic tasks

02 Achieves real-time inference speed in practical applications

03 Enhances representation learning through hierarchical pre-training

Abstract

Endoscopic video-based tasks, such as visual navigation and surgical phase recognition, play a crucial role in minimally invasive surgeries by providing real-time assistance. While recent video foundation models have shown promise, their applications are hindered by (1) computational inefficiencies and (2) suboptimal performance caused by limited data for pre-training in endoscopy. To address these issues, we present EndoMamba, a foundation model designed for real-time inference while learning generalized spatiotemporal representations. First, to mitigate computational inefficiencies, we propose the EndoMamba backbone, optimized for real-time inference. Inspired by recent advancements in state space models, EndoMamba integrates Bidirectional Mamba blocks for spatial modeling within individual frames and vanilla Mamba blocks for past-to-present reasoning across the temporal domain. This…
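The split the abstract describes, bidirectional scanning within a frame and past-to-present scanning across frames, can be illustrated with a toy sketch. This is not the EndoMamba or Mamba implementation: it assumes a scalar, non-selective linear recurrence with hypothetical parameters `a`, `b`, `c`, and the names `causal_scan`, `bidirectional_scan`, and `StreamingScan` are invented for illustration. It only shows why a causal scan over time suits frame-by-frame inference, while a bidirectional scan needs the whole sequence up front.

```python
from typing import List


def causal_scan(xs: List[float], a: float = 0.5, b: float = 1.0, c: float = 1.0) -> List[float]:
    """Toy linear state-space recurrence (an assumption, not Mamba's selective scan):
    h_t = a * h_{t-1} + b * x_t,  y_t = c * h_t.
    Each output depends only on the current and earlier inputs."""
    h = 0.0
    ys = []
    for x in xs:
        h = a * h + b * x
        ys.append(c * h)
    return ys


def bidirectional_scan(xs: List[float], a: float = 0.5, b: float = 1.0, c: float = 1.0) -> List[float]:
    """Sum of a forward scan and a backward scan.
    Each output depends on the entire sequence, which is fine for the
    patches of a single frame because they are all available at once."""
    forward = causal_scan(xs, a, b, c)
    backward = causal_scan(xs[::-1], a, b, c)[::-1]
    return [f + g for f, g in zip(forward, backward)]


class StreamingScan:
    """Causal scan that keeps its hidden state between calls, so each
    new frame costs one update instead of a rescan of the whole video."""

    def __init__(self, a: float = 0.5, b: float = 1.0, c: float = 1.0) -> None:
        self.a, self.b, self.c = a, b, c
        self.h = 0.0

    def step(self, x: float) -> float:
        self.h = self.a * self.h + self.b * x
        return self.c * self.h


if __name__ == "__main__":
    frames = [1.0, 2.0, 3.0]  # hypothetical per-frame features, one scalar each
    stream = StreamingScan()
    online = [stream.step(x) for x in frames]
    # Frame-by-frame processing reproduces the batch causal scan.
    print(online == causal_scan(frames))
```

In this toy setting the streaming outputs match the batch causal scan exactly, whereas changing a later input alters earlier outputs of the bidirectional scan, so the bidirectional variant cannot be run frame by frame without recomputation.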

Code & Models

Repositories

tiancuteqy/endomamba (PyTorch, official)

Taxonomy

Topics: Colorectal Cancer Screening and Detection · Gastrointestinal Bleeding Diagnosis and Treatment

Methods: Mamba: Linear-Time Sequence Modeling with Selective State Spaces