EndoDINO: A Foundation Model for GI Endoscopy
Patrick Dermyer, Angad Kalra, Matt Schwartz

TL;DR
EndoDINO is a foundation model for gastrointestinal (GI) endoscopy, pre-trained on curated datasets of 100K to 10M images. Used as a frozen feature encoder with simple decoder heads, it achieves state-of-the-art performance in anatomical landmark classification, polyp segmentation, and Mayo endoscopic scoring.
Contribution
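This work introduces EndoDINO, a foundation model pre-trained on a curated image dataset sampled from the largest known GI endoscopy video dataset in the literature. Used as a frozen feature encoder, it achieves state-of-the-art performance with only simple decoder heads.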
Findings
State-of-the-art performance in anatomical landmark classification
State-of-the-art performance in polyp segmentation
State-of-the-art Mayo endoscopic scoring (MES) for ulcerative colitis, using only a simple decoder head on the frozen encoder
Abstract
In this work, we present EndoDINO, a foundation model for GI endoscopy tasks that achieves strong generalizability by pre-training on a well-curated image dataset sampled from the largest known GI endoscopy video dataset in the literature. Specifically, we pre-trained ViT models with 1B, 307M, and 86M parameters using datasets ranging from 100K to 10M curated images. Using EndoDINO as a frozen feature encoder, we achieved state-of-the-art performance in anatomical landmark classification, polyp segmentation, and Mayo endoscopic scoring (MES) for ulcerative colitis with only simple decoder heads.
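The "frozen feature encoder plus simple decoder head" setup described in the abstract can be illustrated with a toy sketch. This is not the EndoDINO code: the encoder here is a tiny hand-written fixed projection standing in for a pre-trained ViT, the data is synthetic, and every name (`frozen_encoder`, `LinearHead`, `train_head`, the dimensions) is hypothetical. The only point is the training pattern: the encoder weights never change, and only a small linear classification head is fitted on top of its features.

```python
import math
import random

random.seed(0)

NUM_CLASSES = 3  # hypothetical number of classes (e.g. landmark labels)

# Stand-in for a pre-trained encoder: a fixed projection that is never updated.
# A real foundation model would be a ViT producing far wider embeddings.
FROZEN_W = [
    [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0],
    [0, 0, 0, 1], [1, -1, 0, 0], [0, 1, -1, 0],
]
FEATURE_DIM = len(FROZEN_W)

def frozen_encoder(x):
    """Map an input vector to features using fixed (frozen) weights."""
    return [math.tanh(sum(w * v for w, v in zip(row, x))) for row in FROZEN_W]

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

class LinearHead:
    """The only trainable part: one linear layer followed by softmax."""

    def __init__(self, in_dim, num_classes):
        self.W = [[0.0] * in_dim for _ in range(num_classes)]
        self.b = [0.0] * num_classes

    def predict_proba(self, feats):
        logits = [sum(w * f for w, f in zip(row, feats)) + b
                  for row, b in zip(self.W, self.b)]
        return softmax(logits)

    def sgd_step(self, feats, label, lr):
        # Gradient of cross-entropy w.r.t. the logits is (probs - one_hot).
        probs = self.predict_proba(feats)
        for c in range(len(self.W)):
            grad = probs[c] - (1.0 if c == label else 0.0)
            for j, f in enumerate(feats):
                self.W[c][j] -= lr * grad * f
            self.b[c] -= lr * grad

def make_toy_data(n_per_class):
    """Synthetic 4-d inputs clustered around one center per class."""
    centers = [[2, 0, 0, 0], [0, 2, 0, 0], [0, 0, 2, 0]]
    return [([v + random.gauss(0, 0.3) for v in center], label)
            for label, center in enumerate(centers)
            for _ in range(n_per_class)]

def train_head(data, epochs=50, lr=0.1):
    head = LinearHead(FEATURE_DIM, NUM_CLASSES)
    # Features are computed once because the encoder is frozen.
    feats = [(frozen_encoder(x), y) for x, y in data]
    for _ in range(epochs):
        random.shuffle(feats)
        for f, y in feats:
            head.sgd_step(f, y, lr)
    return head

def accuracy(head, data):
    correct = 0
    for x, y in data:
        probs = head.predict_proba(frozen_encoder(x))
        correct += int(probs.index(max(probs)) == y)
    return correct / len(data)
```

Calling `train_head(make_toy_data(20))` fits the head, and `accuracy(head, data)` evaluates it. Because the encoder is fixed, its features can be cached once and reused across epochs, which is one reason a frozen encoder makes downstream training cheap.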
Taxonomy
Topics: Colorectal Cancer Screening and Detection
