EndoDINO: A Foundation Model for GI Endoscopy

Patrick Dermyer; Angad Kalra; Matt Schwartz

arXiv:2501.05488 · cs.CV · March 21, 2025 · 2 cites

Open Access

TL;DR

EndoDINO is a foundation model for gastrointestinal (GI) endoscopy, pre-trained on curated images sampled from a large GI endoscopy video dataset. Used as a frozen feature encoder with simple decoder heads, it reports state-of-the-art results on anatomical landmark classification, polyp segmentation, and Mayo endoscopic scoring.

Contribution

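This work introduces EndoDINO, a foundation model pre-trained on curated images sampled from what the authors describe as the largest known GI endoscopy video dataset in the literature. With the encoder kept frozen, simple decoder heads are enough to reach state-of-the-art performance on the evaluated tasks.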

Findings

01. State-of-the-art performance in anatomical landmark classification

02. State-of-the-art results in polyp segmentation

03. State-of-the-art Mayo endoscopic scoring (MES) for ulcerative colitis, using only a simple decoder head on frozen features

Abstract

In this work, we present EndoDINO, a foundation model for GI endoscopy tasks that achieves strong generalizability by pre-training on a well-curated image dataset sampled from the largest known GI endoscopy video dataset in the literature. Specifically, we pre-trained ViT models with 1B, 307M, and 86M parameters using datasets ranging from 100K to 10M curated images. Using EndoDINO as a frozen feature encoder, we achieved state-of-the-art performance in anatomical landmark classification, polyp segmentation, and Mayo endoscopic scoring (MES) for ulcerative colitis with only simple decoder heads.
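The "frozen feature encoder plus simple decoder head" setup described in the abstract can be illustrated with a minimal linear-probe sketch. Everything below is an assumption for illustration and not the authors' implementation: `make_frozen_encoder` is a hypothetical stand-in (a fixed random projection) rather than the EndoDINO ViT, the head is assumed to be a single linear softmax classifier, and the data are synthetic vectors rather than endoscopy images. The only point it shows is the training pattern: features are computed once by an encoder that is never updated, and only the small head is trained.

```python
import math
import random


def make_frozen_encoder(in_dim, feat_dim, seed=0):
    """Hypothetical stand-in for a pre-trained encoder (NOT EndoDINO).

    A fixed random projection followed by tanh. The weights are created
    once and never updated, which is what "frozen" means here.
    """
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(in_dim)
    weights = [[rng.gauss(0.0, scale) for _ in range(in_dim)]
               for _ in range(feat_dim)]

    def encode(x):
        return [math.tanh(sum(w * xi for w, xi in zip(row, x)))
                for row in weights]

    return encode


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class LinearHead:
    """Assumed 'simple decoder head': one linear layer with softmax."""

    def __init__(self, feat_dim, num_classes):
        self.W = [[0.0] * feat_dim for _ in range(num_classes)]
        self.b = [0.0] * num_classes

    def logits(self, feats):
        return [sum(w * f for w, f in zip(row, feats)) + b
                for row, b in zip(self.W, self.b)]

    def predict(self, feats):
        z = self.logits(feats)
        return max(range(len(z)), key=z.__getitem__)

    def sgd_step(self, feats, label, lr):
        # Gradient of cross-entropy w.r.t. logits is (probs - one_hot).
        probs = softmax(self.logits(feats))
        for c, row in enumerate(self.W):
            grad = probs[c] - (1.0 if c == label else 0.0)
            for j, f in enumerate(feats):
                row[j] -= lr * grad * f
            self.b[c] -= lr * grad


def train_linear_probe(encode, inputs, labels, num_classes,
                       epochs=30, lr=0.1):
    # The encoder is frozen, so features can be computed once and cached.
    feats = [encode(x) for x in inputs]
    head = LinearHead(len(feats[0]), num_classes)
    for _ in range(epochs):
        for f, y in zip(feats, labels):
            head.sgd_step(f, y, lr)  # only the head's parameters change
    return head, feats


def make_toy_data(n_per_class, in_dim, seed=0):
    """Synthetic two-class vectors; a placeholder for real image inputs."""
    rng = random.Random(seed)
    inputs, labels = [], []
    for _ in range(n_per_class):
        for label, center in ((0, 1.0), (1, -1.0)):
            inputs.append([center + rng.gauss(0.0, 0.2)
                           for _ in range(in_dim)])
            labels.append(label)
    return inputs, labels


inputs, labels = make_toy_data(n_per_class=20, in_dim=8, seed=1)
encode = make_frozen_encoder(in_dim=8, feat_dim=16, seed=0)
head, feats = train_linear_probe(encode, inputs, labels, num_classes=2)
accuracy = sum(head.predict(f) == y for f, y in zip(feats, labels)) / len(labels)
print(f"training accuracy of the linear head: {accuracy:.2f}")
```

In a real pipeline the stand-in encoder would be replaced by the pre-trained ViT's embedding function, and a dense task such as segmentation would need a head that operates on patch-level features instead of a single vector per image.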

Taxonomy

Topics: Colorectal Cancer Screening and Detection