Surgical-DINO: Adapter Learning of Foundation Models for Depth Estimation in Endoscopic Surgery
Beilei Cui, Mobarakol Islam, Long Bai, Hongliang Ren

TL;DR
Surgical-DINO introduces a low-rank adaptation method for foundation models, specifically DINOv2, to improve depth estimation in endoscopic surgery by integrating surgical domain knowledge without full fine-tuning.
Contribution
This work presents a novel low-rank adaptation approach for foundation models, enabling effective surgical depth estimation without extensive fine-tuning.
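For reference, the standard LoRA formulation is shown below. It is assumed here rather than taken from this page, and Surgical-DINO's exact parameterization (scaling factor, rank, and which layers are adapted) may differ. A frozen pretrained weight $W_0$ is augmented with a trainable low-rank product, so only $A$ and $B$ receive gradients:

$$
h = W_0 x + \Delta W x = W_0 x + \frac{\alpha}{r} B A x,
\qquad W_0 \in \mathbb{R}^{d \times k},\; B \in \mathbb{R}^{d \times r},\; A \in \mathbb{R}^{r \times k},\; r \ll \min(d, k)
$$

$$
\#\text{trainable} = r(d + k) \quad \text{vs.} \quad dk \text{ for full fine-tuning;}
\qquad d = k = 768,\; r = 4:\; 6{,}144 \text{ vs. } 589{,}824 \;(\approx 1.04\%)
$$

The sizes $d = k = 768$ and $r = 4$ are hypothetical values chosen only to illustrate the arithmetic.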
Findings
Surgical-DINO outperforms state-of-the-art models in endoscopic depth estimation.
LoRA layers significantly enhance adaptation to the surgical domain.
The method demonstrates the importance of domain-specific adaptation over naive fine-tuning.
Abstract
Purpose: Depth estimation in robotic surgery is vital for 3D reconstruction, surgical navigation and augmented reality visualization. Although foundation models such as DINOv2 exhibit outstanding performance in many vision tasks, including depth estimation, recent works have observed their limitations in medical and surgical domain-specific applications. This work presents a low-rank adaptation (LoRA) of a foundation model for surgical depth estimation. Methods: We design a foundation model-based depth estimation method, referred to as Surgical-DINO, a low-rank adaptation of DINOv2 for depth estimation in endoscopic surgery. Instead of conventional fine-tuning, we build LoRA layers and integrate them into DINO to adapt it to surgery-specific domain knowledge. During training, we freeze the DINO image encoder, which shows excellent visual representation capacity, and only optimize the…
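A minimal NumPy sketch of the idea in the Methods paragraph: a pretrained projection W0 stays frozen and only a low-rank pair (A, B) would be trained. The class name LoRALinear, the rank, the alpha/r scaling, the initialization, and the 768-dimensional size are illustrative assumptions, not the authors' code; the sketch shows neither the depth decoder nor a training loop.

```python
import numpy as np


class LoRALinear:
    """Hypothetical frozen linear layer plus a trainable low-rank update (standard LoRA)."""

    def __init__(self, weight, rank=4, alpha=4.0, seed=0):
        rng = np.random.default_rng(seed)
        self.weight = weight                  # frozen pretrained W0, shape (d_out, d_in)
        d_out, d_in = weight.shape
        self.scale = alpha / rank             # scaling factor alpha / r
        # A gets a small random init; B starts at zero, so the layer initially equals W0
        self.A = rng.normal(0.0, 0.01, size=(rank, d_in))
        self.B = np.zeros((d_out, rank))

    def __call__(self, x):
        # x: (batch, d_in) -> (batch, d_out); frozen path plus scaled low-rank path
        frozen = x @ self.weight.T
        update = (x @ self.A.T) @ self.B.T
        return frozen + self.scale * update

    def trainable_params(self):
        # Only A and B would be optimized; W0 is never updated
        return self.A.size + self.B.size

    def merged_weight(self):
        # After training, the update can be folded into W0 for inference
        return self.weight + self.scale * (self.B @ self.A)


d = 768  # hypothetical hidden size of a ViT-B-sized encoder
rng = np.random.default_rng(1)
W0 = rng.normal(size=(d, d))
layer = LoRALinear(W0, rank=4)
x = rng.normal(size=(2, d))
print(np.allclose(layer(x), x @ W0.T))    # True: B starts at zero
print(W0.size, layer.trainable_params())  # 589824 6144
```

Initializing B to zero makes the adapted layer reproduce the frozen encoder exactly at the start of training, and merged_weight illustrates why a LoRA update need not add inference cost once training is finished.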
Taxonomy
Topics: Advanced Vision and Imaging · Medical Image Segmentation Techniques · Advanced Image and Video Retrieval Techniques
Methods: Multi-Head Attention · Attention Is All You Need · Layer Normalization · Linear Layer · Softmax · Residual Connection · Dense Connections · Vision Transformer · self-DIstillation with NO labels
