Surgical-DINO: Adapter Learning of Foundation Models for Depth Estimation in Endoscopic Surgery

Beilei Cui; Mobarakol Islam; Long Bai; Hongliang Ren

arXiv:2401.06013·cs.CV·January 15, 2024·1 cites

TL;DR

Surgical-DINO introduces a low-rank adaptation method for foundation models, specifically DINOv2, to improve depth estimation in endoscopic surgery by integrating surgical domain knowledge without full fine-tuning.

Contribution

This work presents a novel low-rank adaptation approach for foundation models, enabling effective surgical depth estimation without extensive fine-tuning.

Findings

01 Surgical-DINO outperforms state-of-the-art models in endoscopic depth estimation.

02 LoRA layers significantly improve adaptation to the surgical domain.

03 The method demonstrates the importance of domain-specific adaptation over naive fine-tuning.

Abstract

Purpose: Depth estimation in robotic surgery is vital for 3D reconstruction, surgical navigation, and augmented reality visualization. Although foundation models (e.g., DINOv2) exhibit outstanding performance in many vision tasks, including depth estimation, recent works have observed their limitations in medical and surgical domain-specific applications. This work presents a low-rank adaptation (LoRA) of the foundation model for surgical depth estimation. Methods: We design a foundation model-based depth estimation method, referred to as Surgical-DINO, a low-rank adaptation of DINOv2 for depth estimation in endoscopic surgery. We build LoRA layers and integrate them into DINO to adapt it to surgery-specific domain knowledge instead of conventional fine-tuning. During training, we freeze the DINO image encoder, which shows excellent visual representation capacity, and only optimize the…
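
The low-rank adaptation idea mentioned in the abstract can be illustrated with a small, dependency-free sketch. This is a generic LoRA layer, not the authors' implementation: the class name `LoRALinear`, the toy dimensions, the `alpha / rank` scaling, and the zero initialization of `B` are assumptions taken from common LoRA practice, and the page does not say which DINOv2 layers Surgical-DINO adapts. The frozen weight `W` stands in for a pretrained projection, and only the two small factors `A` and `B` would be trained.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


class LoRALinear:
    """Hypothetical layer: frozen weight W plus a trainable low-rank update B @ A."""

    def __init__(self, d_in, d_out, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        # Frozen "pretrained" weight (a random stand-in for illustration).
        self.W = [[rng.gauss(0, 0.02) for _ in range(d_in)] for _ in range(d_out)]
        # Trainable factors. B starts at zero, so the adapted layer initially
        # reproduces the frozen layer exactly (an assumed, common convention).
        self.A = [[rng.gauss(0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    def effective_weight(self):
        """Return W + scale * (B @ A); the update has rank at most `rank`."""
        delta = matmul(self.B, self.A)
        return [
            [w + self.scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(self.W, delta)
        ]

    def forward(self, x):
        """Apply the adapted layer to a vector x given as a flat list."""
        column = [[v] for v in x]
        return [row[0] for row in matmul(self.effective_weight(), column)]

    def trainable_params(self):
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])

    def frozen_params(self):
        return len(self.W) * len(self.W[0])


layer = LoRALinear(d_in=8, d_out=8, rank=2)
x = [1.0] * 8
frozen_out = [row[0] for row in matmul(layer.W, [[v] for v in x])]
lora_out = layer.forward(x)

print(layer.trainable_params(), "trainable vs", layer.frozen_params(), "frozen")
print("matches frozen layer at init:", lora_out == frozen_out)
```

In this toy setting the trainable factors hold 32 values against 64 frozen ones. The gap widens quickly at realistic sizes, because the factor count grows as `rank * (d_in + d_out)` while the frozen weight grows as `d_in * d_out`.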

Code & Models

Repositories

beileicui/surgicaldino
PyTorch · Official

Taxonomy

Topics: Advanced Vision and Imaging · Medical Image Segmentation Techniques · Advanced Image and Video Retrieval Techniques

Methods: Multi-Head Attention · Attention Is All You Need · Layer Normalization · Linear Layer · Softmax · Residual Connection · Dense Connections · Vision Transformer · self-DIstillation with NO labels