Data-Efficient Surgical Phase Segmentation in Small-Incision Cataract Surgery: A Controlled Study of Vision Foundation Models

Lincoln Spencer; Song Wang; Chen Chen

arXiv:2604.10514·cs.CV·April 14, 2026

TL;DR

This study finds that modern vision foundation models improve data-efficient surgical phase segmentation in small-incision cataract surgery, especially in low-label settings, and that their features transfer well with lightweight adaptation.

Contribution

It provides a controlled comparison in which self-supervised foundation models (DINOv3, V-JEPA2) outperform supervised encoders (ResNet-50, I3D) when each is paired with the same temporal model (MS-TCN++), and it offers practical insights for low-label medical video applications.

Findings

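01. DINOv3 ViT-7B achieves the best overall results on SICS-155: 83.4% accuracy and an 87.0 edit score.

02. Foundation-model features improve segmentation performance over supervised encoders (ResNet-50, I3D) under identical training and evaluation settings.

03. Lightweight adaptation benefits transfer learning in surgical videos.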

Abstract

Surgical phase segmentation is central to computer-assisted surgery, yet robust models remain difficult to develop when labeled surgical videos are scarce. We study data-efficient phase segmentation for manual small-incision cataract surgery (SICS) through a controlled comparison of visual representations. To isolate representation quality, we pair each visual encoder with the same temporal model (MS-TCN++) under identical training and evaluation settings on SICS-155 (19 phases). We compare supervised encoders (ResNet-50, I3D) against large self-supervised foundation models (DINOv3, V-JEPA2), and use a cached-feature pipeline that decouples expensive visual encoding from lightweight temporal learning. Foundation-model features improve segmentation performance in this setup, with DINOv3 ViT-7B achieving the best overall results (83.4% accuracy, 87.0 edit score). We further examine…
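
The abstract reports two metrics, frame accuracy and edit score. As a rough illustration of what they measure, here is a minimal Python sketch. It assumes the edit score follows the common segmental definition used in the temporal action segmentation literature (normalized Levenshtein distance over the ordered sequence of segment labels); the paper's own evaluation code is not shown on this page, so the function names and the toy sequences below are illustrative only.

```python
from itertools import groupby


def frame_accuracy(pred, gold):
    """Percentage of frames whose predicted phase equals the labeled phase."""
    if len(pred) != len(gold):
        raise ValueError("pred and gold must have the same number of frames")
    if not gold:
        raise ValueError("sequences must be non-empty")
    correct = sum(p == g for p, g in zip(pred, gold))
    return 100.0 * correct / len(gold)


def segments(labels):
    """Collapse per-frame labels into the ordered list of segment labels."""
    return [label for label, _ in groupby(labels)]


def levenshtein(a, b):
    """Edit distance between two label sequences (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def edit_score(pred, gold):
    """Segmental edit score in [0, 100]; assumed definition, see lead-in."""
    p, g = segments(pred), segments(gold)
    denom = max(len(p), len(g))
    if denom == 0:
        return 100.0
    return 100.0 * (denom - levenshtein(p, g)) / denom


# Toy example with three phases (made-up labels, not SICS-155 data).
gold = [0, 0, 0, 1, 1, 2, 2, 2]
pred = [0, 0, 1, 1, 1, 2, 0, 2]
acc = frame_accuracy(pred, gold)
edit = edit_score(pred, gold)
```

In the toy example, six of eight frames match, while the brief spurious return to phase 0 splits the final segment and adds two extra predicted segments. This is why the edit score penalizes over-segmentation more heavily than frame accuracy does.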

Code & Models

Repositories

https://sl2005.github.io/DataEfficient-sics-phase-seg
