StepAL: Step-aware Active Learning for Cataract Surgical Videos

Nisarg A. Shah, Bardia Safaei, Shameema Sikder, S. Swaroop Vedula, and Vishal M. Patel

arXiv:2507.22059 · cs.CV · July 30, 2025

TL;DR

StepAL is an active learning framework for surgical videos that selects entire videos for annotation using step-aware features and prediction uncertainty, reducing labeling effort while maintaining step recognition accuracy.

Contribution

We introduce StepAL, a step-aware active learning method that selects full surgical videos for annotation, improving annotation efficiency over existing approaches that select individual frames or clips.

Findings

01 Outperforms existing active learning methods on cataract datasets

02 Achieves higher step recognition accuracy with fewer labeled videos

03 Reduces annotation effort in surgical video analysis

Abstract

Active learning (AL) can reduce annotation costs in surgical video analysis while maintaining model performance. However, traditional AL methods, developed for images or short video clips, are suboptimal for surgical step recognition due to inter-step dependencies within long, untrimmed surgical videos. These methods typically select individual frames or clips for labeling, which is ineffective for surgical videos where annotators require the context of the entire video for annotation. To address this, we propose StepAL, an active learning framework designed for full video selection in surgical step recognition. StepAL integrates a step-aware feature representation, which leverages pseudo-labels to capture the distribution of predicted steps within each video, with an entropy-weighted clustering strategy. This combination prioritizes videos that are both uncertain and exhibit diverse…
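
The selection idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract is truncated here and gives no formulas, so every detail below is an assumption made for illustration. In particular, the step-aware feature is taken to be a normalized histogram of per-frame pseudo-labels, the uncertainty is taken to be the mean per-frame entropy, and "entropy-weighted clustering" is approximated by a k-means variant whose centroids are entropy-weighted means. The names `video_features`, `weighted_kmeans`, and `select_videos` are hypothetical.

```python
import numpy as np


def video_features(frame_probs, num_steps):
    """Return (step histogram, mean frame entropy) for one video.

    frame_probs: array of shape (num_frames, num_steps); each row is a
    predicted step distribution for one frame (assumed input format).
    """
    # Pseudo-label each frame with its most likely step.
    pseudo_labels = frame_probs.argmax(axis=1)
    # Assumed step-aware feature: fraction of frames assigned to each step.
    hist = np.bincount(pseudo_labels, minlength=num_steps).astype(float)
    hist /= hist.sum()
    # Assumed video-level uncertainty: mean per-frame Shannon entropy.
    p = np.clip(frame_probs, 1e-12, 1.0)
    entropy = -(p * np.log(p)).sum(axis=1)
    return hist, float(entropy.mean())


def weighted_kmeans(X, weights, k, iters=50, seed=0):
    """k-means in which each centroid is a weighted mean of its members."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = dists.argmin(axis=1)
        for c in range(k):
            mask = assign == c
            if mask.any():
                centers[c] = np.average(X[mask], axis=0, weights=weights[mask])
    # Recompute assignments so they match the final centroids.
    dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return centers, dists.argmin(axis=1)


def select_videos(all_frame_probs, budget, seed=0):
    """Pick up to `budget` video indices, one per non-empty cluster."""
    num_steps = all_frame_probs[0].shape[1]
    feats, ents = zip(*(video_features(p, num_steps) for p in all_frame_probs))
    X = np.stack(feats)
    w = np.asarray(ents) + 1e-8  # keep weights strictly positive
    centers, assign = weighted_kmeans(X, w, budget, seed=seed)
    selected = []
    for c in range(budget):
        members = np.flatnonzero(assign == c)
        if members.size == 0:
            continue  # an empty cluster contributes no video
        d = ((X[members] - centers[c]) ** 2).sum(axis=1)
        selected.append(int(members[d.argmin()]))
    return selected
```

Calling `select_videos(probs_per_video, budget=3)` on a list of per-video probability arrays returns the indices of whole videos to send for annotation. Under these assumptions, weighting the centroids by entropy pulls each cluster center toward its more uncertain videos, while clustering on step histograms spreads the selection across videos with different predicted step distributions.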
