# Weakly supervised pre-training for surgical step recognition using unannotated and heterogeneously labeled videos

**Authors:** Sreeram Kamabattula, Kai Chen, Kiran Bhattacharyya

PMC · DOI: 10.1007/s11548-025-03555-2 · International Journal of Computer Assisted Radiology and Surgery · 2025-12-02

## TL;DR

This paper introduces a weakly supervised pre-training method that uses unannotated or heterogeneously labeled surgical videos to improve surgical step recognition when step annotations are limited.

## Contribution

The contribution is a weakly supervised pre-training framework that derives weak labels (surgical phases, steps from other procedure types, and intraoperative time progression) from unannotated and heterogeneously labeled surgical videos to improve step recognition.

## Key findings

- Pre-training with surgical phase labels from the same procedure type (Phase-Within) improved step recognition by up to 6.4 F1-score points over ImageNet-based models.
- Label efficiency analysis showed that, at the lowest annotation level (α = 0.25), the baseline model would need 30–60 additional labeled videos to match the best weak pre-training strategy.
- Cross-procedure step pre-training was beneficial for some procedures, and time-based labels provided moderate gains depending on procedure structure.

## Abstract

Surgical video review is essential for minimally invasive surgical training, but manual annotation of surgical steps is time-consuming and limits scalability. We propose a weakly supervised pre-training framework that leverages unannotated or heterogeneously labeled surgical videos to improve automated surgical step recognition.

We evaluate three types of weak labels derived from unannotated datasets: (1) surgical phases from the same or other procedures, (2) surgical steps from different procedure types, and (3) intraoperative time progression. Using datasets from four robotic-assisted procedures (sleeve gastrectomy, hysterectomy, cholecystectomy, and radical prostatectomy), we simulate real-world annotation scarcity by varying the proportion of available step annotations (α ∈ {0.25, 0.5, 0.75, 1.0}). We benchmark the performance of a 2D CNN model trained with and without weak label pre-training.
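The following is a minimal Python sketch of two ideas from the paragraph above: deriving weak labels from intraoperative time progression, and simulating annotation scarcity with a proportion α. The abstract does not specify how either is implemented, so everything here is an assumption made for illustration: the function names, the equal-width binning of relative time, the `num_bins` parameter, and the seeded random subsampling are all hypothetical.

```python
import random


def time_progression_labels(num_frames, num_bins=10):
    """Assign each frame a pseudo-label based on its relative position in the video.

    Assumption: relative time is split into `num_bins` equal-width bins.
    The paper may define time-based labels differently.
    """
    if num_frames <= 0:
        return []
    # Frame i sits at relative time i / num_frames; its bin is floor(rel * num_bins).
    return [min(i * num_bins // num_frames, num_bins - 1) for i in range(num_frames)]


def subsample_annotated(video_ids, alpha, seed=0):
    """Keep a fraction `alpha` of the step-annotated videos (hypothetical protocol)."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    shuffled = sorted(video_ids)          # sort first so the result depends only on the seed
    random.Random(seed).shuffle(shuffled)
    keep = max(1, round(alpha * len(shuffled)))
    return shuffled[:keep]


if __name__ == "__main__":
    print(time_progression_labels(10, num_bins=5))
    videos = [f"video_{i:03d}" for i in range(40)]  # placeholder IDs, not from the paper
    for alpha in (0.25, 0.5, 0.75, 1.0):
        print(alpha, len(subsample_annotated(videos, alpha)))
```

Time-based labels of this kind need no human annotation, which is why they can be produced for any unannotated video; whether equal-width bins suit a given procedure is a separate question.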

Pre-training with surgical phase labels, particularly from the same procedure type (Phase-Within), consistently improved step recognition performance, with gains of up to 6.4 F1-score points over standard ImageNet-based models under limited annotation conditions (α = 0.25 on SLG). Cross-procedure step pre-training was beneficial for some procedures, and time-based labels provided moderate gains depending on procedure structure. Label efficiency analysis shows the baseline model would require labeling an additional 30–60 videos at α = 0.25 to match the performance achieved by the best weak pre-training strategy across procedures.
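One plausible way to read the label efficiency result is as an interpolation on the baseline's learning curve: find how many labeled videos the baseline needs to reach the score of the weakly pre-trained model. The sketch below assumes linear interpolation between measured points. That method, the function name `videos_to_match`, and the numbers in the usage example are illustrative placeholders, not values or procedures taken from the paper.

```python
def videos_to_match(baseline_curve, target_f1):
    """Estimate the number of labeled videos at which the baseline reaches `target_f1`.

    baseline_curve: list of (num_labeled_videos, f1) points for the baseline model.
    Returns an interpolated video count, or None if the target is never reached.
    Assumption: linear interpolation between adjacent measured points.
    """
    points = sorted(baseline_curve)
    if not points:
        return None
    if points[0][1] >= target_f1:
        return float(points[0][0])
    for (n0, f0), (n1, f1) in zip(points, points[1:]):
        if f0 < target_f1 <= f1:
            return n0 + (target_f1 - f0) * (n1 - n0) / (f1 - f0)
    return None


if __name__ == "__main__":
    # Hypothetical learning curve: (labeled videos, F1). Not data from the paper.
    curve = [(10, 60.0), (20, 70.0), (40, 80.0)]
    needed = videos_to_match(curve, target_f1=75.0)
    print(needed, "videos needed;", needed - 10, "more than the 10 already labeled")
```

With these placeholder points, the baseline would need 30 labeled videos to reach an F1 of 75, which is 20 more than the 10 it started with.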

Weakly supervised pre-training offers a practical strategy to improve surgical step recognition when annotated data is scarce. This approach can support scalable feedback and assessment in surgical training workflows where comprehensive annotations are infeasible.

## Full-text entities

- **Diseases:** cholecystectomy (MESH:D017562)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC13013371/full.md

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/PMC13013371/full.md

## References

3 references — full list in the complete paper: https://tomesphere.com/paper/PMC13013371/full.md

---
Source: https://tomesphere.com/paper/PMC13013371