ReSW-VL: Representation Learning for Surgical Workflow Analysis Using Vision-Language Model

Satoshi Kondo

arXiv:2505.13746 · cs.CV · May 21, 2025

TL;DR

This paper introduces ReSW-VL, a novel approach using a vision-language model with prompt learning to improve surgical workflow analysis and phase recognition accuracy.

Contribution

It proposes fine-tuning a CLIP-based vision-language model with prompt learning specifically for surgical phase recognition tasks.
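
To make the idea of CLIP-style phase recognition concrete, the following is a minimal sketch of how a frame embedding can be scored against one text-prompt embedding per surgical phase. It is not the paper's implementation: the phase names, the 3-dimensional toy vectors, the function names, and the temperature value are all illustrative assumptions. In a prompt-learning setup, the prompt embeddings would come from a text encoder fed with learned prompt tokens, and the frame embedding from the image encoder; here both are hard-coded.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def phase_probabilities(frame_embedding, prompt_embeddings, temperature=0.07):
    """Softmax over temperature-scaled cosine similarities.

    prompt_embeddings maps a phase name to the embedding of its text prompt.
    The temperature is an illustrative value, not one taken from the paper.
    """
    names = list(prompt_embeddings)
    logits = [
        cosine(frame_embedding, prompt_embeddings[name]) / temperature
        for name in names
    ]
    # Subtract the maximum logit for numerical stability before exponentiating.
    top = max(logits)
    exps = [math.exp(logit - top) for logit in logits]
    total = sum(exps)
    return {name: e / total for name, e in zip(names, exps)}


# Hypothetical phase names and toy embeddings, for illustration only.
prompt_embeddings = {
    "preparation": [1.0, 0.0, 0.0],
    "dissection": [0.0, 1.0, 0.0],
    "closure": [0.0, 0.0, 1.0],
}
frame_embedding = [0.1, 0.9, 0.2]

probs = phase_probabilities(frame_embedding, prompt_embeddings)
predicted = max(probs, key=probs.get)
print(predicted)
```

Under these assumptions, fine-tuning would adjust the encoders (and the learned prompt tokens) so that each frame's embedding moves closer to the prompt embedding of its true phase than to the others.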

Findings

1. Outperforms conventional methods on three datasets

2. Effective use of prompt learning for surgical phase recognition

3. Demonstrates the potential of vision-language models in surgical workflow analysis

Abstract

Surgical phase recognition from video is a technology that automatically classifies the progress of a surgical procedure and has a wide range of potential applications, including real-time surgical support, optimization of medical resources, training and skill assessment, and safety improvement. Recent advances in surgical phase recognition technology have focused primarily on Transformer-based methods, although methods that extract spatial features from individual frames using a CNN and video features from the resulting time series of spatial features using time series modeling have shown high performance. However, there remains a paucity of research on training methods for CNNs employed for feature extraction or representation learning in surgical phase recognition. In this study, we propose a method for representation learning in surgical workflow analysis using a vision-language model…

Taxonomy

Topics: Surgical Simulation and Training · Medical Imaging and Analysis

Methods: Contrastive Language-Image Pre-training