Surgical Phase Recognition of Short Video Shots Based on Temporal Modeling of Deep Features
Constantinos Loukas

TL;DR
This paper presents a method for recognizing surgical phases in short (10 s) video shots by combining deep CNN features, visual saliency, elapsed time, and LSTM networks. It achieves high accuracy without using information about neighboring frames, their phase labels, or the instruments used.
Contribution
It introduces the novel use of elapsed time as a feature and demonstrates the effectiveness of temporal modeling with LSTM for surgical phase recognition in short video shots.
Findings
Inclusion of elapsed time improves accuracy from 69% to 75%.
Fusion of CNN features with elapsed time increases accuracy to 86%.
Visual saliency (used to select the image region fed to the CNN) and LSTM-based temporal modeling both contribute to the reported performance.
Abstract
Recognizing the phases of a laparoscopic surgery (LS) operation from its video constitutes a fundamental step for efficient content representation, indexing and retrieval in surgical video databases. In the literature, most techniques focus on phase segmentation of the entire LS video using hand-crafted visual features, instrument usage signals, and recently convolutional neural networks (CNNs). In this paper we address the problem of phase recognition of short video shots (10s) of the operation, without utilizing information about the preceding/forthcoming video frames, their phase labels or the instruments used. We investigate four state-of-the-art CNN architectures (Alexnet, VGG19, GoogleNet, and ResNet101), for feature extraction via transfer learning. Visual saliency was employed for selecting the most informative region of the image as input to the CNN. Video shot representation…
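To make the idea of fusing CNN features with elapsed time concrete, here is a minimal sketch rather than the paper's implementation. It assumes each frame of a shot already has a CNN descriptor and simply appends a scaled elapsed-time value to every frame vector, giving a sequence that a model such as an LSTM could consume. The function name `fuse_with_elapsed_time`, the `time_scale_s` constant, and the toy dimensions are all hypothetical; the paper's actual scaling and fusion details are not given in the text above.

```python
import numpy as np


def fuse_with_elapsed_time(frame_features, shot_start_s, fps, time_scale_s):
    """Append a scaled elapsed-time value to each frame's feature vector.

    frame_features: array of shape (T, D), one CNN descriptor per frame.
    shot_start_s:   seconds from the start of the operation to the first frame.
    fps:            frame sampling rate within the shot.
    time_scale_s:   hypothetical constant that brings time into a range
                    comparable with the features (an assumption, not from the paper).
    Returns an array of shape (T, D + 1).
    """
    feats = np.asarray(frame_features, dtype=np.float64)
    if feats.ndim != 2:
        raise ValueError("frame_features must have shape (T, D)")
    # Time stamp of every frame, measured from the start of the operation.
    t = shot_start_s + np.arange(feats.shape[0]) / fps
    elapsed = (t / time_scale_s)[:, None]  # column vector, shape (T, 1)
    return np.concatenate([feats, elapsed], axis=1)


# Toy example: a 10 s shot sampled at 1 fps with 4-D stand-in features,
# starting 600 s into the operation.
rng = np.random.default_rng(0)
shot = rng.normal(size=(10, 4))
fused = fuse_with_elapsed_time(shot, shot_start_s=600.0, fps=1.0, time_scale_s=3600.0)
```

In this sketch the elapsed-time column is the only information about the shot's position in the operation; no neighboring frames or phase labels are involved, which matches the constraint stated in the abstract.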
