ViTALS: Vision Transformer for Action Localization in Surgical Nephrectomy

Soumyadeep Chandra; Sayeed Shafayet Chowdhury; Courtney Yong; Chandru P. Sundaram; Kaushik Roy

arXiv:2405.02571 · cs.CV · May 7, 2024

TL;DR

This paper introduces ViTALS, a novel vision transformer model with hierarchical temporal convolutions for surgical action localization, validated on a new nephrectomy dataset and achieving state-of-the-art results.

Contribution

The paper presents a new nephrectomy dataset, UroSlice, and a novel model, ViTALS, that captures temporal features at multiple granularities for surgical action localization.

Findings

1. Achieved 89.8% accuracy on the Cholec80 dataset.

2. Achieved 66.1% accuracy on the UroSlice dataset.

3. Validated the effectiveness of the proposed model on both datasets.

Abstract

Surgical action localization is a challenging computer vision problem. While it has promising applications, including automated training for surgical procedures and surgical workflow optimization, appropriate model design is pivotal to accomplishing this task. Moreover, the lack of suitable medical datasets adds a further layer of complexity. To that end, we introduce a new complex dataset of nephrectomy surgeries called UroSlice. To perform action localization on these videos, we propose a novel model termed 'ViTALS' (Vision Transformer for Action Localization in Surgical Nephrectomy). Our model incorporates hierarchical dilated temporal convolution layers and inter-layer residual connections to capture temporal correlations at both finer and coarser granularities. The proposed approach achieves state-of-the-art performance on the Cholec80 and UroSlice datasets (89.8%…
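
To make the idea of hierarchical dilated temporal convolutions with residual connections concrete, the following is a minimal single-channel sketch in plain Python. It is not the authors' implementation: the kernel size of 3, the doubling dilation schedule (1, 2, 4, ...), the ReLU, and the simple per-layer skip connection are all assumptions chosen for illustration, and the function names are hypothetical. It shows how stacking layers with growing dilation lets early layers see fine temporal detail while later layers cover long spans of frames.

```python
from typing import List, Sequence


def dilated_conv1d(x: Sequence[float], kernel: Sequence[float], dilation: int) -> List[float]:
    """Same-length 1-D convolution over time, zero-padded, with the given dilation."""
    center = len(kernel) // 2
    out = []
    for t in range(len(x)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = t + (j - center) * dilation  # taps are spaced `dilation` frames apart
            if 0 <= idx < len(x):              # out-of-range taps count as zero padding
                acc += w * x[idx]
        out.append(acc)
    return out


def temporal_stack(x: Sequence[float], kernel: Sequence[float], num_layers: int) -> List[float]:
    """Stack of dilated convolutions; layer l uses dilation 2**l (assumed schedule)."""
    h = list(x)
    for layer in range(num_layers):
        conv = dilated_conv1d(h, kernel, dilation=2 ** layer)
        # Residual connection: add the layer input to the ReLU of the convolution.
        h = [a + max(0.0, b) for a, b in zip(h, conv)]
    return h


def receptive_field(kernel_size: int, num_layers: int) -> int:
    """Number of input frames that can influence one output frame."""
    return 1 + (kernel_size - 1) * sum(2 ** l for l in range(num_layers))


# A single impulse at frame 20 spreads over the full receptive field.
impulse = [0.0] * 41
impulse[20] = 1.0
response = temporal_stack(impulse, kernel=[1.0, 1.0, 1.0], num_layers=4)
print(receptive_field(3, 4))  # 31
```

With four layers and kernel size 3, each output frame depends on 31 input frames, whereas four undilated layers would cover only 9. The residual path keeps the fine-grained signal from shallow layers available alongside the coarser context from deeper ones.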

Taxonomy

Topics: Surgical Simulation and Training · Advanced X-ray and CT Imaging · Anatomy and Medical Technology

Methods: Attention Is All You Need · Dense Connections · Dropout · Label Smoothing · Residual Connection · Softmax · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Absolute Position Encodings · Linear Layer