Event Recognition in Laparoscopic Gynecology Videos with Hybrid Transformers

Sahar Nasirihaghighi; Negin Ghamsarian; Heinrich Husslein; Klaus Schoeffmann

arXiv:2312.00593 · cs.CV · December 4, 2023 · 1 citation

Open Access

TL;DR

This paper introduces a hybrid transformer-based approach for recognizing key events in laparoscopic gynecology videos, utilizing a new annotated dataset and a frame sampling strategy to improve accuracy amidst challenging surgical conditions.

Contribution

The paper presents a novel hybrid transformer architecture and a specialized training-inference framework for event recognition in laparoscopic videos, along with a new dataset and sampling strategy.

Findings

1. Hybrid transformer architecture outperforms CNN-RNN models in accuracy.

2. The dataset includes annotations for intra-operative challenges and complications.

3. Frame sampling strategy enhances temporal resolution and robustness.

Abstract

Analyzing laparoscopic surgery videos presents a complex and multifaceted challenge, with applications including surgical training, intra-operative surgical complication prediction, and post-operative surgical assessment. Identifying crucial events within these videos is a significant prerequisite in a majority of these applications. In this paper, we introduce a comprehensive dataset tailored for relevant event recognition in laparoscopic gynecology videos. Our dataset includes annotations for critical events associated with major intra-operative challenges and post-operative complications. To validate the precision of our annotations, we assess event recognition performance using several CNN-RNN architectures. Furthermore, we introduce and evaluate a hybrid transformer architecture coupled with a customized training-inference framework to recognize four specific events in laparoscopic…

Taxonomy

Topics: Surgical Simulation and Training · Radiology practices and education

Methods: Multi-Head Attention · Attention Is All You Need · Absolute Position Encodings · Dense Connections · Dropout · Byte Pair Encoding · Softmax · Layer Normalization · Position-Wise Feed-Forward Layer · Linear Layer