GLSFormer: Gated - Long, Short Sequence Transformer for Step Recognition in Surgical Videos

Nisarg A. Shah; Shameema Sikder; S. Swaroop Vedula; Vishal M. Patel

arXiv:2307.11081·cs.CV·July 22, 2023·2 cites

TL;DR

GLSFormer introduces a gated-transformer model that jointly captures short and long-range spatio-temporal features for improved surgical step recognition in videos, outperforming existing methods on cataract surgery datasets.

Contribution

The paper presents a novel vision transformer with gated-temporal attention for joint spatio-temporal modeling in surgical videos, addressing limitations of prior separate or short-range methods.

Findings

01. Outperforms state-of-the-art methods on Cataract-101 and D99 datasets.

02. Effectively combines short-term and long-term features.

03. Demonstrates robustness across different surgical video datasets.

Abstract

Automated surgical step recognition is an important task that can significantly improve patient safety and decision-making during surgeries. Existing state-of-the-art methods for surgical step recognition either rely on separate, multi-stage modeling of spatial and temporal information or operate on short-range temporal resolution when learned jointly. However, the benefits of joint modeling of spatio-temporal features and long-range information are not taken into account. In this paper, we propose a vision transformer-based approach to jointly learn spatio-temporal features directly from a sequence of frame-level patches. Our method incorporates a gated-temporal attention mechanism that intelligently combines short-term and long-term spatio-temporal feature representations. We extensively evaluate our approach on two cataract surgery video datasets, namely Cataract-101 and D99, and…
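The abstract says short-term and long-term feature representations are combined through a gate but does not give the formulation. As a rough illustration only, the sketch below shows one common form of gated fusion: a sigmoid gate computed from the concatenated features blends the two vectors per dimension. The function names (`gated_fusion`, `sigmoid`), the gate equation, and the weight layout are assumptions made for this example. They are not taken from the paper or from the official repository.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def gated_fusion(short_feat, long_feat, weights, bias):
    """Blend a short-term and a long-term feature vector with a learned gate.

    Hypothetical formulation (an assumption, not the paper's equation):
        g   = sigmoid(W @ [short; long] + b)   # one gate value per dimension
        out = g * short + (1 - g) * long

    short_feat, long_feat: lists of floats with the same length d
    weights: d rows, each with 2*d floats (acts on the concatenation)
    bias: list of d floats
    """
    if len(short_feat) != len(long_feat):
        raise ValueError("short and long features must have the same length")
    concat = list(short_feat) + list(long_feat)
    fused = []
    for i, (s, l) in enumerate(zip(short_feat, long_feat)):
        # Gate logit for dimension i depends on both feature streams.
        logit = sum(w * x for w, x in zip(weights[i], concat)) + bias[i]
        g = sigmoid(logit)
        # g near 1 favors the short-term feature, g near 0 the long-term one.
        fused.append(g * s + (1.0 - g) * l)
    return fused
```

With all-zero weights and biases the gate sits at 0.5 and the output is the plain average of the two streams. A large positive bias pushes the gate toward 1, so the output follows the short-term features. In a trained model the weights would be learned so that the gate changes with the input.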

Code & Models

Repositories

nisargshah1999/glsformer
PyTorch · Official

Taxonomy

Topics: Surgical Simulation and Training · Intraocular Surgery and Lenses · Digital Imaging in Medicine