Surgformer: Surgical Transformer with Hierarchical Temporal Attention for Surgical Phase Recognition

Shu Yang; Luyang Luo; Qiong Wang; Hao Chen

arXiv:2408.03867 · cs.CV · August 8, 2024 · 2 cites

TL;DR

Surgformer introduces a hierarchical temporal attention mechanism with divided spatial-temporal attention and sparse frame input to improve surgical phase recognition by capturing both global and local temporal dependencies.
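The sparse frame input mentioned above can be illustrated with a small index sampler. This is a minimal sketch and not the authors' data loader: the function name `sparse_frame_indices`, the default of 16 frames, and the default stride of 4 are assumptions chosen for illustration.

```python
def sparse_frame_indices(current, num_frames=16, stride=4):
    """Return `num_frames` indices ending at `current`, spaced `stride` apart.

    Hypothetical helper: indices that would fall before the start of the
    video are clamped to 0, so early frames are repeated instead of failing.
    """
    if current < 0 or num_frames < 1 or stride < 1:
        raise ValueError("current must be >= 0; num_frames and stride must be >= 1")
    # Walk backward from the current frame, then reverse into temporal order.
    indices = [max(current - k * stride, 0) for k in range(num_frames)]
    return indices[::-1]


if __name__ == "__main__":
    print(sparse_frame_indices(current=20, num_frames=4, stride=8))
```

With these assumed settings, `sparse_frame_indices(20, num_frames=4, stride=8)` returns `[0, 4, 12, 20]`: the model would see four frames covering the last 24 frames of video rather than every frame in that span.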

Contribution

The paper proposes Surgformer, a novel end-to-end model with hierarchical temporal attention and divided spatial-temporal attention for better modeling of dependencies and reducing redundancy.
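Divided spatial-temporal attention can be sketched in a few lines of NumPy. The version below follows the general divided space-time pattern (temporal attention per patch position, then spatial attention per frame); it is an illustrative assumption, not the Surgformer implementation. It uses a single head, omits learned query/key/value projections, and all function names are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the maximum for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(tokens):
    """Single-head self-attention over the second-to-last axis.

    tokens: (..., L, D). Learned projections are omitted in this sketch,
    so queries, keys, and values are all the tokens themselves.
    """
    d = tokens.shape[-1]
    scores = tokens @ np.swapaxes(tokens, -1, -2) / np.sqrt(d)  # (..., L, L)
    return softmax(scores, axis=-1) @ tokens


def divided_space_time_attention(x):
    """x: (T, N, D) = frames, patches per frame, embedding dimension."""
    # Temporal step: each patch position attends across the T frames.
    xt = np.swapaxes(x, 0, 1)        # (N, T, D)
    xt = xt + self_attention(xt)     # residual connection
    x = np.swapaxes(xt, 0, 1)        # back to (T, N, D)
    # Spatial step: each frame attends across its own N patches.
    return x + self_attention(x)


if __name__ == "__main__":
    clip = np.random.default_rng(0).normal(size=(4, 9, 8))
    print(divided_space_time_attention(clip).shape)
```

The point of the split is cost: joint attention over all T·N tokens compares (T·N)² token pairs, while the divided form compares T·N·(T + N) pairs.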

Findings

01. Outperforms state-of-the-art methods on benchmark datasets.

02. Effectively captures long-term and local temporal information.

03. Reduces spatial-temporal redundancy in surgical phase recognition.
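One way to picture attention that covers both long-term and local temporal information is to let the current frame attend over several nested windows of recent frames and then combine the results. The sketch below is only one possible reading of "hierarchical temporal attention": the window sizes, the averaging step, and the function names are assumptions, and the paper's actual aggregation may differ.

```python
import numpy as np


def attend(query, keys):
    """Scaled dot-product attention of one query (D,) over keys (L, D)."""
    scores = keys @ query / np.sqrt(query.shape[0])
    weights = np.exp(scores - scores.max())  # stable softmax
    weights /= weights.sum()
    return weights @ keys  # values are the keys themselves (no projections)


def hierarchical_temporal_attention(frames, windows=(4, 8, 16)):
    """frames: (T, D) features in temporal order; the last row is the current frame.

    Hypothetical sketch: window sizes must be positive integers.
    """
    query = frames[-1]
    # Each window keeps only the most recent `w` frames, so small windows give
    # local context and the largest window gives the widest (global) context.
    outputs = [attend(query, frames[-min(w, len(frames)):]) for w in windows]
    # Assumed aggregation: a plain average over the per-window outputs.
    return np.mean(outputs, axis=0)


if __name__ == "__main__":
    feats = np.random.default_rng(0).normal(size=(16, 8))
    print(hierarchical_temporal_attention(feats).shape)
```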

Abstract

Existing state-of-the-art methods for surgical phase recognition either rely on the extraction of spatial-temporal features at a short-range temporal resolution or adopt the sequential extraction of the spatial and temporal features across the entire temporal resolution. However, these methods have limitations in modeling spatial-temporal dependency and addressing spatial-temporal redundancy: 1) These methods fail to effectively model spatial-temporal dependency, due to the lack of long-range information or joint spatial-temporal modeling. 2) These methods utilize dense spatial features across the entire temporal resolution, resulting in significant spatial-temporal redundancy. In this paper, we propose the Surgical Transformer (Surgformer) to address the issues of spatial-temporal modeling and redundancy in an end-to-end manner, which employs divided spatial-temporal attention and…

Code & Models

Repositories

isyangshu/surgformer
PyTorch · Official

Taxonomy

Topics: Medical Image Segmentation Techniques · Advanced X-ray Imaging Techniques · Surgical Simulation and Training

Methods: Sparse Evolutionary Training · Linear Layer · Residual Connection · Multi-Head Attention · Attention Is All You Need · Position-Wise Feed-Forward Layer · Adam · Byte Pair Encoding · Softmax · Absolute Position Encodings