Surgformer: Surgical Transformer with Hierarchical Temporal Attention for Surgical Phase Recognition
Shu Yang, Luyang Luo, Qiong Wang, Hao Chen

TL;DR
Surgformer introduces a hierarchical temporal attention mechanism with divided spatial-temporal attention and sparse frame input to improve surgical phase recognition by capturing both global and local temporal dependencies.
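As a rough illustration of what "sparse frame input" can look like, the sketch below picks a small number of frames ending at the current frame and spaced a fixed stride apart, rather than feeding every frame to the model. The function name `sparse_frame_indices`, its parameters, and the clamping behaviour at the start of the video are assumptions made for this example; the summary does not state the sampling scheme the paper actually uses.

```python
def sparse_frame_indices(current, num_frames, stride):
    """Return `num_frames` frame indices ending at `current`.

    Hypothetical helper: frames are spaced `stride` apart and indices that
    would fall before the start of the video are clamped to frame 0.
    """
    return [max(0, current - stride * i) for i in reversed(range(num_frames))]


# Four frames, eight frames apart, ending at frame 100.
print(sparse_frame_indices(100, 4, 8))
```

A larger stride covers a longer temporal window with the same number of frames, which is one simple way to trade temporal density for range.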
Contribution
The paper proposes Surgformer, an end-to-end model that combines hierarchical temporal attention with divided spatial-temporal attention to better model spatial-temporal dependencies and reduce spatial-temporal redundancy.
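To make "divided spatial-temporal attention" concrete, here is a minimal pure-Python sketch of the general idea: each patch first attends across frames at the same spatial location, then attends across patches within its own frame. The temporal-then-spatial order, the single head, the identity query/key/value projections, and all function names are simplifying assumptions for illustration; this is not the paper's implementation, and it does not cover the hierarchical temporal attention.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head self-attention with identity Q/K/V (a simplification)."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        w = softmax(scores)
        out.append([sum(wi * v[j] for wi, v in zip(w, tokens)) for j in range(d)])
    return out


def divided_attention(x):
    """x[t][s] is the feature vector of patch s in frame t."""
    T, S = len(x), len(x[0])
    # Temporal step: for each spatial location, attend across frames.
    y = [[None] * S for _ in range(T)]
    for s in range(S):
        att = self_attention([x[t][s] for t in range(T)])
        for t in range(T):
            y[t][s] = [a + b for a, b in zip(x[t][s], att[t])]  # residual
    # Spatial step: for each frame, attend across its patches.
    z = []
    for t in range(T):
        att = self_attention(y[t])
        z.append([[a + b for a, b in zip(y[t][s], att[s])] for s in range(S)])
    return z


# Two frames, three patches per frame, two-dimensional features.
clip = [[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
        [[0.5, 0.5], [1.0, 0.0], [0.0, 1.0]]]
out = divided_attention(clip)
print(len(out), len(out[0]), len(out[0][0]))
```

Splitting attention this way means each token compares against T + S other tokens instead of all T × S tokens at once, which is the usual motivation for dividing the two axes.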
Findings
Outperforms state-of-the-art methods on benchmark datasets.
Effectively captures long-term and local temporal information.
Reduces spatial-temporal redundancy in surgical phase recognition.
Abstract
Existing state-of-the-art methods for surgical phase recognition either rely on the extraction of spatial-temporal features at a short-range temporal resolution or adopt the sequential extraction of the spatial and temporal features across the entire temporal resolution. However, these methods have limitations in modeling spatial-temporal dependency and addressing spatial-temporal redundancy: 1) These methods fail to effectively model spatial-temporal dependency, due to the lack of long-range information or joint spatial-temporal modeling. 2) These methods utilize dense spatial features across the entire temporal resolution, resulting in significant spatial-temporal redundancy. In this paper, we propose the Surgical Transformer (Surgformer) to address the issues of spatial-temporal modeling and redundancy in an end-to-end manner, which employs divided spatial-temporal attention and…
Taxonomy
Topics: Medical Image Segmentation Techniques · Advanced X-ray Imaging Techniques · Surgical Simulation and Training
Methods: Sparse Evolutionary Training · Linear Layer · Residual Connection · Multi-Head Attention · Attention Is All You Need · Position-Wise Feed-Forward Layer · Adam · Byte Pair Encoding · Softmax · Absolute Position Encodings
