TL;DR
SurgicalMamba is an online surgical phase recognition model that combines dual-path structured state-space components, adaptive time-warping, and state regramming to achieve state-of-the-art accuracy on seven benchmarks while running at 238.74 fps on a single GPU.
Contribution
It introduces a dual-path SSD architecture with phase-aligned state regramming and adaptive time-warping for improved online surgical workflow recognition.
Findings
Achieves state-of-the-art accuracy on seven benchmarks.
Runs at 238.74 fps on a single GPU.
Ablation studies confirm each component's effectiveness.
Abstract
Online surgical phase recognition (SPR) underpins context-aware operating-room systems and requires committing to a prediction at every frame from past context alone. Surgical video poses three demands that natural-video recognizers do not jointly address: procedures span tens of thousands of frames, time flows non-uniformly as long routine stretches are punctuated by brief phase-defining transitions, and the visual domain is narrow so backbone features are strongly correlated across channels. Existing recognizers either let per-frame cost grow with elapsed length, or hold cost bounded but advance state at a uniform rate with channel-independent dynamics, leaving the latter two demands unaddressed. We present SurgicalMamba, a causal SPR model built on Mamba2's structured state-space duality (SSD) that holds per-frame cost at O(d). It introduces three SSD-compatible components, each…
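To make the idea of a bounded per-frame cost concrete, here is a minimal sketch of a generic diagonal state-space recurrence processed one frame at a time, with an input-dependent step size so that the state advances at a non-uniform rate. This is not SurgicalMamba's implementation: the class name `StreamingSSM`, its parameters (`decay`, `b`, `c`, `w_dt`, `bias_dt`), and the scalar per-frame feature are all assumptions chosen for illustration, and the paper's three SSD-compatible components are not reproduced here.

```python
import math


def softplus(z):
    # Numerically stable softplus; keeps the step size strictly positive.
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


class StreamingSSM:
    """Hypothetical causal recurrence: one fixed-size state, updated per frame."""

    def __init__(self, decay, b, c, w_dt, bias_dt=0.0):
        self.decay = decay        # per-channel decay rates (assumed > 0)
        self.b = b                # input projection, one weight per channel
        self.c = c                # output projection, one weight per channel
        self.w_dt = w_dt          # weight mapping the input to a step size
        self.bias_dt = bias_dt    # bias of the step-size mapping
        self.h = [0.0] * len(decay)  # state; its size never grows with time

    def step(self, x):
        # Input-dependent step size: a larger dt forgets old state faster
        # and weights the current frame more heavily.
        dt = softplus(self.w_dt * x + self.bias_dt)
        for i in range(len(self.h)):
            keep = math.exp(-dt * self.decay[i])
            self.h[i] = keep * self.h[i] + dt * self.b[i] * x
        # Readout uses only the current state, i.e. only past and present frames.
        return sum(ci * hi for ci, hi in zip(self.c, self.h))


# Usage: stream frames one by one; the work per call depends only on the state size.
model = StreamingSSM(decay=[1.0, 2.0], b=[1.0, 1.0], c=[1.0, 1.0], w_dt=0.0)
outputs = [model.step(x) for x in (1.0, 0.0, 0.0)]
```

Each call to `step` touches the state once, so the cost per frame is linear in the state size and independent of how many frames have already been seen, which is the property the abstract contrasts with recognizers whose per-frame cost grows with elapsed length.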
