DACAT: Dual-stream Adaptive Clip-aware Time Modeling for Robust Online Surgical Phase Recognition
Kaixiang Yang, Qiang Li, Zhiwei Wang

TL;DR
DACAT introduces a dual-stream, clip-aware time modeling approach for online surgical phase recognition, significantly improving accuracy by adaptively leveraging historical context and current frame features.
Contribution
The paper proposes DACAT, a novel dual-stream model with adaptive clip-aware context encoding, enhancing temporal modeling for surgical phase recognition over existing methods.
Findings
Outperforms state-of-the-art methods on three datasets.
Improves Jaccard scores over prior methods by at least 2.7% to 4.6%, depending on the dataset.
Demonstrates robust online surgical phase recognition.
Abstract
Surgical phase recognition has become a crucial requirement in laparoscopic surgery, enabling various clinical applications like surgical risk forecasting. Current methods typically identify the surgical phase using individual frame-wise embeddings as the fundamental unit for time modeling. However, this approach is overly sensitive to current observations, often resulting in discontinuous and erroneous predictions within a complete surgical phase. In this paper, we propose DACAT, a novel dual-stream model that adaptively learns clip-aware context information to enhance the temporal relationship. In one stream, DACAT pretrains a frame encoder, caching all historical frame-wise features. In the other stream, DACAT fine-tunes a new frame encoder to extract the frame-wise feature at the current moment. Additionally, a max clip-response read-out (Max-R) module is introduced to bridge the…
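The clip-aware idea in the abstract can be illustrated with a small sketch. The abstract is cut off before it describes what Max-R does, so everything below is an assumption made for illustration, not the authors' method: the names `cosine` and `max_response_clip` are hypothetical, and the scoring rule (mean cosine similarity between the current feature and each candidate clip ending at the most recent cached frame) is only one plausible reading of "max clip-response read-out".

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def max_response_clip(cache, current):
    """Pick the trailing clip of `cache` that responds most strongly to `current`.

    Hypothetical scoring rule (an assumption, not taken from the paper):
    each candidate clip is cache[start:], scored by the mean cosine
    similarity of its frames to the current feature. Ties favor the
    longer clip.
    """
    sims = [cosine(feature, current) for feature in cache]
    best_start = len(cache)
    best_score = float("-inf")
    running = 0.0
    # Walk backwards from the most recent cached frame, growing the clip.
    for start in range(len(cache) - 1, -1, -1):
        running += sims[start]
        score = running / (len(cache) - start)
        if score >= best_score:
            best_start, best_score = start, score
    return best_start, cache[best_start:]


# Toy cache: two frames from an earlier phase, then two from the current one.
history = [[1, 0], [1, 0], [0, 1], [0, 1]]
start, clip = max_response_clip(history, [0, 1])
```

In this toy case the selected clip starts at index 2, so the read-out keeps only the two cached frames that resemble the current one and ignores the earlier phase. A real model would operate on learned encoder features and feed the selected clip to a temporal module rather than return it directly.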
Taxonomy
Topics: Reservoir Engineering and Simulation Methods
Methods: Contrastive Language-Image Pre-training
