OperA: Attention-Regularized Transformers for Surgical Phase Recognition
Tobias Czempiel, Magdalini Paschali, Daniel Ostler, Seong Tae Kim, Benjamin Busam, Nassir Navab

TL;DR
OperA is a transformer-based model with attention regularization that improves surgical phase recognition from videos and identifies key frames for summarization, outperforming existing methods.
Contribution
Introduces OperA, a novel attention-regularized transformer model for accurate surgical phase recognition and key frame identification in surgical videos.
Findings
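OperA outperforms various state-of-the-art temporal refinement approaches on two datasets of laparoscopic cholecystectomy videos.
The attention regularization loss encourages the model to focus on high-quality frames during training.
High-attention frames characterize each surgical phase and could be used for surgery summarization.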
Abstract
In this paper we introduce OperA, a transformer-based model that accurately predicts surgical phases from long video sequences. A novel attention regularization loss encourages the model to focus on high-quality frames during training. Moreover, the attention weights are utilized to identify characteristic high attention frames for each surgical phase, which could further be used for surgery summarization. OperA is thoroughly evaluated on two datasets of laparoscopic cholecystectomy videos, outperforming various state-of-the-art temporal refinement approaches.
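The key-frame idea in the abstract can be illustrated with a minimal sketch. It assumes each frame already has one scalar attention score (for example, attention weights aggregated over heads and layers) and one phase label. The function name `top_attention_frames`, the parameter `k`, the phase labels, and the scores are all hypothetical and chosen for illustration; this is not the authors' implementation.

```python
from collections import defaultdict


def top_attention_frames(attention, phases, k=3):
    """Pick the k highest-attention frame indices for each phase.

    attention: one scalar score per frame (assumed to be precomputed
               from the model's attention weights).
    phases:    one phase label per frame (predicted or annotated).
    Hypothetical helper for illustration only.
    """
    if len(attention) != len(phases):
        raise ValueError("attention and phases must have the same length")

    # Group (score, frame index) pairs by phase label.
    by_phase = defaultdict(list)
    for idx, (score, phase) in enumerate(zip(attention, phases)):
        by_phase[phase].append((score, idx))

    key_frames = {}
    for phase, scored in by_phase.items():
        # Highest score first; ties go to the earlier frame.
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        key_frames[phase] = [idx for _, idx in scored[:k]]
    return key_frames


# Toy data: six frames, two illustrative phase labels.
attention = [0.05, 0.30, 0.10, 0.20, 0.25, 0.10]
phases = ["phase_a", "phase_a", "phase_a", "phase_b", "phase_b", "phase_b"]
key_frames = top_attention_frames(attention, phases, k=2)
print(key_frames)
```

The selected indices could then be mapped back to video frames to assemble a per-phase summary, which is the use case the abstract suggests.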
