An Empirical Study on Activity Recognition in Long Surgical Videos
Zhuohong He, Ali Mottaghi, Aidean Sharghi, Muhammad Abdullah Jamal, Omid Mohareri

TL;DR
This study benchmarks various deep learning architectures for surgical activity recognition in long videos, demonstrating the effectiveness of Swin-Transformer+BiGRU and exploring domain adaptation techniques.
Contribution
It provides a comprehensive evaluation of backbones and temporal models on large-scale and public surgical video datasets, highlighting the best-performing architecture and domain adaptation methods.
Findings
Swin-Transformer+BiGRU achieved strong performance across datasets.
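Benchmarking on a large-scale dataset of over 800 surgery videos, plus Cholec80 and Cataract-101, identified the most effective backbone and temporal-model combinations for surgical activity recognition.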
Fine-tuning and unsupervised domain adaptation improved model transferability.
Abstract
Activity recognition in surgical videos is a key research area for developing next-generation devices and workflow monitoring systems. Since surgeries are long processes with highly variable lengths, deep learning models used for surgical videos often consist of a two-stage setup using a backbone and a temporal sequence model. In this paper, we investigate many state-of-the-art backbones and temporal models to find architectures that yield the strongest performance for surgical activity recognition. We first benchmark the models' performance on a large-scale activity recognition dataset containing over 800 surgery videos captured in multiple clinical operating rooms. We further evaluate the models on two smaller public datasets, Cholec80 and Cataract-101, containing only 80 and 101 videos respectively. We empirically found that the Swin-Transformer+BiGRU temporal model yielded…
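The two-stage setup described in the abstract (a per-frame backbone followed by a temporal sequence model) can be illustrated with a minimal PyTorch sketch. This is not the authors' implementation. The class name `TwoStageRecognizer`, the tiny CNN standing in for the Swin-Transformer backbone, the layer sizes, and the choice of 7 classes are all assumptions made for illustration. Only the overall structure, frame features fed into a bidirectional GRU, follows the text.

```python
import torch
import torch.nn as nn


class TwoStageRecognizer(nn.Module):
    """Hypothetical sketch: per-frame backbone, then a bidirectional GRU over time."""

    def __init__(self, num_classes: int, feat_dim: int = 64, hidden: int = 32):
        super().__init__()
        # Stage 1 (assumption): a tiny CNN stands in for the Swin-Transformer backbone.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(16, feat_dim),
        )
        # Stage 2: bidirectional GRU over the sequence of frame features.
        self.temporal = nn.GRU(feat_dim, hidden, batch_first=True, bidirectional=True)
        # Forward and backward hidden states are concatenated, hence 2 * hidden.
        self.head = nn.Linear(2 * hidden, num_classes)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames has shape (batch, time, 3, height, width).
        b, t = frames.shape[:2]
        # Fold time into the batch axis so the backbone sees individual frames.
        feats = self.backbone(frames.flatten(0, 1))  # (b * t, feat_dim)
        feats = feats.view(b, t, -1)                 # (b, t, feat_dim)
        seq, _ = self.temporal(feats)                # (b, t, 2 * hidden)
        return self.head(seq)                        # per-frame logits (b, t, num_classes)


# Usage: one clip of 20 random frames; 7 is an arbitrary number of activity classes.
model = TwoStageRecognizer(num_classes=7)
video = torch.randn(1, 20, 3, 32, 32)
logits = model(video)
```

Because the GRU processes whatever sequence length it is given, the same model accepts videos of different durations, which is one reason this kind of split suits surgeries of highly variable length. In practice, very long videos are often handled by extracting backbone features once and training the temporal model on the cached features. Whether the paper does so is not stated in the text above.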
Taxonomy
Topics: Surgical Simulation and Training · Context-Aware Activity Recognition Systems · Healthcare Technology and Patient Monitoring
