Human Gaze Guided Attention for Surgical Activity Recognition
Abdishakour Awale, Duygu Sarikaya

TL;DR
This paper uses human gaze data to guide spatio-temporal attention in surgical activity recognition, reaching 85.4% accuracy on the Suturing task of the public JIGSAWS dataset.
Contribution
It is the first to incorporate human gaze as supervision for attention in surgical video activity recognition, enhancing model performance.
Findings
Achieved 85.4% accuracy on the JIGSAWS Suturing task.
Demonstrated that gaze-guided attention is effective relative to state-of-the-art models.
Validated the importance of gaze supervision through ablation studies.
Abstract
Modeling and automatically recognizing surgical activities are fundamental steps toward automation in surgery and play important roles in providing timely feedback to surgeons. Accurately recognizing surgical activities in video poses a challenging problem that requires an effective means of learning both spatial and temporal dynamics. Human gaze and visual saliency carry important information about visual attention and can be used to extract more relevant features that better reflect these spatial and temporal dynamics. In this study, we propose to use human gaze with a spatio-temporal attention mechanism for activity recognition in surgical videos. Our model consists of an I3D-based architecture that learns spatio-temporal features using 3D convolutions and learns an attention map using human gaze as supervision. We evaluate our model on the Suturing task of JIGSAWS, which is a…
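The idea of supervising an attention map with human gaze can be illustrated with a small sketch. This summary does not state the paper's actual loss or layer layout, so everything below is an assumption made for illustration: the attention map is taken to be a softmax over per-location logits, the gaze heatmap is normalized into a target distribution, and the two are compared with a KL divergence. The function names (softmax, normalize, gaze_kl_loss, attend) are hypothetical and are not from the paper or from any library.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a flat list of attention logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def normalize(gaze):
    """Turn a non-negative gaze heatmap into a probability distribution."""
    total = sum(gaze)
    if total <= 0:
        # Assumption: with no gaze signal, fall back to a uniform target.
        return [1.0 / len(gaze)] * len(gaze)
    return [g / total for g in gaze]

def gaze_kl_loss(attn_logits, gaze, eps=1e-8):
    """KL(gaze || attention); small when predicted attention matches gaze.

    Assumption: KL divergence is used here only as one plausible choice
    of supervision signal, not as the loss reported in the paper.
    """
    attn = softmax(attn_logits)
    target = normalize(gaze)
    return sum(
        t * math.log((t + eps) / (a + eps))
        for t, a in zip(target, attn)
        if t > 0
    )

def attend(features, attn_logits):
    """Attention-weighted pooling of per-location feature vectors."""
    attn = softmax(attn_logits)
    dim = len(features[0])
    return [sum(w * f[d] for w, f in zip(attn, features)) for d in range(dim)]

# Four spatio-temporal locations, flattened; gaze falls on the first one.
gaze = [1.0, 0.0, 0.0, 0.0]
uniform_loss = gaze_kl_loss([0.0, 0.0, 0.0, 0.0], gaze)
focused_loss = gaze_kl_loss([5.0, 0.0, 0.0, 0.0], gaze)
print(focused_loss < uniform_loss)  # attention near the gaze costs less
```

In a full model, a term like this would typically be added to the activity classification loss with a weighting factor, and the pooled output of attend would feed the classifier. Both of those choices are assumptions of this sketch as well.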
Taxonomy
Topics: Surgical Simulation and Training · Delphi Technique in Research · Augmented Reality Applications
