Symmetric Dilated Convolution for Surgical Gesture Recognition
Jinglu Zhang, Yinyu Nie, Yao Lyu, Hailin Li, Jian Chang, Xiaosong Yang, Jian Jun Zhang

TL;DR
This paper introduces a novel symmetric dilated convolutional neural network with self-attention for surgical gesture recognition from RGB videos, effectively capturing long-term temporal dependencies without additional sensors.
Contribution
The proposed architecture uniquely combines symmetric dilation and self-attention to improve long-term temporal modeling in surgical video analysis.
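To make the idea of symmetric dilation concrete, here is a minimal sketch of how mirrored dilation rates widen the temporal receptive field. The doubling schedule, the kernel size of 3, and the choice of 4 layers per side are illustrative assumptions, not values taken from the paper; the function names are hypothetical.

```python
def symmetric_dilations(num_layers):
    """Return (encoder, decoder) dilation rates.

    Assumption: rates double per encoder layer and the decoder mirrors them.
    The paper's actual schedule is not stated in this summary.
    """
    encoder = [2 ** i for i in range(num_layers)]
    decoder = encoder[::-1]
    return encoder, decoder


def receptive_field(dilations, kernel_size=3):
    """Frames seen by one output frame after a stack of dilated 1D convolutions."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


encoder, decoder = symmetric_dilations(4)
total_frames = receptive_field(encoder + decoder)
print(encoder, decoder, total_frames)
```

Under these assumptions, eight layers cover 61 frames, whereas eight undilated layers of the same kernel size would cover only 17. The self-attention module that bridges the two halves is not modeled here.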
Findings
Outperforms state-of-the-art methods by ~6 points in frame-wise accuracy
Achieves ~6 points higher F1@50 score
Effectively captures long-term frame dependencies
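The two metrics named above measure different things, which the following sketch tries to illustrate. It assumes the commonly used segmental definition of F1@50 (a predicted segment counts as a true positive if its IoU with an unmatched ground-truth segment of the same label is at least 0.5); the paper's exact evaluation code is not shown in this summary, and the labels "G1" and "G2" are placeholders.

```python
def segments(labels):
    """Collapse frame-wise labels into (label, start, end) runs; end is exclusive."""
    runs, start = [], 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            runs.append((labels[start], start, i))
            start = i
    return runs


def frame_accuracy(pred, truth):
    """Fraction of frames whose predicted label matches the ground truth."""
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)


def f1_at_k(pred, truth, threshold=0.5):
    """Segmental F1 at an IoU threshold (assumed > 0); 0.5 gives F1@50."""
    pred_segs, true_segs = segments(pred), segments(truth)
    matched = [False] * len(true_segs)
    tp = fp = 0
    for label, ps, pe in pred_segs:
        best_iou, best_idx = 0.0, -1
        for idx, (tlabel, ts, te) in enumerate(true_segs):
            if tlabel != label:
                continue
            inter = max(0, min(pe, te) - max(ps, ts))
            union = max(pe, te) - min(ps, ts)
            iou = inter / union
            if iou > best_iou:
                best_iou, best_idx = iou, idx
        if best_iou >= threshold and not matched[best_idx]:
            tp += 1
            matched[best_idx] = True
        else:
            fp += 1
    fn = matched.count(False)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


truth = ["G1"] * 6 + ["G2"] * 4
shifted = ["G1"] * 5 + ["G2"] * 5                 # boundary off by one frame
fragmented = ["G1", "G2"] + ["G1"] * 4 + ["G2"] * 4  # one spurious flicker

print(frame_accuracy(shifted, truth), f1_at_k(shifted, truth))
print(frame_accuracy(fragmented, truth), f1_at_k(fragmented, truth))
```

Both toy predictions get one frame wrong, so their frame-wise accuracy is identical, but the fragmented one is penalized by F1@50 because the flicker creates two extra segments. This is why the two scores are usually reported together.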
Abstract
Automatic surgical gesture recognition is a prerequisite for intra-operative computer assistance and objective surgical skill assessment. Prior works either require additional sensors to collect kinematics data or are limited in capturing temporal information from long and untrimmed surgical videos. To tackle these challenges, we propose a novel temporal convolutional architecture to automatically detect and segment surgical gestures with corresponding boundaries using only RGB videos. We devise our method with a symmetric dilation structure bridged by a self-attention module to encode and decode the long-term temporal patterns and establish the frame-to-frame relationship accordingly. We validate the effectiveness of our approach on a fundamental robotic suturing task from the JIGSAWS dataset. The experimental results demonstrate the ability of our method to capture long-term frame…
Taxonomy
Topics: Surgical Simulation and Training · Augmented Reality Applications · Human Pose and Action Recognition
