TL;DR
This paper introduces DMNet, a dual-memory network that combines local and global spatio-temporal knowledge to improve real-time surgical instrument segmentation in videos, outperforming existing methods.
Contribution
The paper presents a novel dual-memory network that efficiently integrates local and global temporal information for improved real-time segmentation in surgical videos.
Findings
Outperforms state-of-the-art segmentation methods on benchmark datasets.
Maintains real-time processing speed while improving accuracy.
Effectively models long-range semantic correlations and local temporal dependencies.
Abstract
Real-time, accurate instrument segmentation from videos is of great significance for improving the performance of robotic-assisted surgery. We identify two important cues for surgical instrument perception: local temporal dependency between adjacent frames and global semantic correlation over long-range duration. However, most existing works perform segmentation using only the visual cues in a single frame. Optical flow models the motion between only two frames and brings heavy computational cost. We propose a novel dual-memory network (DMNet) that relates both global and local spatio-temporal knowledge to augment the current features, boosting segmentation performance while retaining real-time prediction capability. On the one hand, we propose an efficient local memory that takes the complementary advantages of convolutional LSTM and…
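The abstract names convolutional LSTM as one ingredient of the local memory. As background only, the following is a minimal sketch of a standard LSTM gating step, written for scalars in plain Python. It is not DMNet's local memory, whose details are not given in the text above. The names `lstm_step`, `run_sequence`, and the weight dictionary layout are hypothetical, chosen for illustration. A convolutional LSTM follows the same gate structure but replaces the scalar multiplications with convolutions over feature maps.

```python
import math


def sigmoid(x):
    """Logistic function, used for the three gates."""
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, w):
    """One scalar LSTM step (illustrative, not the paper's module).

    w maps a gate name to a (w_x, w_h, bias) tuple; this layout is an
    assumption made for the sketch.
    """
    def pre(gate):
        w_x, w_h, b = w[gate]
        return w_x * x + w_h * h_prev + b

    i = sigmoid(pre("input"))        # how much new information to write
    f = sigmoid(pre("forget"))       # how much of the old cell state to keep
    o = sigmoid(pre("output"))       # how much of the cell state to expose
    g = math.tanh(pre("candidate"))  # candidate update, bounded in (-1, 1)

    c = f * c_prev + i * g           # new cell state
    h = o * math.tanh(c)             # new hidden state
    return h, c


def run_sequence(xs, w):
    """Carry the hidden and cell state across a short sequence of inputs."""
    h, c = 0.0, 0.0
    for x in xs:
        h, c = lstm_step(x, h, c, w)
    return h, c
```

With all weights and biases set to zero, every gate evaluates to 0.5 and the candidate to 0, so each step simply halves the previous cell state. That makes the forget gate's role in retaining information from earlier frames easy to see.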
Taxonomy
Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory
