TL;DR
This paper introduces a novel CNN-based method for real-time recognition and pose estimation of surgical instruments in minimally invasive surgery, improving detection accuracy and efficiency across different surgical applications.
Contribution
A new scene model and a CNN architecture, trained end-to-end, enable simultaneous instrument recognition and pose estimation without scale-dependent sliding-window evaluation at test time.
Findings
Surpasses the state of the art on retinal microsurgery images
Effective in ex-vivo laparoscopic sequences
Achieves accurate detection and tracking
Abstract
Detection of surgical instruments plays a key role in ensuring patient safety in minimally invasive surgery. In this paper, we present a novel method for 2D vision-based recognition and pose estimation of surgical instruments that generalizes to different surgical applications. At its core, we propose a novel scene model in order to simultaneously recognize multiple instruments as well as their parts. We use a Convolutional Neural Network architecture to embody our model and show that the cross-entropy loss is well suited to optimize its parameters, which can be trained in an end-to-end fashion. An additional advantage of our approach is that instrument detection at test time is achieved while avoiding the need for scale-dependent sliding window evaluation. This allows our approach to be relatively parameter free at test time and shows good performance for both instrument detection and…
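To make the role of the cross-entropy loss concrete, the following is a minimal NumPy sketch of a per-pixel cross-entropy over a dense map of class scores. It is an illustration under stated assumptions, not the paper's implementation: the function names `pixelwise_cross_entropy` and `predict_labels`, the `(H, W, C)` score layout, and the idea of one class per instrument part plus background are hypothetical choices, and the network that would produce the scores is not shown.

```python
import numpy as np


def pixelwise_cross_entropy(logits, labels):
    """Mean cross-entropy over all pixels of a dense prediction map.

    logits: float array of shape (H, W, C) holding per-pixel class scores
            (assumed layout; e.g. background plus instrument-part classes).
    labels: int array of shape (H, W) with values in [0, C).
    """
    # Numerically stable log-softmax along the class axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # Select the log-probability of the true class at every pixel.
    picked = np.take_along_axis(log_probs, labels[..., None], axis=-1)
    return float(-picked.mean())


def predict_labels(logits):
    """Label every pixel in one pass by taking the highest-scoring class."""
    return logits.argmax(axis=-1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    scores = rng.normal(size=(4, 5, 3))       # hypothetical 4x5 map, 3 classes
    target = rng.integers(0, 3, size=(4, 5))  # hypothetical ground-truth labels
    print("loss:", pixelwise_cross_entropy(scores, target))
    print("predicted map shape:", predict_labels(scores).shape)
```

Because every pixel is scored in a single pass over the whole image, a model of this kind does not need to evaluate windows at several scales at test time. As a sanity check, uniform scores over four classes give a loss of ln 4.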
