SurgPose: Generalisable Surgical Instrument Pose Estimation using Zero-Shot Learning and Stereo Vision
Utsav Rai, Haozheng Xu, Stamatia Giannarou

TL;DR
This paper introduces a zero-shot 6 DoF pose estimation pipeline for surgical instruments in robot-assisted minimally invasive surgery. It combines stereo-vision depth estimation with the zero-shot RGB-D models FoundationPose and SAM-6D to improve generalisability to unseen tools and robustness.
Contribution
It advances zero-shot pose estimation in RMIS by integrating stereo vision with FoundationPose and SAM-6D, and improves segmentation accuracy with a fine-tuned Mask R-CNN.
Findings
Enhanced SAM-6D outperforms FoundationPose in zero-shot estimation
Depth estimation with RAFT-Stereo improves robustness in reflective environments
The pipeline sets a new benchmark for zero-shot pose estimation in RMIS
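As a rough illustration of the stereo-depth finding above: a stereo matcher such as RAFT-Stereo outputs a disparity map, and for a rectified stereo pair the standard pinhole relation Z = f · B / d converts disparity to metric depth. The sketch below is not the authors' code. The function name, the focal length, the baseline, and the disparity values are all hypothetical and chosen only to show the arithmetic.

```python
from typing import List, Optional


def disparity_to_depth(
    disparity_px: List[List[float]],
    focal_px: float,
    baseline_m: float,
    min_disparity: float = 1e-6,
) -> List[List[Optional[float]]]:
    """Convert a rectified-stereo disparity map (pixels) to depth (metres).

    Uses Z = f * B / d. Pixels whose disparity is at or below
    `min_disparity` are marked None (no valid depth).
    """
    depth = []
    for row in disparity_px:
        depth.append([
            focal_px * baseline_m / d if d > min_disparity else None
            for d in row
        ])
    return depth


# Hypothetical values: 1000 px focal length, 5 mm stereo baseline.
disparity = [[50.0, 25.0], [0.0, 100.0]]
depth = disparity_to_depth(disparity, focal_px=1000.0, baseline_m=0.005)
```

With these assumed values, a larger disparity maps to a smaller depth, and the zero-disparity pixel is left as None rather than dividing by zero. A depth map of this kind is what an RGB-D pose model would take in place of a depth-sensor reading.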
Abstract
Accurate pose estimation of surgical tools in Robot-assisted Minimally Invasive Surgery (RMIS) is essential for surgical navigation and robot control. While traditional marker-based methods offer accuracy, they face challenges with occlusions, reflections, and tool-specific designs. Similarly, supervised learning methods require extensive training on annotated datasets, limiting their adaptability to new tools. Despite their success in other domains, zero-shot pose estimation models remain unexplored for surgical instruments in RMIS, leaving a gap in generalising to unseen surgical tools. This paper presents a novel 6 Degrees of Freedom (DoF) pose estimation pipeline for surgical instruments, leveraging state-of-the-art zero-shot RGB-D models such as FoundationPose and SAM-6D. We advanced these models by incorporating vision-based depth estimation using the…
Taxonomy
Topics: Soft Robotics and Applications · Surgical Simulation and Training · Robotics and Sensor-Based Localization
Methods: Softmax · Convolution · RoIAlign · Region Proposal Network · Mask R-CNN
