A Multi-View Pipeline and Benchmark Dataset for 3D Hand Pose Estimation in Surgery
Valery Fischer, Alan Magdaleno, Anna-Katharina Calek, Nicola Cavalcanti, Nathan Hoffman, Christoph Germann, Joschua Wüthrich, Max Krähenmann, Mazda Farshad, Philipp Fürnstahl, Lilian Calvet

TL;DR
This paper introduces a multi-view pipeline that accurately estimates 3D hand poses in surgical environments without domain-specific fine-tuning, supported by a new large annotated dataset for benchmarking.
Contribution
The paper presents a multi-view 3D hand pose estimation pipeline that requires no domain-specific fine-tuning, and introduces a surgical benchmark dataset with over 68,000 frames and 3,000 annotated hand poses.
Findings
Achieved 31% reduction in 2D joint error
Achieved 76% reduction in 3D joint position error
Outperformed baseline methods in surgical hand pose estimation
Abstract
Purpose: Accurate 3D hand pose estimation supports surgical applications such as skill assessment, robot-assisted interventions, and geometry-aware workflow analysis. However, surgical environments pose severe challenges, including intense and localized lighting, frequent occlusions by instruments or staff, and uniform hand appearance due to gloves, combined with a scarcity of annotated datasets for reliable model training. Method: We propose a robust multi-view pipeline for 3D hand pose estimation in surgical contexts that requires no domain-specific fine-tuning and relies solely on off-the-shelf pretrained models. The pipeline integrates reliable person detection, whole-body pose estimation, and state-of-the-art 2D hand keypoint prediction on tracked hand crops, followed by a constrained 3D optimization. In addition, we introduce a novel surgical benchmark dataset comprising over…
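The abstract does not spell out the constrained 3D optimization, so the following is only a minimal sketch of the generic multi-view step such a pipeline builds on: lifting one 2D hand keypoint seen in several calibrated cameras to 3D by linear (DLT) triangulation. It is not the authors' method. The function names, the camera intrinsics `K`, the two projection matrices, and the test point are all hypothetical values chosen for illustration, and no anatomical or temporal constraints are applied.

```python
import numpy as np

def triangulate_point(projections, points_2d):
    """Linear (DLT) triangulation of one 3D point from N >= 2 calibrated views.

    projections: list of 3x4 camera projection matrices (assumed known).
    points_2d:   list of (u, v) pixel observations, one per view.
    """
    rows = []
    for P, (u, v) in zip(projections, points_2d):
        # Each view contributes two linear equations in the homogeneous point X.
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    A = np.stack(rows)
    # The least-squares solution is the right singular vector
    # associated with the smallest singular value of A.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]

def project(P, X):
    """Project a 3D point X with the 3x4 matrix P to pixel coordinates."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

# Hypothetical two-camera rig: shared intrinsics, 20 cm horizontal baseline.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-0.2], [0.0], [0.0]])])

# Hypothetical hand joint 1 m in front of the first camera.
X_true = np.array([0.05, -0.02, 1.0])
observations = [project(P1, X_true), project(P2, X_true)]

X_est = triangulate_point([P1, P2], observations)
```

With noise-free observations the estimate reproduces the input point. With real 2D detections, a result like this would typically serve as an initial guess for a refinement that adds constraints such as bone lengths, which is the role the abstract assigns to its constrained optimization.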
Taxonomy
Topics: Surgical Simulation and Training · Human Pose and Action Recognition · Robot Manipulation and Learning
