Towards Dynamic 3D Reconstruction of Hand-Instrument Interaction in Ophthalmic Surgery

Ming Hu; Zhengdi Yu; Feilong Tang; Kaiwen Chen; Yulong Li; Imran Razzak; Junjun He; Tolga Birdal; Kaijing Zhou; Zongyuan Ge

arXiv:2505.17677 · cs.CV · June 2, 2025


TL;DR

This paper introduces OphNet-3D, a large-scale RGB-D dataset for ophthalmic surgery, together with new benchmarks and models for 3D reconstruction of hands and instruments. The proposed models outperform existing methods by over 2 mm in MPJPE and by up to 23% in ADD-S.

Contribution

The paper contributes the OphNet-3D dataset, a multi-stage automatic annotation pipeline, and two architectures, H-Net and OH-Net, for improved reconstruction of hand-instrument interaction.

Findings

01 Models outperform existing methods by over 2 mm MPJPE

02 Achieve up to 23% improvement in ADD-S metrics

03 Establish new benchmarks for hand and instrument reconstruction
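
For readers unfamiliar with the two metrics named in the findings, the following is a minimal sketch of how they are commonly defined, not the paper's evaluation code. MPJPE is the mean Euclidean distance between predicted and ground-truth joints; ADD-S averages, over the model points in the ground-truth pose, the distance to the closest model point in the predicted pose, which makes it tolerant of symmetric objects. The function names, the point format (3-tuples in millimetres), and the row-list rotation matrices are assumptions made for illustration, and the paper may report ADD-S in a thresholded or percentage form rather than as a raw distance.

```python
import math


def mpjpe(pred_joints, gt_joints):
    """Mean per-joint position error: average distance between matched joints."""
    if len(pred_joints) != len(gt_joints) or not gt_joints:
        raise ValueError("joint lists must be non-empty and of equal length")
    total = sum(math.dist(p, g) for p, g in zip(pred_joints, gt_joints))
    return total / len(gt_joints)


def transform(points, rotation, translation):
    """Apply a rigid transform: 3x3 rotation (list of rows) plus a 3-vector."""
    return [
        tuple(
            sum(rotation[i][k] * p[k] for k in range(3)) + translation[i]
            for i in range(3)
        )
        for p in points
    ]


def add_s(model_points, rot_pred, t_pred, rot_gt, t_gt):
    """ADD-S: mean closest-point distance from GT-posed to predicted-posed model points."""
    pred = transform(model_points, rot_pred, t_pred)
    gt = transform(model_points, rot_gt, t_gt)
    # Brute-force nearest neighbour; fine for a sketch, O(n^2) for large models.
    return sum(min(math.dist(g, p) for p in pred) for g in gt) / len(gt)


if __name__ == "__main__":
    identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

    # Two joints, off by 2 mm and 4 mm respectively.
    print(mpjpe([(0, 0, 2), (10, 0, 4)], [(0, 0, 0), (10, 0, 0)]))

    # A toy two-point "instrument" whose predicted pose is shifted 0.5 mm along z.
    model = [(0, 0, 0), (1, 0, 0)]
    print(add_s(model, identity, (0, 0, 0.5), identity, (0, 0, 0)))
```

Because ADD-S matches each point to its nearest neighbour rather than to its own index, a prediction that differs from the ground truth only by a rotation that maps a symmetric object onto itself scores zero, which is a natural fit for elongated, near-symmetric instruments.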

Abstract

Accurate 3D reconstruction of hands and instruments is critical for vision-based analysis of ophthalmic microsurgery, yet progress has been hampered by the lack of realistic, large-scale datasets and reliable annotation tools. In this work, we introduce OphNet-3D, the first extensive RGB-D dynamic 3D reconstruction dataset for ophthalmic surgery, comprising 41 sequences from 40 surgeons and totaling 7.1 million frames, with fine-grained annotations of 12 surgical phases, 10 instrument categories, dense MANO hand meshes, and full 6-DoF instrument poses. To scalably produce high-fidelity labels, we design a multi-stage automatic annotation pipeline that integrates multi-view data observation, data-driven motion prior with cross-view geometric consistency and biomechanical constraints, along with a combination of collision-aware interaction constraints for instrument interactions. Building…
