Deep Homography Estimation in Dynamic Surgical Scenes for Laparoscopic Camera Motion Extraction
Martin Huber, Sébastien Ourselin, Christos Bergeles, Tom Vercauteren

TL;DR
This paper introduces a deep learning method for estimating laparoscopic camera motion in dynamic surgical scenes. The networks are trained on synthetically generated camera motion and outperform classical homography estimation by 41% in precision and by 43% in CPU runtime.
Contribution
It presents an approach for extracting camera motion from laparoscopic videos that combines synthetic homography augmentation with deep neural networks, so that the estimated camera motion is invariant to object and tool motion in the scene.
Findings
Outperforms classical homography estimation in precision by 41%.
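Reduces CPU runtime by 43% relative to classical homography estimation.
Transfers from the synthetically augmented, camera-motion-free da Vinci dataset to videos of real laparoscopic interventions.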
Abstract
Current laparoscopic camera motion automation relies on rule-based approaches or focuses only on surgical tools. Imitation Learning (IL) methods could alleviate these shortcomings, but have so far been applied to oversimplified setups. Instead, this work introduces a method for extracting a laparoscope holder's actions from videos of laparoscopic interventions. We synthetically add camera motion to a newly acquired dataset of camera-motion-free da Vinci surgery image sequences through a novel homography generation algorithm. The synthetic camera motion serves as a supervisory signal for camera motion estimation that is invariant to object and tool motion. We perform an extensive evaluation of state-of-the-art (SOTA) Deep Neural Networks (DNNs) across multiple compute regimes, finding our method transfers from our camera…
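To make the idea of a synthetic supervisory signal concrete, here is a minimal sketch of one common way to produce it: perturb the four image corners at random, solve for the homography that maps the original corners to the perturbed ones, and warp the frame with it. This is not the paper's homography generation algorithm, which the summary above does not describe. The function names (`homography_from_points`, `random_corner_homography`, `warp_nearest`), the uniform corner offsets, and the `max_shift` parameter are all assumptions chosen for illustration.

```python
import numpy as np


def homography_from_points(src, dst):
    """Estimate H (3x3) with the direct linear transform so that dst ~ H @ src."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    a = np.asarray(rows, dtype=np.float64)
    # The solution is the right singular vector of the smallest singular value.
    _, _, vt = np.linalg.svd(a)
    h = vt[-1].reshape(3, 3)
    return h / h[2, 2]


def random_corner_homography(height, width, max_shift, rng):
    """Hypothetical generator: shift each corner by up to max_shift pixels."""
    corners = np.array(
        [[0, 0], [width - 1, 0], [width - 1, height - 1], [0, height - 1]],
        dtype=np.float64,
    )
    offsets = rng.uniform(-max_shift, max_shift, size=(4, 2))
    return homography_from_points(corners, corners + offsets), offsets


def warp_nearest(image, h_mat):
    """Inverse-warp an image with H using nearest-neighbour sampling."""
    h, w = image.shape[:2]
    vs, us = np.mgrid[0:h, 0:w]
    pts = np.stack([us.ravel(), vs.ravel(), np.ones(h * w)])
    # Map every output pixel back into the source image.
    src = np.linalg.inv(h_mat) @ pts
    xs = np.rint(src[0] / src[2]).astype(int)
    ys = np.rint(src[1] / src[2]).astype(int)
    valid = (xs >= 0) & (xs < w) & (ys >= 0) & (ys < h)
    flat_in = image.reshape(h * w, -1)
    flat_out = np.zeros_like(flat_in)  # pixels without a source stay black
    flat_out[valid] = flat_in[ys[valid] * w + xs[valid]]
    return flat_out.reshape(image.shape)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frame = rng.integers(0, 255, size=(48, 64, 3)).astype(np.uint8)
    h_mat, offsets = random_corner_homography(48, 64, max_shift=8.0, rng=rng)
    warped = warp_nearest(frame, h_mat)
    print(warped.shape, offsets.shape)
```

Under these assumptions, a pair of frames from a camera-motion-free sequence, one of them warped, together with the known `offsets` or `h_mat`, would form one training sample. Tools and tissue can move between the two frames, but the label encodes only the injected camera motion.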
Taxonomy
Topics: Surgical Simulation and Training · Advanced Vision and Imaging · Multimodal Machine Learning Applications
