Towards Comprehensive Real-Time Scene Understanding in Ophthalmic Surgery through Multimodal Image Fusion
Nikolo Rohrmoser, Ghazal Ghazaei, Michael Sommersperger, Nassir Navab

TL;DR
This paper presents a real-time multimodal image fusion network for ophthalmic surgery that combines microscope and intraoperative OCT imaging to improve instrument tracking and tool-tissue distance estimation.
Contribution
It introduces a novel temporal, multimodal neural network with a cross-attention fusion module for enhanced surgical scene understanding in vitreoretinal surgery.
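A cross-attention fusion module lets features from one modality query features from the other. Below is a minimal single-head sketch in plain Python. It assumes that OPMI feature tokens act as queries and iOCT feature tokens supply keys and values, and that the attended result is added back to the OPMI tokens as a residual. That direction, the token counts, the dimensions, and the helper names (`cross_attention_fuse`, `rand_matrix`) are all hypothetical illustrations, not the authors' module.

```python
import math
import random

def matmul(a, b):
    # Plain-Python matrix product: a is (n, m), b is (m, p) -> (n, p)
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def softmax_row(row):
    # Numerically stable softmax over one row of attention scores
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    s = sum(exps)
    return [e / s for e in exps]

def cross_attention_fuse(opmi, ioct, w_q, w_k, w_v, w_o):
    """Hypothetical fusion: OPMI tokens (queries) attend to iOCT tokens (keys/values)."""
    q = matmul(opmi, w_q)                      # (N_opmi, d_k)
    k = matmul(ioct, w_k)                      # (N_ioct, d_k)
    v = matmul(ioct, w_v)                      # (N_ioct, d_k)
    scale = math.sqrt(len(q[0]))
    scores = matmul(q, transpose(k))           # (N_opmi, N_ioct) dot products
    attn = [softmax_row([s / scale for s in row]) for row in scores]
    attended = matmul(matmul(attn, v), w_o)    # project back to the OPMI feature size
    # Residual connection: keep the OPMI features and add iOCT context to them
    fused = [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(opmi, attended)]
    return fused, attn

def rand_matrix(rows, cols, rng, scale=0.1):
    # Random stand-in for learned weights or extracted features
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
d, d_k = 16, 8
opmi = rand_matrix(12, d, rng, 1.0)   # 12 hypothetical OPMI feature tokens
ioct = rand_matrix(5, d, rng, 1.0)    # 5 hypothetical iOCT feature tokens
w_q, w_k, w_v = (rand_matrix(d, d_k, rng) for _ in range(3))
w_o = rand_matrix(d_k, d, rng)
fused, attn = cross_attention_fuse(opmi, ioct, w_q, w_k, w_v, w_o)
print(len(fused), len(fused[0]), len(attn[0]))  # 12 16 5
```

Each row of `attn` is a distribution over the iOCT tokens. The fused output therefore keeps the OPMI token layout, which is convenient when the downstream heads operate on microscope-image coordinates.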
Findings
Achieved 95.79% mAP50 in instrument detection
Reduced tool-tissue distance estimation error from 284 μm to 33 μm
Operates at 22.5 ms per frame for real-time performance
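The reported figures can be unpacked with a little arithmetic. mAP50 counts a predicted box as a true positive only when its intersection over union (IoU) with a ground-truth box is at least 0.5. The full metric then averages precision over recall, which this sketch omits. The 22.5 ms latency and the 284 μm to 33 μm error values come from the findings above. The helper names are illustrative only.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def is_match_at_50(pred, gt):
    # The matching criterion behind mAP50: IoU with the ground truth must be >= 0.5
    return iou(pred, gt) >= 0.5

print(is_match_at_50((0, 0, 10, 10), (2, 0, 12, 10)))  # IoU = 80/120 -> True
print(is_match_at_50((0, 0, 10, 10), (5, 0, 15, 10)))  # IoU = 50/150 -> False

# Throughput implied by 22.5 ms per frame, and the relative drop in distance error
frame_ms = 22.5
fps = 1000.0 / frame_ms
err_before_um, err_after_um = 284.0, 33.0
relative_reduction = (err_before_um - err_after_um) / err_before_um
print(f"{fps:.1f} fps, {relative_reduction:.1%} lower error")  # 44.4 fps, 88.4% lower error
```

Roughly 44 frames per second means the network can keep pace with typical video frame rates. The distance error drops by close to a factor of nine.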
Abstract
Purpose: The integration of multimodal imaging into operating rooms paves the way for comprehensive surgical scene understanding. In ophthalmic surgery, two complementary imaging modalities are now available: operating microscope (OPMI) imaging and real-time intraoperative optical coherence tomography (iOCT). This first work toward temporal OPMI and iOCT feature fusion demonstrates the potential of multimodal image processing for multi-head prediction through the example of precise instrument tracking in vitreoretinal surgery. Methods: We propose a multimodal, temporal, real-time capable network architecture to perform joint instrument detection, keypoint localization, and tool-tissue distance estimation. Our network design integrates a cross-attention fusion module to merge OPMI and iOCT image features, which are efficiently extracted via a YoloNAS and a CNN encoder,…
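Joint detection, keypoint localization, and distance estimation suggest a shared fused representation feeding several task-specific heads. The sketch below shows that multi-head pattern in plain Python with untrained linear heads. Everything in it is an assumption for illustration: the class name `MultiHeadPredictor`, the output sizes (one box plus confidence, three keypoints, one distance), and especially the temporal component. Here that component is a simple moving average over the last few frames' fused features, because the excerpt does not say how the network aggregates information over time.

```python
import random
from collections import deque

def linear(x, weights, bias):
    # y = W x + b for one feature vector x; weights has shape (out, in)
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]

class MultiHeadPredictor:
    """Hypothetical joint predictor: one shared fused feature feeds three task heads."""

    def __init__(self, feat_dim, num_keypoints=3, window=4, seed=0):
        rng = random.Random(seed)

        def init(out_dim):
            # Random, untrained weights and zero bias for a linear head
            w = [[rng.gauss(0.0, 0.1) for _ in range(feat_dim)] for _ in range(out_dim)]
            return w, [0.0] * out_dim

        self.det_head = init(5)                  # box (x1, y1, x2, y2) + confidence logit
        self.kp_head = init(2 * num_keypoints)   # (x, y) per instrument keypoint
        self.dist_head = init(1)                 # regressed tool-tissue distance
        self.history = deque(maxlen=window)      # fused features of the most recent frames

    def step(self, fused_feature):
        # Temporal context (assumption): average fused features over the last `window` frames
        self.history.append(fused_feature)
        n = len(self.history)
        temporal = [sum(vals) / n for vals in zip(*self.history)]
        return {
            "detection": linear(temporal, *self.det_head),
            "keypoints": linear(temporal, *self.kp_head),
            "distance": linear(temporal, *self.dist_head)[0],
        }

rng = random.Random(1)
model = MultiHeadPredictor(feat_dim=16)
for _ in range(6):  # simulate a short stream of per-frame fused features
    out = model.step([rng.gauss(0.0, 1.0) for _ in range(16)])
print(len(out["detection"]), len(out["keypoints"]), len(model.history))  # 5 6 4
```

Sharing one representation across heads means the expensive feature extraction and fusion run once per frame. That matters for a real-time budget like the reported 22.5 ms.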
Taxonomy
Topics: Retinal and Macular Surgery · Retinal Imaging and Analysis · Soft Robotics and Applications
