Transformer-Based Local Feature Matching for Multimodal Image Registration
Remi Delaunay, Ruisi Zhang, Filipe C. Pedrosa, Navid Feizi, Dianne Sacco, Rajni Patel, and Jayender Jagadeesan

TL;DR
This paper introduces a Transformer-based dense feature matching approach for accurate 2D ultrasound to 3D CT registration, enhancing intraoperative guidance in ultrasound-guided surgeries.
Contribution
It presents a multimodal registration method that combines LoFTR with differentiable pose estimation, tailored to aligning ultrasound and CT images.
Findings
Effective dense correspondence prediction between ultrasound and CT images.
Promising intraoperative ultrasound pose estimation results on ex vivo data.
Improved surgical guidance through accurate multimodal image registration.
Abstract
Ultrasound imaging is a cost-effective and radiation-free modality for visualizing anatomical structures in real time, making it ideal for guiding surgical interventions. However, its limited field of view, speckle noise, and imaging artifacts make the images difficult for inexperienced users to interpret. In this paper, we propose a new 2D ultrasound to 3D CT registration method to improve surgical guidance during ultrasound-guided interventions. Our approach adapts a dense feature matching method called LoFTR to our multimodal registration problem. We learn to predict dense coarse-to-fine correspondences using a Transformer-based architecture to estimate a robust rigid transformation between a 2D ultrasound frame and a CT scan. Additionally, a fully differentiable pose estimation method is introduced, optimizing LoFTR on pose estimation error during training. Experiments conducted…
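The abstract describes estimating a rigid transformation from predicted correspondences but does not spell out the solver. As an illustration only, the sketch below uses a confidence-weighted Kabsch (Procrustes) fit, which is one common way to recover a rotation and translation from matched points and which is built from operations (weighted means, a matrix product, an SVD) that autodiff frameworks can backpropagate through. This is an assumption, not the authors' method: the function name `weighted_rigid_fit`, the synthetic points, and the weights are all hypothetical, and ultrasound pixels are assumed to have already been scaled to millimetres and placed on the z = 0 plane.

```python
import numpy as np


def weighted_rigid_fit(src, dst, weights):
    """Weighted Kabsch fit (illustrative, not the paper's solver).

    Finds R (3x3 rotation) and t (3,) minimizing
    sum_i w_i * ||R @ src_i + t - dst_i||^2.
    """
    w = weights / weights.sum()          # normalize match confidences
    src_centroid = w @ src               # weighted centroids, shape (3,)
    dst_centroid = w @ dst
    src_c = src - src_centroid           # centered point sets
    dst_c = dst - dst_centroid
    H = (src_c * w[:, None]).T @ dst_c   # 3x3 weighted cross-covariance
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_centroid - R @ src_centroid
    return R, t


# Synthetic check: hypothetical ultrasound points on the z = 0 plane (in mm).
rng = np.random.default_rng(0)
us_points = np.column_stack([
    rng.uniform(0.0, 50.0, 20),
    rng.uniform(0.0, 50.0, 20),
    np.zeros(20),
])

# Hypothetical ground-truth pose: rotate about z, then about x, then translate.
az, ax = np.deg2rad(30.0), np.deg2rad(20.0)
Rz = np.array([[np.cos(az), -np.sin(az), 0.0],
               [np.sin(az),  np.cos(az), 0.0],
               [0.0,         0.0,        1.0]])
Rx = np.array([[1.0, 0.0,         0.0],
               [0.0, np.cos(ax), -np.sin(ax)],
               [0.0, np.sin(ax),  np.cos(ax)]])
R_true = Rx @ Rz
t_true = np.array([10.0, -5.0, 40.0])

ct_points = us_points @ R_true.T + t_true   # matched CT-space points
confidences = rng.uniform(0.1, 1.0, 20)     # stand-in for match confidences

R_est, t_est = weighted_rigid_fit(us_points, ct_points, confidences)
```

With noise-free matches the fit should reproduce the pose used to generate the points. With real network output, the weights would let low-confidence matches contribute less to the estimate.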
