Self-Supervised Learning for Interventional Image Analytics: Towards Robust Device Trackers

Saahil Islam; Venkatesh N. Murthy; Dominik Neumann; Badhan Kumar Das; Puneet Sharma; Andreas Maier; Dorin Comaniciu; Florin C. Ghesu

arXiv:2405.01156 · cs.CV · May 3, 2024

Open Access

TL;DR

This paper introduces a self-supervised learning method for robust device tracking in interventional X-ray images, leveraging large-scale data and masked image modeling to improve accuracy and speed in challenging conditions.

Contribution

A novel self-supervised approach using masked image modeling and frame interpolation to learn spatio-temporal features from over 16 million X-ray frames for device tracking.
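
The contribution pairs masked image modeling with frame interpolation. One minimal reading of that idea, sketched below under stated assumptions, is a three-frame clip (previous, target, next) in which random patches of the middle frame are hidden and must be reconstructed from the visible context plus the neighbouring frames, with the loss taken only over the hidden patches. This is not the authors' published implementation: `make_masked_sample`, `masked_mse`, `average_neighbours`, the patch size and the mask ratio are all hypothetical, and the learned encoder-decoder is replaced by a trivial neighbour-averaging baseline purely for illustration.

```python
import random
from typing import List, Set, Tuple

Frame = List[List[float]]  # hypothetical: a grayscale frame as rows of pixel intensities


def patchify(frame: Frame, patch: int) -> List[Tuple[int, int]]:
    """Return the top-left (row, col) corner of every non-overlapping patch."""
    h, w = len(frame), len(frame[0])
    return [(r, c) for r in range(0, h, patch) for c in range(0, w, patch)]


def make_masked_sample(clip: List[Frame], patch: int, mask_ratio: float, seed: int = 0):
    """Build one masked-modeling sample from a 3-frame clip (prev, target, next).

    Assumption: only the middle frame is masked; a model would receive
    (prev, masked target, next) and be trained to restore the hidden patches.
    """
    prev_f, target, next_f = clip
    rng = random.Random(seed)
    corners = patchify(target, patch)
    n_mask = int(round(mask_ratio * len(corners)))
    masked: Set[Tuple[int, int]] = set(rng.sample(corners, n_mask))
    visible = [row[:] for row in target]  # copy so the ground truth stays intact
    for r, c in masked:
        for i in range(r, min(r + patch, len(target))):
            for j in range(c, min(c + patch, len(target[0]))):
                visible[i][j] = 0.0  # hide the pixel
    return (prev_f, visible, next_f), target, masked


def masked_mse(pred: Frame, target: Frame, masked: Set[Tuple[int, int]], patch: int) -> float:
    """Mean squared error computed only over pixels inside masked patches."""
    total, count = 0.0, 0
    for r, c in masked:
        for i in range(r, min(r + patch, len(target))):
            for j in range(c, min(c + patch, len(target[0]))):
                total += (pred[i][j] - target[i][j]) ** 2
                count += 1
    return total / count if count else 0.0


def average_neighbours(prev_f: Frame, next_f: Frame) -> Frame:
    """Toy stand-in for a learned model: linear interpolation between neighbours."""
    return [[(a + b) / 2 for a, b in zip(ra, rb)] for ra, rb in zip(prev_f, next_f)]
```

Restricting the loss to masked pixels is the usual masked-image-modeling choice: the visible pixels are already given to the model, so scoring them would reward copying rather than inference from spatial and temporal context.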

Findings

01. 66.31% reduction in maximum tracking error

02. 97.95% success score at 3x faster inference

03. Outperforms state-of-the-art methods in robustness
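
The first two findings are ratios over per-frame tracking errors. As a minimal sketch of how such figures are commonly computed, assuming the maximum-error reduction is taken relative to a baseline tracker's worst frame and the success score is the percentage of frames whose error falls at or below a fixed threshold (this page states neither definition nor the threshold; tracking benchmarks sometimes instead report the area under a success curve swept over thresholds), the arithmetic looks like this. Function names and the toy error values are hypothetical and not results from the paper.

```python
from typing import Sequence


def max_error_reduction(baseline_errors: Sequence[float], method_errors: Sequence[float]) -> float:
    """Percent reduction of the worst-case (maximum) per-frame error vs. a baseline.

    Assumed definition: 100 * (max_baseline - max_method) / max_baseline.
    Both sequences must use the same unit (e.g. mm or pixels).
    """
    base_max = max(baseline_errors)
    return 100.0 * (base_max - max(method_errors)) / base_max


def success_rate(errors: Sequence[float], threshold: float) -> float:
    """Percent of frames whose error is at or below `threshold` (hypothetical definition)."""
    hits = sum(1 for e in errors if e <= threshold)
    return 100.0 * hits / len(errors)


# Toy per-frame errors (illustrative only, not from the paper)
print(max_error_reduction([1.0, 2.0, 8.0], [1.0, 1.5, 2.0]))  # → 75.0
print(success_rate([0.5, 1.0, 3.0, 1.5, 2.5], threshold=2.0))  # → 60.0
```

Because the maximum error is driven by a single worst frame, a large reduction in it mainly reflects fewer catastrophic tracking failures, which matches the robustness emphasis of the abstract.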

Abstract

Accurate detection and tracking of devices such as guiding catheters in live X-ray image acquisitions is an essential prerequisite for endovascular cardiac interventions. This information is leveraged for procedural guidance, e.g., directing stent placements. To ensure procedural safety and efficacy, tracking must be highly robust, with no failures. To achieve that, one needs to efficiently tackle challenges such as device obscuration by contrast agent or by other external devices or wires, changes in field-of-view or acquisition angle, and the continuous movement caused by cardiac and respiratory motion. To overcome these challenges, we propose a novel approach to learn spatio-temporal features from a very large data cohort of over 16 million interventional X-ray frames using self-supervision for image sequence data. Our approach is based on a masked…

Taxonomy

Topics: Neural Networks and Applications · Image Processing Techniques and Applications · Machine Learning and Data Classification

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings