Self-Supervised Learning for Interventional Image Analytics: Towards Robust Device Trackers
Saahil Islam, Venkatesh N. Murthy, Dominik Neumann, Badhan Kumar Das, Puneet Sharma, Andreas Maier, Dorin Comaniciu, Florin C. Ghesu

TL;DR
This paper introduces a self-supervised learning method for robust device tracking in interventional X-ray images, leveraging large-scale data and masked image modeling to improve accuracy and speed in challenging conditions.
Contribution
A novel self-supervised approach using masked image modeling and frame interpolation to learn spatio-temporal features from over 16 million X-ray frames for device tracking.
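As a rough illustration of how masked image modeling and frame interpolation can be combined on a frame sequence, the sketch below builds a patch-level mask in plain Python. It is not the authors' implementation: the function names `make_mask` and `masked_fraction`, the 75% mask ratio, the shared spatial pattern, and the rule of hiding every interior odd-indexed frame are all assumptions chosen for illustration.

```python
import random


def make_mask(num_frames, grid_h, grid_w, mask_ratio=0.75, seed=0):
    """Return mask[t][i][j], True where a patch is hidden from the encoder.

    Assumptions (not taken from the paper):
    - one random spatial pattern is shared by all partially visible frames;
    - every interior frame with an odd index is hidden entirely, so a model
      would have to interpolate it from its neighbouring frames.
    """
    rng = random.Random(seed)
    num_patches = grid_h * grid_w
    num_masked = int(round(mask_ratio * num_patches))
    hidden = set(rng.sample(range(num_patches), num_masked))

    # Spatial pattern reused for every partially visible frame.
    spatial = [[(i * grid_w + j) in hidden for j in range(grid_w)]
               for i in range(grid_h)]
    # Pattern for a frame that is masked completely.
    full = [[True] * grid_w for _ in range(grid_h)]

    mask = []
    for t in range(num_frames):
        interior_odd = 0 < t < num_frames - 1 and t % 2 == 1
        source = full if interior_odd else spatial
        mask.append([row[:] for row in source])
    return mask


def masked_fraction(mask):
    """Fraction of all patches in the sequence that are hidden."""
    cells = [c for frame in mask for row in frame for c in row]
    return sum(cells) / len(cells)


if __name__ == "__main__":
    m = make_mask(num_frames=5, grid_h=4, grid_w=4)
    print(masked_fraction(m))
```

In a setup like this, a reconstruction loss would be computed only on the hidden patches, so the fully hidden frames act as a frame-interpolation target while the partially hidden frames act as a standard masked-image-modeling target.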
Findings
66.31% reduction in maximum tracking error
97.95% success score at 3x faster inference
Outperforms state-of-the-art methods in robustness
Abstract
Accurate detection and tracking of devices such as guiding catheters in live X-ray image acquisitions is an essential prerequisite for endovascular cardiac interventions. This information is leveraged for procedural guidance, e.g., directing stent placements. To ensure procedural safety and efficacy, tracking must be highly robust, i.e., free of failures. Achieving this requires efficiently handling several challenges: device obscuration by contrast agent or by other external devices or wires, changes in field-of-view or acquisition angle, and continuous movement due to cardiac and respiratory motion. To overcome these challenges, we propose a novel approach to learn spatio-temporal features from a very large data cohort of over 16 million interventional X-ray frames using self-supervision for image sequence data. Our approach is based on a masked…
Taxonomy
Topics: Neural Networks and Applications · Image Processing Techniques and Applications · Machine Learning and Data Classification
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
