Simulation-Driven Training of Vision Transformers Enabling Metal Segmentation in X-Ray Images

Fuxin Fan; Ludwig Ritschl; Marcel Beister; Ramyar Biniazan; Björn Kreher; Tristan M. Gottschalk; Steffen Kappler; Andreas Maier

arXiv:2203.09207·eess.IV·March 18, 2022

TL;DR

This paper introduces a simulation-based training method for vision transformers to improve metal segmentation in X-ray images, demonstrating superior performance over CNNs and highlighting the potential of CAD-generated data for clinical applications.

Contribution

The study presents a novel simulation-driven training approach using CAD models and vision transformers for metal segmentation in X-ray images, outperforming traditional CNNs.

Findings

01

A vision transformer with a dual decoder achieves a Dice score of 0.90.

02

CAD-based data generation enhances training flexibility and generalization.

03

Transformer-based model outperforms CNN encoder-based networks.
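The Dice score quoted in the findings measures the overlap between a predicted metal mask and the ground-truth mask: twice the intersection divided by the sum of the two mask sizes. The snippet below is a minimal sketch of that standard definition, not the authors' evaluation code; the function name `dice_score` and the toy masks are illustrative assumptions.

```python
import numpy as np


def dice_score(pred, target):
    """Dice coefficient between two binary masks (hypothetical helper)."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    total = pred.sum() + target.sum()
    if total == 0:
        # Convention chosen here: two empty masks count as a perfect match.
        return 1.0
    return 2.0 * intersection / total


# Toy example: 3 predicted pixels, 3 true pixels, 2 of them shared.
pred = np.array([[1, 1, 0],
                 [0, 1, 0]])
target = np.array([[1, 0, 0],
                   [0, 1, 1]])
score = dice_score(pred, target)  # 2 * 2 / (3 + 3)
```

A score of 1.0 means the masks coincide exactly, and 0.0 means they share no pixels.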

Abstract

In several image acquisition and processing steps of X-ray radiography, knowledge of the existence of metal implants and their exact position is highly beneficial (e.g., for dose regulation and image contrast adjustment). Another application that would benefit from accurate metal segmentation is cone beam computed tomography (CBCT), which is based on 2D X-ray projections. Due to the high attenuation of metals, severe artifacts occur in the 3D X-ray acquisitions. Metal segmentation in CBCT projections usually serves as a prerequisite for metal artifact avoidance and reduction algorithms. Since the generation of high-quality clinical training data is a constant challenge, this study proposes to generate simulated X-ray images based on CT data sets combined with self-designed computer-aided design (CAD) implants and to make use of a convolutional neural network (CNN) and a vision transformer (ViT) for…
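The idea of pairing a simulated X-ray image with an exact metal label can be illustrated with a heavily simplified forward projection. The sketch below assumes a monoenergetic, parallel-beam geometry and the Beer-Lambert law; it is not the simulation pipeline used in the paper. The function name `simulate_projection`, the voxel size, and the attenuation values are illustrative assumptions.

```python
import numpy as np


def simulate_projection(mu_volume, metal_mask, mu_metal,
                        voxel_size_cm=0.1, axis=0):
    """Project an attenuation volume with an inserted implant (hypothetical helper).

    mu_volume  : 3D array of linear attenuation coefficients in 1/cm
    metal_mask : 3D boolean array marking the voxels occupied by the implant
    mu_metal   : attenuation coefficient assigned to the implant voxels in 1/cm
    """
    # Overwrite tissue attenuation wherever the implant sits.
    mu = np.where(metal_mask, mu_metal, mu_volume)
    # Parallel-beam line integral along the chosen axis.
    line_integral = mu.sum(axis=axis) * voxel_size_cm
    # Beer-Lambert law, normalised to the unattenuated intensity.
    intensity = np.exp(-line_integral)
    # A detector pixel is labelled metal if its ray crosses any implant voxel.
    label = metal_mask.any(axis=axis)
    return intensity, label


# Toy volume: uniform soft tissue (illustrative value) with a small implant.
volume = np.full((4, 3, 3), 0.2)
implant = np.zeros((4, 3, 3), dtype=bool)
implant[1:3, 1, 1] = True
image, mask = simulate_projection(volume, implant, mu_metal=5.0)
```

Because the implant geometry is known exactly, the label `mask` comes for free with every simulated image, which is the practical appeal of simulation-driven training data.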

Taxonomy

Topics: Advanced X-ray and CT Imaging · Advanced X-ray Imaging Techniques · Mineral Processing and Grinding

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Max Pooling · Convolution · Concatenated Skip Connection · Layer Normalization · Softmax · Dense Connections