Affine Medical Image Registration with Coarse-to-Fine Vision Transformer
Tony C. W. Mok, Albert C. S. Chung

TL;DR
This paper introduces C2FViT, a fast and robust vision transformer-based algorithm for 3D affine medical image registration, outperforming CNN-based methods in accuracy, robustness, and generalizability.
Contribution
The paper proposes a novel vision transformer-based approach for 3D affine registration that improves upon CNN methods in accuracy and robustness.
Findings
Outperforms CNN-based methods in registration accuracy
Demonstrates superior robustness and generalizability
Maintains fast runtime performance
Abstract
Affine registration is indispensable in a comprehensive medical image registration pipeline. However, only a few studies focus on fast and robust affine registration algorithms. Most of these studies utilize convolutional neural networks (CNNs) to learn joint affine and non-parametric registration, while the standalone performance of the affine subnetwork is less explored. Moreover, existing CNN-based affine registration approaches focus either on the local misalignment or the global orientation and position of the input to predict the affine transformation matrix, which are sensitive to spatial initialization and exhibit limited generalizability apart from the training dataset. In this paper, we present a fast and robust learning-based algorithm, Coarse-to-Fine Vision Transformer (C2FViT), for 3D affine medical image registration. Our method naturally leverages the global connectivity…
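The abstract describes predicting an affine transformation matrix for 3D volumes in coarse-to-fine stages. The NumPy sketch below shows what such an output means in practice. It builds a 4x4 homogeneous matrix from geometric parameters, chains per-stage matrices, and resamples a moving volume with the result. It is not the authors' C2FViT code. The parameterization (translation, Euler-angle rotation, scale, shear), the composition order, the volume-centred coordinate frame, nearest-neighbour resampling, and all function names are assumptions chosen for illustration.

```python
import numpy as np


def rotation_matrix(rx, ry, rz):
    """3x3 rotation from Euler angles in radians, applied x, then y, then z.
    The angle convention is an assumption for this sketch."""
    cx, sx = np.cos(rx), np.sin(rx)
    cy, sy = np.cos(ry), np.sin(ry)
    cz, sz = np.cos(rz), np.sin(rz)
    rot_x = np.array([[1.0, 0.0, 0.0], [0.0, cx, -sx], [0.0, sx, cx]])
    rot_y = np.array([[cy, 0.0, sy], [0.0, 1.0, 0.0], [-sy, 0.0, cy]])
    rot_z = np.array([[cz, -sz, 0.0], [sz, cz, 0.0], [0.0, 0.0, 1.0]])
    return rot_z @ rot_y @ rot_x


def affine_from_params(translation, rotation, scale, shear):
    """Hypothetical helper: build a 4x4 homogeneous matrix as T @ R @ Sh @ S."""
    scale_m = np.diag(np.asarray(scale, dtype=float))
    shear_m = np.array([[1.0, shear[0], shear[1]],
                        [0.0, 1.0, shear[2]],
                        [0.0, 0.0, 1.0]])
    A = np.eye(4)
    A[:3, :3] = rotation_matrix(*rotation) @ shear_m @ scale_m
    A[:3, 3] = translation
    return A


def compose_stages(stage_matrices):
    """Hypothetical helper: chain per-stage affines, coarsest first.
    If stage k warps the output of stage k-1, a voxel x of the final image
    reads the original at A_1 @ ... @ A_k @ x, so the total is that product."""
    total = np.eye(4)
    for A in stage_matrices:
        total = total @ A
    return total


def warp_nearest(moving, A):
    """Pull-back resampling: output voxel x reads `moving` at A @ x, with
    coordinates centred on the volume. Nearest neighbour, zero outside."""
    shape = np.array(moving.shape)
    center = (shape - 1) / 2.0
    axes = [np.arange(n) for n in moving.shape]
    grid = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1).reshape(-1, 3)
    src = (grid - center) @ A[:3, :3].T + A[:3, 3] + center
    idx = np.rint(src).astype(int)
    inside = np.all((idx >= 0) & (idx < shape), axis=1)
    out = np.zeros(grid.shape[0], dtype=moving.dtype)
    out[inside] = moving[idx[inside, 0], idx[inside, 1], idx[inside, 2]]
    return out.reshape(moving.shape)


# Usage: a coarse stage followed by a small refinement (values are made up).
rng = np.random.default_rng(0)
moving = rng.random((16, 16, 16))
coarse = affine_from_params([2, 0, 0], [0, 0, 0.1], [1, 1, 1], [0, 0, 0])
fine = affine_from_params([0.5, -0.5, 0], [0, 0, 0], [1.02, 1, 1], [0, 0, 0])
warped = warp_nearest(moving, compose_stages([coarse, fine]))
```

A learned registration network would replace the hand-set parameters with predicted ones and would typically resample with trilinear interpolation so the warp stays differentiable; nearest neighbour is used here only to keep the sketch short and dependency-free.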
Taxonomy
Topics: Medical Image Segmentation Techniques · Advanced MRI Techniques and Applications · Advanced Neural Network Applications
Methods: Attention Is All You Need · Linear Layer · Softmax · Dropout · Position-Wise Feed-Forward Layer · Dense Connections · Byte Pair Encoding · Label Smoothing · Multi-Head Attention · Absolute Position Encodings
