TL;DR
ResViT introduces a novel generative adversarial model combining vision transformers and CNNs for improved multi-modal medical image synthesis, demonstrating superior results over existing methods.
Contribution
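The paper proposes ResViT, a GAN architecture whose generator uses residual transformer blocks, together with a unified implementation that handles varying source-target modality configurations in multi-modal medical image synthesis without rebuilding separate models.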
Findings
ResViT outperforms competing CNN- and transformer-based methods in both qualitative visual assessment and quantitative metrics.
ResViT effectively synthesizes missing MRI sequences and CT images from MRI.
The model reduces computational load with weight sharing among transformer blocks.
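To illustrate the weight-sharing finding above, here is a minimal sketch of how tying parameters across transformer blocks shrinks a model. It is not the authors' code: the function names, the block count of 4, and the layer sizes (768-dimensional embeddings, 3072-dimensional feed-forward) are assumptions chosen for illustration, and the count assumes a standard transformer encoder layer rather than the exact ResViT design.

```python
def transformer_layer_params(d_model: int, d_ff: int) -> int:
    """Parameter count of one standard transformer encoder layer (biases included).

    Assumes a generic encoder layer, not the exact layer used in ResViT.
    """
    # Query, key, value, and output projections: four d_model x d_model matrices plus biases.
    attention = 4 * (d_model * d_model + d_model)
    # Two-layer feed-forward network: d_model -> d_ff -> d_model, with biases.
    feed_forward = (d_model * d_ff + d_ff) + (d_ff * d_model + d_model)
    # Two LayerNorms, each with a scale and a shift vector.
    layer_norms = 2 * 2 * d_model
    return attention + feed_forward + layer_norms


def total_params(num_blocks: int, d_model: int, d_ff: int, shared: bool) -> int:
    """Total transformer parameters across blocks, with or without weight sharing."""
    per_layer = transformer_layer_params(d_model, d_ff)
    # With sharing, every block reuses the same weights, so they are stored once.
    return per_layer if shared else num_blocks * per_layer


if __name__ == "__main__":
    # Hypothetical configuration, chosen only for illustration.
    blocks, d_model, d_ff = 4, 768, 3072
    print("independent:", total_params(blocks, d_model, d_ff, shared=False))
    print("shared:     ", total_params(blocks, d_model, d_ff, shared=True))
```

Note that this sketch counts stored parameters only. Each block still runs its own forward pass, so sharing mainly reduces model size and memory rather than per-image arithmetic.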
Abstract
Generative adversarial models with convolutional neural network (CNN) backbones have recently been established as state-of-the-art in numerous medical image synthesis tasks. However, CNNs are designed to perform local processing with compact filters, and this inductive bias compromises learning of contextual features. Here, we propose a novel generative adversarial approach for medical image synthesis, ResViT, that leverages the contextual sensitivity of vision transformers along with the precision of convolution operators and realism of adversarial learning. ResViT's generator employs a central bottleneck comprising novel aggregated residual transformer (ART) blocks that synergistically combine residual convolutional and transformer modules. Residual connections in ART blocks promote diversity in captured representations, while a channel compression module distills task-relevant…
Taxonomy
Methods: Convolution
