Stylizing ViT: Anatomy-Preserving Instance Style Transfer for Domain Generalization
Sebastian Doerrich, Francesco Di Salvo, Jonas Alle, Christian Ledig

TL;DR
Stylizing ViT is a novel Vision Transformer architecture that improves domain generalization in medical image analysis by preserving anatomical consistency during style transfer, yielding greater robustness and perceptually convincing augmentations.
Contribution
The paper proposes a new ViT-based model with weight-shared attention for anatomy-preserving style transfer, improving domain generalization in medical imaging tasks.
Findings
Improves accuracy by up to 13% over state-of-the-art methods.
Generates artifact-free, perceptually convincing images.
Achieves a 17% performance boost during inference with test-time augmentation.
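To make the test-time augmentation finding concrete, here is a minimal sketch of the general technique: predictions for several augmented views of one image are averaged. This listing does not describe the paper's actual procedure, so `tta_predict`, `toy_classify`, `brighten`, and `darken` are hypothetical stand-ins; in the paper's setting the augmentations would presumably be stylized versions of the input.

```python
import numpy as np

def tta_predict(image, classify, augmentations):
    """Average class probabilities over the original image and its augmented views."""
    views = [image] + [augment(image) for augment in augmentations]
    probabilities = np.stack([classify(view) for view in views])
    return probabilities.mean(axis=0)

# Hypothetical stand-ins, for illustration only (not the paper's classifier or stylizer).
def toy_classify(image):
    # Two-class "classifier": a sigmoid of the mean pixel intensity.
    positive = 1.0 / (1.0 + np.exp(-image.mean()))
    return np.array([1.0 - positive, positive])

def brighten(image):
    return image + 0.5

def darken(image):
    return image - 0.5

image = np.zeros((4, 4))
probs = tta_predict(image, toy_classify, [brighten, darken])
```

Averaging probabilities rather than hard labels keeps the result a valid distribution and lets confident views outweigh uncertain ones.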
Abstract
Deep learning models in medical image analysis often struggle with generalizability across domains and demographic groups due to data heterogeneity and scarcity. Traditional augmentation improves robustness, but fails under substantial domain shifts. Recent advances in stylistic augmentation enhance domain generalization by varying image styles but fall short in terms of style diversity or by introducing artifacts into the generated images. To address these limitations, we propose Stylizing ViT, a novel Vision Transformer encoder that utilizes weight-shared attention blocks for both self- and cross-attention. This design allows the same attention block to maintain anatomical consistency through self-attention while performing style transfer via cross-attention. We assess the effectiveness of our method for domain generalization by employing it for data augmentation on three distinct…
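The weight-sharing idea in the abstract can be sketched as follows: one set of query, key, and value projections serves as self-attention when keys and values come from the content tokens, and as cross-attention when they come from the style tokens. This is a single-head NumPy illustration under assumed shapes, not the authors' implementation; `SharedAttention` and the token arrays are hypothetical, and how the paper combines the two outputs is not stated in this excerpt.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

class SharedAttention:
    """Single-head attention whose projections are reused for self- and cross-attention.

    Hypothetical illustration; the actual Stylizing ViT block may differ.
    """

    def __init__(self, dim, seed=0):
        rng = np.random.default_rng(seed)
        scale = dim ** -0.5
        self.dim = dim
        self.w_q = rng.normal(0.0, scale, (dim, dim))
        self.w_k = rng.normal(0.0, scale, (dim, dim))
        self.w_v = rng.normal(0.0, scale, (dim, dim))

    def __call__(self, query_tokens, context_tokens=None):
        # With no separate context, keys and values come from the queries' own
        # sequence (self-attention); otherwise from the context (cross-attention).
        if context_tokens is None:
            context_tokens = query_tokens
        q = query_tokens @ self.w_q
        k = context_tokens @ self.w_k
        v = context_tokens @ self.w_v
        weights = softmax(q @ k.T / np.sqrt(self.dim))
        return weights @ v

block = SharedAttention(dim=8)
rng = np.random.default_rng(1)
content_tokens = rng.normal(size=(16, 8))  # assumed: 16 content patches
style_tokens = rng.normal(size=(10, 8))    # assumed: 10 style patches

self_out = block(content_tokens)                 # self-attention over content
cross_out = block(content_tokens, style_tokens)  # same weights, style as context
```

Both calls return one output token per content patch, so the spatial layout of the content image is kept regardless of how many style tokens are attended to.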
Code & Models
- 🤗 sdoerrich97/stylizing_vit_base_camelyon17wilds (model, 1 download)
- 🤗 sdoerrich97/stylizing_vit_small_camelyon17wilds (model)
- 🤗 sdoerrich97/stylizing_vit_tiny_camelyon17wilds (model)
- 🤗 sdoerrich97/stylizing_vit_tiny_cholec80 (model)
- 🤗 sdoerrich97/stylizing_vit_small_cholec80 (model)
- 🤗 sdoerrich97/stylizing_vit_base_cholec80 (model)
- 🤗 sdoerrich97/stylizing_vit_base_ddi_12_34_56 (model)
- 🤗 sdoerrich97/stylizing_vit_small_ddi_12_34_56 (model)
- 🤗 sdoerrich97/stylizing_vit_tiny_ddi_12_34_56 (model)
- 🤗 sdoerrich97/stylizing_vit_tiny_ddi_65_43_21 (model)
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Face recognition and analysis · AI in cancer detection
