Stylizing ViT: Anatomy-Preserving Instance Style Transfer for Domain Generalization

Sebastian Doerrich; Francesco Di Salvo; Jonas Alle; Christian Ledig

arXiv:2601.17586·cs.CV·January 27, 2026

TL;DR

Stylizing ViT introduces a novel Vision Transformer architecture that enhances domain generalization in medical image analysis by maintaining anatomical consistency during style transfer, leading to improved robustness and perceptually convincing augmentations.

Contribution

The paper proposes a new ViT-based model that uses weight-shared attention for anatomy-preserving style transfer, improving domain generalization in medical imaging tasks.

Findings

01 Up to +13% accuracy improvement over state-of-the-art methods.

02 Generates artifact-free, perceptually convincing images.

03 Achieves a 17% performance boost at inference when used for test-time augmentation.
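
The test-time augmentation finding can be illustrated with a generic sketch: predict on several augmented views of one image and average the class probabilities. This is not the paper's pipeline. The names `predict_with_tta`, `toy_model`, `brighten`, and `darken` are hypothetical, and the simple brightness shifts only stand in for the style-transferred views that a trained stylization model would produce.

```python
import numpy as np

def predict_with_tta(model_fn, image, augment_fns):
    """Average class probabilities over the original image and its augmented views."""
    views = [image] + [augment(image) for augment in augment_fns]
    probs = np.stack([model_fn(view) for view in views])
    return probs.mean(axis=0)

def toy_model(image):
    """Hypothetical two-class classifier: softmax over a brightness-based score."""
    logits = np.array([image.mean(), -image.mean()])
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

# Assumption: brightness shifts are placeholders for stylized variants.
def brighten(image):
    return np.clip(image + 0.1, 0.0, 1.0)

def darken(image):
    return np.clip(image - 0.1, 0.0, 1.0)

image = np.full((8, 8), 0.5)
avg_probs = predict_with_tta(toy_model, image, [brighten, darken])
predicted_class = int(avg_probs.argmax())
```

Averaging probabilities rather than hard labels keeps the per-view confidence, which is a common choice when the views disagree.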

Abstract

Deep learning models in medical image analysis often struggle with generalizability across domains and demographic groups due to data heterogeneity and scarcity. Traditional augmentation improves robustness, but fails under substantial domain shifts. Recent advances in stylistic augmentation enhance domain generalization by varying image styles but fall short in terms of style diversity or by introducing artifacts into the generated images. To address these limitations, we propose Stylizing ViT, a novel Vision Transformer encoder that utilizes weight-shared attention blocks for both self- and cross-attention. This design allows the same attention block to maintain anatomical consistency through self-attention while performing style transfer via cross-attention. We assess the effectiveness of our method for domain generalization by employing it for data augmentation on three distinct…
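
The weight-sharing idea in the abstract can be sketched as one attention block whose query, key, and value projections are reused in two modes: self-attention when keys and values come from the same tokens as the queries, and cross-attention when they come from a second image. The sketch below is a minimal single-head NumPy illustration under that reading, not the authors' architecture. The class name `SharedAttention`, the token shapes, and the random initialization are all assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(x)
    return exp / exp.sum(axis=axis, keepdims=True)

class SharedAttention:
    """Single-head attention; the same weights serve self- and cross-attention."""

    def __init__(self, dim, seed=0):
        rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(dim)
        self.dim = dim
        self.w_q = rng.normal(0.0, scale, (dim, dim))
        self.w_k = rng.normal(0.0, scale, (dim, dim))
        self.w_v = rng.normal(0.0, scale, (dim, dim))

    def __call__(self, query_tokens, context_tokens=None):
        # No context given: keys and values come from the queries (self-attention).
        if context_tokens is None:
            context_tokens = query_tokens
        q = query_tokens @ self.w_q
        k = context_tokens @ self.w_k
        v = context_tokens @ self.w_v
        attn = softmax(q @ k.T / np.sqrt(self.dim))
        return attn @ v

rng = np.random.default_rng(1)
content = rng.normal(size=(16, 32))  # hypothetical patch tokens of the anatomy image
style = rng.normal(size=(16, 32))    # hypothetical patch tokens of the style image

block = SharedAttention(dim=32)
self_out = block(content)            # self-attention over the content tokens
cross_out = block(content, style)    # cross-attention: content queries, style keys/values
```

Both calls go through the same three weight matrices, so no extra parameters are needed for the cross-attention path; only the source of the keys and values changes.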

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Face recognition and analysis · AI in cancer detection