Self-supervised Vision Transformer are Scalable Generative Models for Domain Generalization
Sebastian Doerrich, Francesco Di Salvo, Christian Ledig

TL;DR
This paper introduces a self-supervised, Vision Transformer-based generative approach to improve domain generalization in digital histopathology; it significantly outperforms existing methods and scales with more data and more complex models.
Contribution
We propose a novel self-supervised Vision Transformer method that generates synthetic images to enhance domain generalization in histopathology, surpassing state-of-the-art performance.
Findings
Outperforms the state of the art on Camelyon17-wilds (+2%)
Achieves a +26% improvement on an epithelium-stroma dataset
Scales effectively with more unlabeled data and more complex architectures
Abstract
Despite notable advancements, the integration of deep learning (DL) techniques into impactful clinical applications, particularly in the realm of digital histopathology, has been hindered by challenges associated with achieving robust generalization across diverse imaging domains and characteristics. Traditional mitigation strategies in this field, such as data augmentation and stain color normalization, have proven insufficient in addressing this limitation, necessitating the exploration of alternative methodologies. To this end, we propose a novel generative method for domain generalization in histopathology images. Our method employs a generative, self-supervised Vision Transformer to dynamically extract characteristics of image patches and seamlessly infuse them into the original images, thereby creating novel, synthetic images with diverse attributes. By enriching the dataset with…
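The abstract describes the method only at a high level. As background, a Vision Transformer first splits each image into fixed-size, non-overlapping patches, which it then processes as a sequence of tokens. The sketch below illustrates only that generic tokenization step in plain Python; it is not the authors' implementation, and the function name `patchify`, the nested-list image layout, and the patch size in the example are illustrative assumptions.

```python
from typing import List

# Hypothetical image layout: H x W x C nested lists (rows, columns, channels).
Image = List[List[List[float]]]


def patchify(image: Image, patch_size: int) -> List[List[float]]:
    """Split an H x W x C image into flattened, non-overlapping patches.

    Patches are returned in row-major order, mirroring the generic ViT
    tokenization step. This is an illustrative sketch, not the paper's code.
    """
    height = len(image)
    width = len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image height and width must be divisible by patch_size")

    patches: List[List[float]] = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patch: List[float] = []
            for row in range(top, top + patch_size):
                for col in range(left, left + patch_size):
                    # Append all channel values of this pixel to the flat patch vector.
                    patch.extend(image[row][col])
            patches.append(patch)
    return patches


if __name__ == "__main__":
    # A 4 x 4 image with 3 channels, where every channel of pixel (r, c) holds r * 4 + c.
    img = [[[float(r * 4 + c)] * 3 for c in range(4)] for r in range(4)]
    tokens = patchify(img, patch_size=2)
    print(len(tokens), len(tokens[0]))
```

For example, a 4 x 4 RGB image with a patch size of 2 yields four patches of 2 · 2 · 3 = 12 values each; the abstract does not state which patch size the authors use.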
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Cell Image Analysis Techniques · Advanced Vision and Imaging
Methods: Attention Is All You Need · Softmax · Layer Normalization · Byte Pair Encoding · Label Smoothing · Position-Wise Feed-Forward Layer · Adam · Linear Layer · Dense Connections · Absolute Position Encodings
