Self-supervised Vision Transformer are Scalable Generative Models for Domain Generalization

Sebastian Doerrich; Francesco Di Salvo; Christian Ledig

arXiv:2407.02900·eess.IV·July 4, 2024

TL;DR

This paper introduces a generative approach, built on a self-supervised Vision Transformer, to improve domain generalization in digital histopathology. It significantly outperforms existing methods and scales with more data and more complex models.

Contribution

We propose a novel self-supervised Vision Transformer method that generates synthetic images to enhance domain generalization in histopathology, surpassing state-of-the-art performance.

Findings

01. Outperforms state-of-the-art on Camelyon17-wilds (+2%)

02. Achieves +26% improvement on epithelium-stroma dataset

03. Scales effectively with more unlabeled data and complex architectures

Abstract

Despite notable advancements, the integration of deep learning (DL) techniques into impactful clinical applications, particularly in the realm of digital histopathology, has been hindered by challenges associated with achieving robust generalization across diverse imaging domains and characteristics. Traditional mitigation strategies in this field such as data augmentation and stain color normalization have proven insufficient in addressing this limitation, necessitating the exploration of alternative methodologies. To this end, we propose a novel generative method for domain generalization in histopathology images. Our method employs a generative, self-supervised Vision Transformer to dynamically extract characteristics of image patches and seamlessly infuse them into the original images, thereby creating novel, synthetic images with diverse attributes. By enriching the dataset with…
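As a rough illustration of the idea of infusing characteristics from one image patch into another image, the sketch below uses a much simpler stand-in technique: matching the mean and standard deviation of pixel intensities. It is not the paper's Vision Transformer-based method, and the function name `infuse_patch_statistics` and the toy data are assumptions made up for this example.

```python
from statistics import fmean, pstdev


def infuse_patch_statistics(image, patch, eps=1e-8):
    """Shift and scale `image` so its mean/std match those of `patch`.

    Both arguments are 2D lists of floats (single-channel images).
    Hypothetical helper for illustration only; a simple statistic-matching
    stand-in, not the generative ViT described in the abstract.
    """
    img_vals = [v for row in image for v in row]
    patch_vals = [v for row in patch for v in row]

    img_mean, img_std = fmean(img_vals), pstdev(img_vals)
    patch_mean, patch_std = fmean(patch_vals), pstdev(patch_vals)

    # Normalize the image, then re-apply the patch's intensity statistics.
    scale = patch_std / (img_std + eps)
    return [[(v - img_mean) * scale + patch_mean for v in row] for row in image]


# Toy data (assumed): a 2x2 "original" image and a patch from another domain.
original = [[0.2, 0.4], [0.6, 0.8]]
other_domain_patch = [[0.5, 0.7], [0.7, 0.9]]

# The result keeps the original's spatial structure but carries the
# patch's overall intensity characteristics.
synthetic = infuse_patch_statistics(original, other_domain_patch)
```

In an augmentation setting, synthetic images of this kind would be added to the training set alongside the originals. The method in the paper learns the extraction and infusion with a self-supervised model rather than relying on fixed statistics.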

Code & Models

Repositories

sdoerrich97/vits-are-generative-models
PyTorch · Official

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Cell Image Analysis Techniques · Advanced Vision and Imaging

Methods: Attention Is All You Need · Softmax · Layer Normalization · Byte Pair Encoding · Label Smoothing · Position-Wise Feed-Forward Layer · Adam · Linear Layer · Dense Connections · Absolute Position Encodings