# ViT-Stain: Vision transformer-driven virtual staining for skin histopathology via global contextual learning

**Authors:** Muhammad Altaf Hussain, Muhammad Asim Waris, Muhammad Usman Akram, Muhammad Jawad Khan, Muhammad Zeeshan Asaf, Amber Javaid, Syed Omer Gilani, Fawwaz Hazzazi

PMC · DOI: 10.1371/journal.pone.0341311 · PLOS One · 2026-02-02

## TL;DR

ViT-Stain uses vision transformers to generate virtual H&E-stained images from unstained skin tissue, outperforming CNN- and GAN-based methods on image-quality metrics and in pathologist evaluation.

## Contribution

Introduces ViT-Stain, a vision transformer-based framework for virtual staining that captures global context and improves diagnostic fidelity.

## Key findings

- ViT-Stain outperforms CNN and GAN models in virtual staining metrics like SSIM, PSNR, and FID.
- Three board-certified pathologists evaluated the virtual stains; overall diagnostic concordance was 85% (Fleiss' κ = 0.88).
- A novel histology-specific fidelity index (HSFI) was developed and used for evaluation.
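As a reminder of what one of the listed metrics measures, here is a minimal sketch of PSNR, computed as 10·log10(MAX² / MSE). It is not the authors' evaluation code: the function name `psnr`, the flat-list image representation, and the 8-bit default `max_value` are assumptions made for illustration.

```python
import math


def psnr(reference, generated, max_value=255.0):
    """Peak signal-to-noise ratio (dB) between two equally sized images.

    Both images are flat sequences of pixel intensities (an assumption made
    to keep the sketch dependency-free).
    """
    if len(reference) == 0 or len(reference) != len(generated):
        raise ValueError("images must be non-empty and the same size")
    # Mean squared error over all pixels.
    mse = sum((r - g) ** 2 for r, g in zip(reference, generated)) / len(reference)
    if mse == 0:
        return math.inf  # identical images: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Higher PSNR means the generated image is closer, pixel by pixel, to the reference. Because it is purely pixel-wise, it is usually reported alongside structural and perceptual metrics such as SSIM and LPIPS.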

## Abstract

Current virtual staining approaches for histopathology slides use convolutional neural networks (CNNs) and generative adversarial networks (GANs). These approaches rely on local receptive fields and struggle to capture global context and long-range tissue dependencies. This limitation can introduce artifacts in fine textures and cause loss of subtle morphological details. We propose a novel vision transformer-driven virtual staining framework (ViT-Stain) that translates unstained skin tissue images into hematoxylin and eosin (H&E)-equivalent images. The transformer’s self-attention enables ViT-Stain to capture long-range dependencies, preserve global context, and maintain fine textures. We trained ViT-Stain on the E-Staining DermaRepo dataset, which pairs unstained and H&E-stained whole-slide images (WSIs). We validated our model using metrics including SSIM, PSNR, FID, KID, LPIPS, and a novel histology-specific fidelity index (HSFI). Three board-certified pathologists provided feedback for qualitative evaluations. ViT-Stain outperforms leading CNN and GAN models, including Pix2Pix, CycleGAN, CUTGAN, and DCLGAN. It achieves an overall diagnostic concordance of 85% with virtual H&E-stains (Fleiss’ κ = 0.88). However, the model requires longer training (about 93 hours on A100 GPUs) and longer inference (about 2.9 minutes). Our work advances AI-driven diagnostic reproducibility for high-fidelity clinical settings and aligns with the World Health Organization (WHO) global health goals.
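The abstract reports inter-rater agreement as Fleiss' κ. The following is a minimal sketch of the standard Fleiss' κ formula, not the authors' analysis code. The function name `fleiss_kappa` and the count-matrix layout (one row per case, one column per diagnostic category, each row summing to the number of raters) are assumptions, and any input data used with it here is hypothetical.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a list of rows, one row per rated case.

    counts[i][j] is the number of raters who assigned case i to category j.
    Every row must sum to the same number of raters (at least 2).
    """
    n_cases = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each case needs the same number (>= 2) of ratings")
    n_categories = len(counts[0])

    # Observed agreement: mean over cases of pairwise rater agreement.
    p_bar = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ) / n_cases

    # Chance agreement from the overall share of ratings in each category.
    total = n_cases * n_raters
    p_e = sum(
        (sum(row[j] for row in counts) / total) ** 2 for j in range(n_categories)
    )
    if p_e == 1.0:
        raise ValueError("kappa is undefined when all ratings share one category")
    return (p_bar - p_e) / (1.0 - p_e)
```

With three raters, as in the paper's evaluation, a row such as `[3, 0]` means all three chose the first category and `[2, 1]` means one rater disagreed. κ = 1 indicates perfect agreement and κ = 0 indicates agreement no better than chance.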

## Full-text entities

- **Genes:** GAN (gigaxonin) [NCBI Gene 8139] {aka GAN1, GIG, KLHL16}, VIT (vitrin) [NCBI Gene 5212] {aka VIT1}, IGKV7-3 (immunoglobulin kappa variable 7-3 (pseudogene)) [NCBI Gene 28905] {aka B1, IGKV73}, IGKV5-2 (immunoglobulin kappa variable 5-2) [NCBI Gene 28907] {aka B2, IGKV52}
- **Diseases:** ViTs (MESH:D014786), SCC (MESH:D002294), melanoma (MESH:D008545), carcinomas (MESH:D009369), hallucinations (MESH:D006212), H&E (MESH:D016751), KID (MESH:C535290), melanocytic lesions (MESH:D009508), IEC (MESH:D004814), DL (MESH:D007859), GANs (MESH:D004829), inflammatory dermatoses (MESH:D012871), HD (MESH:D006816), lesion (MESH:D009059), BCC (MESH:D002280)
- **Chemicals:** melanin (MESH:D008543), CycleGAN (-), H&E (MESH:D006371), eosin (MESH:D004801), E (MESH:D004540), hematoxylin (MESH:D006416)
- **Species:** Rattus norvegicus (brown rat, species) [taxon 10116], Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12863569/full.md

## Figures

12 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12863569/full.md

## References

82 references — full list in the complete paper: https://tomesphere.com/paper/PMC12863569/full.md

---
Source: https://tomesphere.com/paper/PMC12863569