Equi-ViT: Rotational Equivariant Vision Transformer for Robust Histopathology Analysis
Fuyao Chen, Yuexi Du, Elèonore V. Lieffrig, Nicha C. Dvornek, John A. Onofrey

TL;DR
Equi-ViT is a rotationally equivariant vision transformer for histopathology. Building rotation equivariance into the model improves robustness and data efficiency.
Contribution
It integrates an equivariant convolution kernel into the patch embedding stage of a ViT, giving the learned representations built-in rotational equivariance for histopathology image analysis.
Findings
Enhanced rotation-consistent patch embeddings
Improved classification stability across orientations
Increased data efficiency and robustness
Abstract
Vision Transformers (ViTs) have gained rapid adoption in computational pathology for their ability to model long-range dependencies through self-attention, addressing the limitations of convolutional neural networks that excel at local pattern capture but struggle with global contextual reasoning. Recent pathology-specific foundation models have further advanced performance by leveraging large-scale pretraining. However, standard ViTs remain inherently non-equivariant to transformations such as rotations and reflections, which are ubiquitous variations in histopathology imaging. To address this limitation, we propose Equi-ViT, which integrates an equivariant convolution kernel into the patch embedding stage of a ViT architecture, imparting built-in rotational equivariance to learned representations. Equi-ViT achieves superior rotation-consistent patch embeddings and stable…
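The idea of an equivariant patch embedding can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes the simplest rotation group (the four 90° rotations, C4), a single-channel image, and random rather than learned kernels; the names `c4_patch_embed`, `kernels`, and `patch` are hypothetical. Each patch is projected with four rotated copies of every kernel, so rotating the input only rearranges the token grid and cyclically shifts the orientation axis.

```python
import numpy as np

def c4_patch_embed(image, kernels, patch):
    """Project non-overlapping patches with 4 rotated copies of each kernel.

    image:   (H, W) single-channel array (assumption: H and W divisible by patch)
    kernels: (D, patch, patch) projection kernels (random here, learned in practice)
    returns: (H // patch, W // patch, 4, D) tokens with an explicit orientation axis
    """
    h, w = image.shape
    gh, gw = h // patch, w // patch
    # Split the image into a (gh, gw) grid of (patch, patch) tiles.
    tiles = image.reshape(gh, patch, gw, patch).transpose(0, 2, 1, 3)
    # Kernel bank: every kernel rotated by 0, 90, 180 and 270 degrees.
    bank = np.stack([np.rot90(kernels, r, axes=(1, 2)) for r in range(4)])
    # Inner product of every tile with every rotated kernel.
    return np.einsum("ijab,rdab->ijrd", tiles, bank)

rng = np.random.default_rng(0)
image = rng.standard_normal((32, 32))
kernels = rng.standard_normal((8, 8, 8))  # D = 8 kernels of size 8 x 8

tokens = c4_patch_embed(image, kernels, patch=8)
tokens_rot = c4_patch_embed(np.rot90(image), kernels, patch=8)

# Equivariance: rotating the image rotates the token grid and shifts the
# orientation axis by one step; no information is lost or scrambled.
expected = np.roll(np.rot90(tokens, 1, axes=(0, 1)), 1, axis=2)
is_equivariant = np.allclose(tokens_rot, expected)

# Pooling over the orientation axis leaves per-patch features that do not
# depend on the orientation of the patch content.
pooled = tokens.max(axis=2)
pooled_rot = tokens_rot.max(axis=2)
is_invariant = np.allclose(pooled_rot, np.rot90(pooled, 1, axes=(0, 1)))

print(is_equivariant, is_invariant)
```

A standard ViT patch embedding corresponds to using only the unrotated kernels, in which case a rotated input generally yields unrelated token values. Finer rotation groups or reflections would need interpolated or steerable kernels, which this sketch does not cover.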
Taxonomy
Topics: AI in cancer detection · Digital Imaging for Blood Diseases · Generative Adversarial Networks and Image Synthesis
