HistoViT: Vision Transformer for Accurate and Scalable Histopathological Cancer Diagnosis
Faisal Ahmed

TL;DR
HistoViT is a transformer-based deep learning model for multi-class histopathological cancer diagnosis. It reports higher accuracy and better scalability across tissue types than conventional CNNs.
Contribution
This work presents a novel Vision Transformer framework tailored for histopathology, addressing limitations of CNNs and demonstrating superior performance on multiple cancer datasets.
Findings
Achieved over 99% accuracy on the breast cancer dataset
Outperformed existing deep learning methods across all tested datasets
Demonstrated robustness and generalizability in digital pathology
Abstract
Accurate and scalable cancer diagnosis remains a critical challenge in modern pathology, particularly for malignancies such as breast, prostate, bone, and cervical cancers, which exhibit complex histological variability. In this study, we propose a transformer-based deep learning framework for multi-class tumor classification in histopathological images. Leveraging a fine-tuned Vision Transformer (ViT) architecture, our method addresses key limitations of conventional convolutional neural networks, offering improved performance, reduced preprocessing requirements, and enhanced scalability across tissue types. To adapt the model for histopathological cancer images, we implement a streamlined preprocessing pipeline that converts tiled whole-slide images into PyTorch tensors and standardizes them through data normalization. This ensures compatibility with the ViT architecture and enhances both…
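The preprocessing step described in the abstract (tiles converted to PyTorch tensors, then normalized) might look roughly like the sketch below. This is an illustration under stated assumptions, not the authors' pipeline: the 224 x 224 tile size, the per-channel mean and standard deviation of 0.5, and the helper names `tile_to_tensor` and `make_batch` are all hypothetical, and random pixels stand in for real tiles.

```python
import torch

# Hypothetical per-channel statistics; the source does not state the values used.
MEAN = torch.tensor([0.5, 0.5, 0.5]).view(3, 1, 1)
STD = torch.tensor([0.5, 0.5, 0.5]).view(3, 1, 1)


def tile_to_tensor(tile: torch.Tensor) -> torch.Tensor:
    """Convert an H x W x 3 uint8 tile into a normalized 3 x H x W float tensor."""
    if tile.dtype != torch.uint8 or tile.ndim != 3 or tile.shape[-1] != 3:
        raise ValueError("expected an H x W x 3 uint8 tile")
    # Move channels first and scale pixel values from [0, 255] to [0, 1].
    chw = tile.permute(2, 0, 1).float() / 255.0
    # Standardize each channel with the assumed statistics.
    return (chw - MEAN) / STD


def make_batch(tiles: list) -> torch.Tensor:
    """Stack normalized tiles into an N x 3 x H x W batch."""
    return torch.stack([tile_to_tensor(t) for t in tiles])


# Random pixels standing in for tiles cut from a whole-slide image (assumed 224 x 224).
tiles = [torch.randint(0, 256, (224, 224, 3), dtype=torch.uint8) for _ in range(4)]
batch = make_batch(tiles)
```

With these assumed statistics, pixel values end up in the range [-1, 1], and `batch` has the channels-first layout that PyTorch ViT implementations generally expect as input.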
