Vim4Path: Self-Supervised Vision Mamba for Histopathology Images

Ali Nasiri-Sarvi; Vincent Quoc-Huy Trinh; Hassan Rivaz; Mahdi S. Hosseini

arXiv:2404.13222·eess.IV·May 28, 2024·2 cites



TL;DR

This paper applies Vision Mamba (Vim), a vision architecture inspired by state space models, to self-supervised learning on histopathology images. Vim outperforms ViT, especially at smaller scales, and explainability analysis suggests it aligns more closely with pathologist workflows.

Contribution

The paper proposes leveraging the Vision Mamba architecture within the DINO framework for improved self-supervised representation learning in computational pathology, outperforming Vision Transformers.
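As a rough illustration of the DINO objective that the encoder is trained under, the sketch below computes the student-teacher cross-entropy for one pair of views and the moving-average teacher update. It is a minimal sketch in plain Python, not the repository's code: the function names are hypothetical, and the temperatures (0.1 and 0.04) and momentum (0.996) are the defaults commonly cited for DINO, assumed here rather than confirmed for Vim4Path.

```python
import math


def log_softmax(logits, temperature):
    """Numerically stable log-softmax of temperature-scaled logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    log_z = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - log_z for x in scaled]


def dino_loss(student_logits, teacher_logits, center,
              student_temp=0.1, teacher_temp=0.04):
    """Cross-entropy between the centered, sharpened teacher distribution
    and the student distribution (temperatures are assumed defaults)."""
    centered = [t - c for t, c in zip(teacher_logits, center)]
    teacher_probs = [math.exp(v) for v in log_softmax(centered, teacher_temp)]
    student_log_probs = log_softmax(student_logits, student_temp)
    return -sum(p * lq for p, lq in zip(teacher_probs, student_log_probs))


def ema_update(teacher_params, student_params, momentum=0.996):
    """Teacher weights track the student as an exponential moving average."""
    return [momentum * t + (1.0 - momentum) * s
            for t, s in zip(teacher_params, student_params)]
```

In this formulation the encoder architecture (ViT or Vim) only determines how the logits are produced; the loss and the teacher update stay the same, which is what makes swapping the backbone inside DINO straightforward.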

Findings

01

Vim achieves an 8.21 increase in ROC AUC over ViT for models of similar size.

02

Vim's advantage over ViT is most pronounced at smaller scales.

03

Explainability analysis shows Vim mimics pathologist workflows.
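For readers unfamiliar with the metric behind the first finding, ROC AUC can be read as the probability that a randomly chosen positive sample is scored above a randomly chosen negative one. The sketch below computes it by pairwise comparison; the function name and the toy labels and scores are illustrative assumptions, not data from the paper. Since ROC AUC lies in [0, 1], the reported 8.21 is presumably on a 0 to 100 scale, which is an assumption here.

```python
def roc_auc(labels, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked
    correctly, counting ties as half a correct pair."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example: two normal (0) and two tumor (1) samples.
toy_auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])  # 3 of 4 pairs ranked correctly
```

The pairwise form is quadratic in the number of samples, so it is meant for illustration; library implementations use a sort-based computation instead.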

Abstract

Representation learning from gigapixel Whole Slide Images (WSIs) poses a significant challenge in computational pathology due to the complicated nature of tissue structures and the scarcity of labeled data. Multi-Instance Learning (MIL) methods have addressed this challenge by classifying slides from image patches, using feature encoders pretrained with Self-Supervised Learning (SSL) approaches. The performance of both SSL and MIL methods relies on the architecture of the feature encoder. This paper proposes leveraging the Vision Mamba (Vim) architecture, inspired by state space models, within the DINO framework for representation learning in computational pathology. We evaluate the performance of Vim against Vision Transformers (ViT) on the Camelyon16 dataset for both patch-level and slide-level classification. Our findings highlight Vim's enhanced performance compared to ViT, particularly at…
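The patch-to-slide step that the abstract attributes to MIL can be sketched as follows: patch embeddings from the pretrained encoder are pooled into one slide embedding, which a classifier then scores. This is a generic attention-pooling sketch, not necessarily the aggregator used in the paper; the function names are hypothetical, and the attention scores and classifier weights, which would be learned in practice, are passed in as plain inputs here.

```python
import math


def attention_pool(patch_embeddings, attention_scores):
    """Pool patch embeddings into one slide embedding, weighting each
    patch by the softmax of its (here externally supplied) score."""
    m = max(attention_scores)
    exps = [math.exp(s - m) for s in attention_scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(patch_embeddings[0])
    return [sum(w * emb[d] for w, emb in zip(weights, patch_embeddings))
            for d in range(dim)]


def slide_probability(slide_embedding, classifier_weights, bias=0.0):
    """Logistic classifier on the pooled slide embedding."""
    logit = sum(w * x for w, x in zip(classifier_weights, slide_embedding)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


# Two toy 2-D patch embeddings; equal scores reduce to mean pooling.
slide_embedding = attention_pool([[1.0, 0.0], [3.0, 2.0]], [0.0, 0.0])
tumor_probability = slide_probability(slide_embedding, [0.5, -0.5])
```

Because only slide-level labels supervise this step, the quality of the pooled representation depends heavily on the patch embeddings, which is why the choice of feature encoder matters for both SSL and MIL.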


Code & Models

Repositories

atlasanalyticslab/vim4path (PyTorch, official)


Taxonomy

Topics: AI in cancer detection · Digital Imaging for Blood Diseases

Methods: Attention Is All You Need · Linear Layer · Layer Normalization · Multi-Head Attention · Dense Connections · Residual Connection · Softmax · Vision Transformer · self-DIstillation with NO labels