Instance-based Vision Transformer for Subtyping of Papillary Renal Cell Carcinoma in Histopathological Image

Zeyu Gao; Bangyang Hong; Xianli Zhang; Yang Li; Chang Jia; Jialun Wu; Chunbao Wang; Deyu Meng; Chen Li

arXiv:2106.12265·cs.CV·June 24, 2021·5 cites

Open Access · 1 Repo

TL;DR

This paper introduces an instance-based Vision Transformer (i-ViT) that effectively captures cellular and cell-layer level patterns in histopathological images to improve subtyping accuracy of papillary renal cell carcinoma, outperforming CNN models.

Contribution

The paper presents a novel i-ViT model that leverages instance patches and multi-head self-attention for fine-grained histopathological image classification, addressing limitations of CNNs in capturing subtle cellular features.
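
To make the role of self-attention over instance patches concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: it uses a single attention head with identity query/key/value projections, and it mean-pools the attended tokens into one vector. The function names `self_attention` and `aggregate` are hypothetical and chosen only for illustration.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens: List[Vector]) -> List[Vector]:
    """Single-head scaled dot-product self-attention.

    Assumption: queries, keys, and values are the tokens themselves
    (identity projections). A real model would learn these projections
    and run several heads in parallel.
    """
    dim = len(tokens[0])
    scale = math.sqrt(dim)
    attended = []
    for query in tokens:
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in tokens]
        weights = softmax(scores)
        attended.append(
            [sum(w * value[i] for w, value in zip(weights, tokens)) for i in range(dim)]
        )
    return attended


def aggregate(tokens: List[Vector]) -> Vector:
    """Attend across instance embeddings, then mean-pool into one vector.

    Assumption: mean pooling stands in for whatever aggregation the
    model actually uses.
    """
    attended = self_attention(tokens)
    count = len(attended)
    return [sum(t[i] for t in attended) / count for i in range(len(attended[0]))]


if __name__ == "__main__":
    # Three toy 2-dimensional instance embeddings.
    instance_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    print(aggregate(instance_embeddings))
```

Each instance embedding is re-expressed as a weighted mix of all instances, so information about neighboring nuclei can flow into every token before pooling.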

Findings

01 i-ViT outperforms existing CNN-based models significantly.

02 The model effectively captures cellular and cell-layer patterns.

03 Experimental results on 1162 image regions show improved accuracy.

Abstract

The histological subtype of papillary (p) renal cell carcinoma (RCC), type 1 vs. type 2, is an essential prognostic factor. The two subtypes of pRCC share a similar pattern, i.e., the papillary architecture, yet differ subtly in cellular and cell-layer level patterns. However, these cellular and cell-layer level patterns can hardly be captured by existing CNN-based models in large-size histopathological images, which hinders applying these models directly to such a fine-grained classification task. This paper proposes a novel instance-based Vision Transformer (i-ViT) to learn robust representations of histopathological images for the pRCC subtyping task by extracting finer features from instance patches (by cropping around segmented nuclei and assigning predicted grades). The proposed i-ViT takes top-K instances as input and aggregates them for capturing both…
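
The instance-patch step described in the abstract (crop around segmented nuclei, attach a predicted grade, keep the top-K) can be sketched roughly as follows. This is an illustrative sketch, not the paper's code: the `Nucleus` record, the ranking rule (higher predicted grade first, then larger area), and the patch size are all assumptions, since the text here does not state how the top-K instances are chosen.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Nucleus:
    """Hypothetical record for one segmented nucleus."""
    cx: int     # centroid column in pixels
    cy: int     # centroid row in pixels
    grade: int  # predicted grade assigned to this nucleus
    area: int   # segmented area in pixels


def select_top_k(nuclei: List[Nucleus], k: int) -> List[Nucleus]:
    """Keep the k highest-ranked nuclei.

    Assumed ranking: higher grade first, ties broken by larger area.
    """
    ranked = sorted(nuclei, key=lambda n: (n.grade, n.area), reverse=True)
    return ranked[:k]


def crop_box(n: Nucleus, patch: int, width: int, height: int) -> Tuple[int, int, int, int]:
    """Return (x0, y0, x1, y1) for a patch centred on the nucleus.

    The box is shifted inward when the nucleus is near the image border,
    so it never leaves the image.
    """
    half = patch // 2
    x0 = min(max(n.cx - half, 0), max(width - patch, 0))
    y0 = min(max(n.cy - half, 0), max(height - patch, 0))
    return (x0, y0, min(x0 + patch, width), min(y0 + patch, height))


if __name__ == "__main__":
    nuclei = [
        Nucleus(cx=10, cy=10, grade=1, area=50),
        Nucleus(cx=100, cy=100, grade=3, area=40),
        Nucleus(cx=200, cy=50, grade=3, area=80),
        Nucleus(cx=5, cy=250, grade=2, area=60),
    ]
    for nucleus in select_top_k(nuclei, k=2):
        print(nucleus, crop_box(nucleus, patch=32, width=256, height=256))
```

The resulting boxes would then be used to cut instance patches from the image region before they are embedded and passed to the transformer.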

Code & Models

Repositories

ZeyuGaoAi/Instance_based_Vision_Transformer
PyTorch · Official

Taxonomy

Topics: AI in cancer detection · Radiomics and Machine Learning in Medical Imaging · Colorectal Cancer Screening and Detection

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Adam · Layer Normalization · Dropout · Vision Transformer