WSI-VQA: Interpreting Whole Slide Images by Generative Visual Question Answering

Pingyi Chen; Chenglu Zhu; Sunyi Zheng; Honglin Li; Lin Yang

arXiv:2407.05603 · cs.CV · July 9, 2024

TL;DR

This paper introduces WSI-VQA, a generative visual question answering framework for interpreting whole slide images. It supports diverse diagnostic tasks, is accompanied by a new dataset, outperforms existing models, and provides explainability features.

Contribution

The paper presents a novel generative VQA framework for WSIs, introduces a new dataset of question-answer pairs, and demonstrates improved performance and interpretability over existing methods.

Findings

01

The W2T model outperforms discriminative models in medical correctness.

02

A new dataset of 8672 slide-level question-answer pairs covering 977 WSIs is established.

03

Visual co-attention provides intuitive explanations for diagnostic results.

Abstract

Whole slide imaging is routinely adopted for carcinoma diagnosis and prognosis. Abundant experience is required for pathologists to achieve accurate and reliable diagnostic results of whole slide images (WSI). The huge size and heterogeneous features of WSIs make the workflow of pathological reading extremely time-consuming. In this paper, we propose a novel framework (WSI-VQA) to interpret WSIs by generative visual question answering. WSI-VQA shows universality by reframing various kinds of slide-level tasks in a question-answering pattern, in which pathologists can achieve immunohistochemical grading, survival prediction, and tumor subtyping following human-machine interaction. Furthermore, we establish a WSI-VQA dataset which contains 8672 slide-level question-answering pairs with 977 WSIs. Besides the ability to deal with different slide-level tasks, our generative model which is…
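
As a rough illustration of the reframing described in the abstract, slide-level labels can be turned into question-answer pairs with simple templates. The sketch below is not the authors' pipeline: the `QAPair` class, the task keys, the wording in `TEMPLATES`, and the `labels_to_qa` helper are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class QAPair:
    """One slide-level question-answer pair (hypothetical structure)."""
    slide_id: str
    question: str
    answer: str


# Hypothetical question templates, one per kind of slide-level task
# mentioned in the abstract. The exact wording is an assumption.
TEMPLATES: Dict[str, str] = {
    "subtype": "What is the histological subtype of this tumor?",
    "ihc_grade": "What is the immunohistochemical grade of this slide?",
    "survival": "Is the expected survival time of this patient long or short?",
}


def labels_to_qa(slide_id: str, labels: Dict[str, Optional[str]]) -> List[QAPair]:
    """Convert task labels for one slide into question-answer pairs.

    Tasks without a template or without a label are skipped, so slides
    with partial annotations still contribute the pairs they can.
    """
    pairs: List[QAPair] = []
    for task, value in labels.items():
        if task not in TEMPLATES or value is None:
            continue
        pairs.append(QAPair(slide_id, TEMPLATES[task], str(value)))
    return pairs


if __name__ == "__main__":
    example = labels_to_qa(
        "slide_001",  # made-up identifier
        {"subtype": "ductal", "ihc_grade": None, "survival": "long"},
    )
    for pair in example:
        print(pair.question, "->", pair.answer)
```

With this framing, a single text-generating model can be asked about any of the tasks by changing the question, rather than training one classifier head per task.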

Code & Models

Repositories

cpystan/wsi-vqa (PyTorch, official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Human Pose and Action Recognition

Methods: Attention Is All You Need · Linear Layer · Multi-Head Attention · Softmax · Byte Pair Encoding · Layer Normalization · Label Smoothing · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Adam