Quilt-LLaVA: Visual Instruction Tuning by Extracting Localized Narratives from Open-Source Histopathology Videos

Mehmet Saygin Seyfioglu; Wisdom O. Ikezogwo; Fatemeh Ghezloo; Ranjay Krishna; Linda Shapiro

arXiv:2312.04746 · cs.CV · January 14, 2025 · 1 citation

Open Access · 2 Repos · 1 Model · 5 Datasets

TL;DR

Quilt-LLaVA is a multi-modal model trained on a large histopathology instruction dataset that enables diagnostic reasoning across whole slide images by extracting localized narratives from educational videos.

Contribution

The paper introduces Quilt-Instruct, a large-scale dataset with spatially grounded question-answer pairs from histopathology videos, and trains Quilt-LLaVA to perform comprehensive diagnostic reasoning across WSIs.

Findings

01. Outperforms the prior state of the art (SOTA) by over 10% on GPT-4 score.

02. Achieves 4% and 9% improvements on open-set and closed-set VQA, respectively.

03. Demonstrates effective reasoning across multiple image patches.

Abstract

Diagnosis in histopathology requires a global analysis of whole slide images (WSIs), in which pathologists compound evidence from different WSI patches. The gigapixel scale of WSIs poses a challenge for histopathology multi-modal models. Training multi-modal models for histopathology requires instruction tuning datasets, which currently contain information for individual image patches, without a spatial grounding of the concepts within each patch and without a wider view of the WSI. Therefore, they lack sufficient diagnostic capacity for histopathology. To bridge this gap, we introduce Quilt-Instruct, a large-scale dataset of 107,131 histopathology-specific instruction question/answer pairs, grounded within diagnostically relevant image patches that make up the WSI. Our dataset is collected by leveraging educational histopathology videos from YouTube, which provides spatial…

Code & Models

Repositories

Models

🤗 wisdomik/Quilt-Llava-v1.5-7b
model · 1.3k downloads · 9 likes

Datasets

Taxonomy

Topics: Topic Modeling

Methods: Sparse Evolutionary Training · Multi-Head Attention · Attention Is All You Need · Linear Layer · Label Smoothing · Byte Pair Encoding · Residual Connection · Layer Normalization · Dropout · Dense Connections