ViDRiP-LLaVA: A Dataset and Benchmark for Diagnostic Reasoning from Pathology Videos

Trinh T.L. Vuong; Jin Tae Kwak

arXiv:2505.04192·cs.CV·October 14, 2025

ViDRiP-LLaVA: A Dataset and Benchmark for Diagnostic Reasoning from Pathology Videos

Trinh T.L. Vuong, Jin Tae Kwak

PDF

1 Repo

TL;DR

ViDRiP-LLaVA introduces a large multimodal model and dataset for diagnostic reasoning in pathology videos, combining diverse image scenarios and chain-of-thought explanations to support clinical decision-making.

Contribution

It is the first to integrate multiple pathology video scenarios with a new dataset and benchmark, advancing AI diagnostic reasoning in computational pathology.

Findings

01

Established a new benchmark for pathology video analysis

02

Transferred knowledge from single-image datasets to improve video understanding

03

Demonstrated the effectiveness of multimodal reasoning in diagnostic tasks

Abstract

We present ViDRiP-LLaVA, the first large multimodal model (LMM) in computational pathology that integrates three distinct image scenarios, including single patch images, automatically segmented pathology video clips, and manually segmented pathology videos. This integration closely mirrors the natural diagnostic process of pathologists. By generating detailed histological descriptions and culminating in a definitive sign-out diagnosis, ViDRiP-LLaVA bridges visual narratives with diagnostic reasoning. Central to our approach is the ViDRiP-Instruct dataset, comprising 4278 video and diagnosis-specific chain-of-thought instructional pairs sourced from educational histopathology videos on YouTube. Although high-quality data is critical for enhancing diagnostic reasoning, its creation is time-intensive and limited in volume. To overcome this challenge, we transfer knowledge from existing…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Code & Models

Repositories

trinhvg/videopath-llava
pytorchOfficial

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.