Democratising Pathology Co-Pilots: An Open Pipeline and Dataset for Whole-Slide Vision-Language Modelling
Sander Moonemans, Sebastiaan Ram, Frédérique Meeuwsen, Carlijn Lems, Jeroen van der Laak, Geert Litjens, Francesco Ciompi

TL;DR
This paper introduces a new dataset and model for whole-slide image vision-language tasks in pathology, enabling more transparent and generalisable co-pilots for pathologists.
Contribution
It presents Polysome for synthetic instruction generation, creates the HISTAI-Instruct dataset, and trains ANTONI-α, a VLM that surpasses existing models on WSI-level VQA tasks.
Findings
ANTONI-α outperforms MedGemma on tissue identification and neoplasm detection.
HISTAI-Instruct contains over 1.1 million instruction-response pairs spanning 24,259 slides.
Multiple versions of ANTONI-α, trained on different amounts of data, are evaluated.
Abstract
Vision-language models (VLMs) have the potential to become co-pilots for pathologists. However, most VLMs either focus on small regions of interest within whole-slide images, provide only static slide-level outputs, or rely on data that is not publicly available, limiting reproducibility. Furthermore, training data containing WSIs paired with detailed clinical reports is scarce, restricting progress toward transparent and generalisable VLMs. We address these limitations with three main contributions. First, we introduce Polysome, a standardised tool for synthetic instruction generation. Second, we apply Polysome to the public HISTAI dataset, generating HISTAI-Instruct, a large whole-slide instruction tuning dataset spanning 24,259 slides and over 1.1 million instruction-response pairs. Finally, we use HISTAI-Instruct to train ANTONI-α, a VLM capable of visual-question answering…
