Accurate and Scalable Multimodal Pathology Retrieval via Attentive Vision-Language Alignment

Hongyi Wang, Zhengjie Zhu, Jiabo Ma, Fang Wang, Yue Shi, Bo Luo, Jili Wang, Qiuyu Cai, Xiuming Zhang, Yen-Wei Chen, Lanfen Lin, and Hao Chen

arXiv:2510.23224 · cs.CV · October 28, 2025

TL;DR

PathSearch is a scalable multimodal retrieval framework for whole slide images. It combines fine-grained attentive mosaic representations with global slide embeddings aligned to diagnostic reports through vision-language contrastive learning, improving retrieval accuracy and inter-observer consistency in digital pathology.

Contribution

This work introduces PathSearch, a novel multimodal retrieval system that unifies detailed visual and semantic slide features for accurate, scalable, and versatile pathology image retrieval.

Findings

01. Outperforms traditional retrieval methods on multiple datasets
02. Enhances diagnostic accuracy and inter-observer agreement
03. Supports both image-to-image and text-to-slide retrieval
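Both retrieval modes listed above can be pictured as nearest-neighbor search in a shared embedding space. The sketch below is a minimal illustration, assuming slides and queries (either a slide or a report-style text) have already been encoded into vectors of the same dimension in an aligned space; the function `rank_slides` and the toy vectors are hypothetical and are not PathSearch's actual interface.

```python
import math
from typing import Dict, List, Sequence, Tuple


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def rank_slides(query: Sequence[float],
                database: Dict[str, Sequence[float]],
                top_k: int = 3) -> List[Tuple[str, float]]:
    """Return the top_k slide IDs ranked by cosine similarity to the query.

    Hypothetical helper: the query may be a slide embedding (image-to-image
    retrieval) or a text embedding (text-to-slide retrieval), assuming both
    were projected into the same aligned space.
    """
    q = l2_normalize(query)
    scored = []
    for slide_id, emb in database.items():
        e = l2_normalize(emb)
        scored.append((slide_id, sum(a * b for a, b in zip(q, e))))
    # Highest cosine similarity first.
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    # Hypothetical 3-dimensional embeddings, for illustration only.
    db = {
        "slide_A": [0.9, 0.1, 0.0],
        "slide_B": [0.0, 1.0, 0.0],
        "slide_C": [0.7, 0.7, 0.0],
    }
    text_query = [1.0, 0.0, 0.0]
    print(rank_slides(text_query, db, top_k=2))
```

With these toy vectors, slide_A ranks first and slide_C second, while slide_B (orthogonal to the query) is excluded. A production system over thousands of gigapixel slides would typically swap the linear scan for an approximate nearest-neighbor index.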

Abstract

The rapid digitization of histopathology slides has opened up new possibilities for computational tools in clinical and research workflows. Among these, content-based slide retrieval stands out, enabling pathologists to identify morphologically and semantically similar cases, thereby supporting precise diagnoses, enhancing consistency across observers, and assisting example-based education. However, effective retrieval of whole slide images (WSIs) remains challenging due to their gigapixel scale and the difficulty of capturing subtle semantic differences amid abundant irrelevant content. To overcome these challenges, we present PathSearch, a retrieval framework that unifies fine-grained attentive mosaic representations with global-wise slide embeddings aligned through vision-language contrastive learning. Trained on a corpus of 6,926 slide-report pairs, PathSearch captures both…
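The abstract describes aligning slide embeddings with their paired reports through vision-language contrastive learning. A common formulation of this idea is the symmetric InfoNCE loss popularized by CLIP. The sketch below assumes that formulation (the abstract does not confirm PathSearch uses exactly this loss) and operates on a small batch of already-encoded, paired slide and report vectors. The names `contrastive_loss` and `temperature`, and the default value 0.07, are illustrative choices.

```python
import math
from typing import List, Sequence


def _normalize(v: Sequence[float]) -> List[float]:
    """Unit-normalize so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def _log_softmax_at(row: Sequence[float], idx: int) -> float:
    """Numerically stable log(softmax(row)[idx])."""
    m = max(row)
    log_sum = m + math.log(sum(math.exp(x - m) for x in row))
    return row[idx] - log_sum


def contrastive_loss(slide_embs: Sequence[Sequence[float]],
                     report_embs: Sequence[Sequence[float]],
                     temperature: float = 0.07) -> float:
    """Symmetric InfoNCE loss over N paired (slide, report) embeddings.

    Assumed formulation: pair i is the positive for row/column i, and every
    other item in the batch acts as a negative.
    """
    n = len(slide_embs)
    if n == 0 or n != len(report_embs):
        raise ValueError("need the same, non-zero number of slides and reports")
    v = [_normalize(e) for e in slide_embs]
    t = [_normalize(e) for e in report_embs]
    # logits[i][j] = cosine(slide_i, report_j) / temperature
    logits = [[sum(a * b for a, b in zip(v[i], t[j])) / temperature
               for j in range(n)] for i in range(n)]
    # Slide-to-report direction: each row should peak on its diagonal entry.
    loss_v2t = -sum(_log_softmax_at(logits[i], i) for i in range(n)) / n
    # Report-to-slide direction: each column should peak on its diagonal entry.
    cols = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_t2v = -sum(_log_softmax_at(cols[j], j) for j in range(n)) / n
    return 0.5 * (loss_v2t + loss_t2v)


if __name__ == "__main__":
    slides = [[1.0, 0.0], [0.0, 1.0]]
    print(contrastive_loss(slides, [[1.0, 0.0], [0.0, 1.0]]))  # matched pairs
    print(contrastive_loss(slides, [[0.0, 1.0], [1.0, 0.0]]))  # swapped pairs
```

Matched pairs yield a loss near zero, while swapping the reports pushes it to roughly 1/temperature (about 14.3), which is the signal that pulls each slide toward its own report and away from the others in the batch.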