Accurate and Scalable Multimodal Pathology Retrieval via Attentive Vision-Language Alignment
Hongyi Wang, Zhengjie Zhu, Jiabo Ma, Fang Wang, Yue Shi, Bo Luo, Jili Wang, Qiuyu Cai, Xiuming Zhang, Yen-Wei Chen, Lanfen Lin, and Hao Chen

TL;DR
PathSearch is a scalable multimodal retrieval framework that combines fine-grained visual and semantic slide representations using vision-language contrastive learning, significantly improving accuracy and consistency in digital pathology retrieval tasks.
Contribution
This work introduces PathSearch, a novel multimodal retrieval system that unifies detailed visual and semantic slide features for accurate, scalable, and versatile pathology image retrieval.
Findings
Outperforms traditional retrieval methods on multiple datasets
Enhances diagnostic accuracy and inter-observer agreement
Supports both image-to-image and text-to-slide retrieval
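Both retrieval modes listed above follow from a shared embedding space: once slides, query images, and query text map to vectors of the same dimension, retrieval becomes a nearest-neighbour ranking. The sketch below is a minimal illustration of that idea using cosine similarity, not PathSearch's actual implementation. The function names, slide IDs, and three-dimensional toy vectors are all hypothetical; in a real system the vectors would come from trained image and text encoders.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve(query, index, k=2):
    """Return the k slide IDs whose embeddings are most similar to the query."""
    ranked = sorted(index, key=lambda sid: cosine(query, index[sid]), reverse=True)
    return ranked[:k]


# Hypothetical slide embeddings (toy values, not real model outputs).
slide_index = {
    "slide_a": [0.9, 0.1, 0.0],
    "slide_b": [0.1, 0.9, 0.0],
    "slide_c": [0.8, 0.2, 0.1],
}

# An image query and a text query, assumed to be embedded by separate
# encoders that were aligned into the same space during training.
image_query = [1.0, 0.0, 0.0]
text_query = [0.0, 1.0, 0.1]

print(retrieve(image_query, slide_index))  # ['slide_a', 'slide_c']
print(retrieve(text_query, slide_index))   # ['slide_b', 'slide_c']
```

The same `retrieve` function serves both query types because the ranking depends only on the vectors, not on the modality that produced them. A brute-force scan like this is linear in the number of slides; a large archive would normally use an approximate nearest-neighbour index instead.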
Abstract
The rapid digitization of histopathology slides has opened up new possibilities for computational tools in clinical and research workflows. Among these, content-based slide retrieval stands out, enabling pathologists to identify morphologically and semantically similar cases, thereby supporting precise diagnoses, enhancing consistency across observers, and assisting example-based education. However, effective retrieval of whole slide images (WSIs) remains challenging due to their gigapixel scale and the difficulty of capturing subtle semantic differences amid abundant irrelevant content. To overcome these challenges, we present PathSearch, a retrieval framework that unifies fine-grained attentive mosaic representations with global-wise slide embeddings aligned through vision-language contrastive learning. Trained on a corpus of 6,926 slide-report pairs, PathSearch captures both…
