PathFLIP: Fine-grained Language-Image Pretraining for Versatile Computational Pathology

Fengchun Liu; Songhan Jiang; Linghan Cai; Ziyue Wang; Yongbing Zhang

arXiv:2512.17621 · cs.CV · December 22, 2025



TL;DR

PathFLIP introduces a fine-grained, region-level language-image pretraining framework for whole slide images (WSIs) in pathology. It enables precise interpretation and localization and performs well across diverse clinical tasks while requiring less training data.

Contribution

It presents a novel approach that decomposes slide-level captions into region-level subcaptions and generates text-conditioned region embeddings, improving fine-grained visual-language grounding and adaptability in computational pathology.
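A minimal sketch of what a text-conditioned region embedding could look like, assuming single-head scaled dot-product attention in which one subcaption embedding queries every patch embedding of a slide. The function name `text_conditioned_region_embedding`, the `temperature` parameter, and the toy 2-D vectors are hypothetical; this summary does not describe PathFLIP's actual architecture.

```python
import math
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def text_conditioned_region_embedding(
    subcaption: Vector, patches: List[Vector], temperature: float = 1.0
) -> Tuple[Vector, List[float]]:
    """Hypothetical sketch, not PathFLIP's published architecture.

    The subcaption embedding acts as a query over all patch embeddings;
    patches are pooled with softmax weights from scaled dot-product scores.
    Returns the pooled region embedding and the per-patch weights.
    """
    scale = temperature * math.sqrt(len(subcaption))
    weights = softmax([dot(subcaption, p) / scale for p in patches])
    dim = len(patches[0])
    region = [sum(w * p[d] for w, p in zip(weights, patches)) for d in range(dim)]
    return region, weights


# Toy example: three 2-D patch embeddings and one subcaption embedding.
patches = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
subcaption = [5.0, 0.0]
region, weights = text_conditioned_region_embedding(subcaption, patches)
```

Because the weights form a distribution over patches, they could in principle be read as a coarse localization map for the subcaption; whether PathFLIP localizes regions this way is not stated here.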

Findings

1. Outperforms existing models on four benchmarks.
2. Requires less training data than prior methods.
3. Performs well across diverse tasks such as classification, retrieval, and localization.

Abstract

While Vision-Language Models (VLMs) have achieved notable progress in computational pathology (CPath), the gigapixel scale and spatial heterogeneity of Whole Slide Images (WSIs) continue to pose challenges for multimodal understanding. Existing alignment methods struggle to capture fine-grained correspondences between textual descriptions and visual cues across thousands of patches from a slide, compromising their performance on downstream tasks. In this paper, we propose PathFLIP (Pathology Fine-grained Language-Image Pretraining), a novel framework for holistic WSI interpretation. PathFLIP decomposes slide-level captions into region-level subcaptions and generates text-conditioned region embeddings to facilitate precise visual-language grounding. By harnessing Large Language Models (LLMs), PathFLIP can seamlessly follow diverse clinical instructions and adapt to varied diagnostic…
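The abstract does not state PathFLIP's training objective. As a hedged illustration of how matched subcaption and region embeddings could be aligned, the sketch below applies a standard CLIP-style symmetric InfoNCE loss to a batch in which row i of each list forms a positive pair. The function `symmetric_infonce`, the L2 normalization, and the temperature of 0.07 are assumptions borrowed from CLIP-style training, not details taken from the paper.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def cross_entropy_diag(logits: List[List[float]]) -> float:
    """Mean cross-entropy where the correct column for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def symmetric_infonce(
    texts: List[Vector], regions: List[Vector], temperature: float = 0.07
) -> float:
    """Hypothetical CLIP-style loss; the paper's actual objective is not given here."""
    t = [l2_normalize(v) for v in texts]
    r = [l2_normalize(v) for v in regions]
    # Cosine similarities scaled by temperature: rows are texts, columns are regions.
    logits = [[sum(a * b for a, b in zip(ti, rj)) / temperature for rj in r] for ti in t]
    logits_transposed = [list(col) for col in zip(*logits)]
    # Average the text-to-region and region-to-text directions.
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(logits_transposed))


texts = [[1.0, 0.0], [0.0, 1.0]]
aligned = symmetric_infonce(texts, [[0.9, 0.1], [0.1, 0.9]])
shuffled = symmetric_infonce(texts, [[0.1, 0.9], [0.9, 0.1]])
```

With correctly paired inputs the loss is near zero, and swapping the region order drives it up sharply, which is the signal a fine-grained alignment objective of this kind would exploit.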


Videos

PathFLIP: Fine-grained Language-Image Pretraining for Versatile Computational Pathology (Underline)

Taxonomy

Topics: Multimodal Machine Learning Applications · AI in cancer detection · Domain Adaptation and Few-Shot Learning