Multimodal Prototype Alignment for Semi-supervised Pathology Image Segmentation

Mingxi Fu; Fanglei Fu; Xitong Ling; Huaitian Yuan; Tian Guan; Yonghong He; Lianghui Zhu

arXiv:2508.19574 · cs.CV · August 28, 2025

TL;DR

This paper introduces MPAMatch, a semi-supervised pathology image segmentation framework that leverages multimodal prototype-guided contrastive learning to improve semantic boundary detection and structural understanding.

Contribution

The paper proposes a dual contrastive learning scheme that contrasts pixel labels with both image prototypes and text prototypes, and it integrates a pathology-pretrained backbone for stronger feature extraction.

Findings

01 MPAMatch outperforms state-of-the-art methods on multiple datasets.

02 The dual prototype-guided supervision improves semantic boundary modeling.

03 Replacing ViT with a pathology-pretrained model enhances feature relevance.

Abstract

Pathological image segmentation faces numerous challenges, particularly ambiguous semantic boundaries and the high cost of pixel-level annotations. Although recent semi-supervised methods based on consistency regularization (e.g., UniMatch) have made notable progress, they rely mainly on perturbation-based consistency within the image modality, which makes it difficult to capture high-level semantic priors, especially in structurally complex pathology images. To address these limitations, we propose MPAMatch, a novel segmentation framework that performs pixel-level contrastive learning under a multimodal prototype-guided supervision paradigm. The core innovation of MPAMatch lies in its dual contrastive learning scheme, which operates between image prototypes and pixel labels and between text prototypes and pixel labels, providing supervision at both structural and semantic levels. This…
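The dual prototype-guided idea described in the abstract can be illustrated with a minimal sketch. The abstract does not give the loss, so everything here is an assumption: an InfoNCE-style pixel-to-prototype loss over cosine similarities, image prototypes taken as per-class feature means, and text prototypes passed in as precomputed vectors. The function names, the `temperature` value, and the `weight_text` factor are hypothetical and not taken from the paper.

```python
import math


def normalize(v):
    """Return v scaled to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def class_mean_prototypes(feats, labels, num_classes):
    """Assumed image prototypes: the mean feature vector of each class."""
    dim = len(feats[0])
    sums = [[0.0] * dim for _ in range(num_classes)]
    counts = [0] * num_classes
    for f, y in zip(feats, labels):
        counts[y] += 1
        for d in range(dim):
            sums[y][d] += f[d]
    # Classes with no pixels keep an all-zero prototype.
    return [
        [s / counts[c] for s in sums[c]] if counts[c] else sums[c]
        for c in range(num_classes)
    ]


def prototype_contrastive_loss(feats, labels, prototypes, temperature=0.1):
    """Mean cross-entropy of each pixel against all class prototypes.

    Logits are cosine similarities divided by the temperature; the
    prototype of the pixel's own class is the positive, all others
    are negatives.
    """
    protos = [normalize(p) for p in prototypes]
    total = 0.0
    for f, y in zip(feats, labels):
        f = normalize(f)
        logits = [sum(a * b for a, b in zip(f, p)) / temperature for p in protos]
        m = max(logits)  # subtract the max for a stable log-sum-exp
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_z - logits[y]
    return total / len(feats)


def dual_prototype_loss(feats, labels, image_protos, text_protos,
                        weight_text=1.0, temperature=0.1):
    """Sum of the image-prototype and text-prototype contrastive terms."""
    image_term = prototype_contrastive_loss(feats, labels, image_protos, temperature)
    text_term = prototype_contrastive_loss(feats, labels, text_protos, temperature)
    return image_term + weight_text * text_term


# Toy usage: four "pixels" with 2-D features and two classes.
feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels = [0, 0, 1, 1]
image_protos = class_mean_prototypes(feats, labels, num_classes=2)
text_protos = [[1.0, 0.2], [0.2, 1.0]]  # stand-ins for text-encoder outputs
loss = dual_prototype_loss(feats, labels, image_protos, text_protos)
```

In this sketch the same pixel features are pulled toward two sets of class anchors. Under the assumptions above, pixels whose labels agree with the nearest prototype yield a small loss, while mislabeled or boundary pixels yield a larger one.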
