Semi-MedRef: Semi-Supervised Medical Referring Image Segmentation with Cross-Modal Alignment

Yuchen Li, Zhen Zhao, Yi Liu, and Luping Zhou

arXiv:2605.15720 · cs.CV · May 18, 2026

TL;DR

Semi-MedRef introduces a semi-supervised framework for medical referring image segmentation that preserves cross-modal (image-text) alignment through new augmentation and contrastive learning techniques, improving performance when labeled data are limited.
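For readers unfamiliar with image-text contrastive learning, here is a minimal sketch of a generic CLIP-style symmetric InfoNCE loss. It is not the paper's ITCL, whose exact formulation is not given on this page. The function names, the plain-list representation of embeddings, and the temperature of 0.07 are illustrative assumptions.

```python
import math
from typing import List

Vector = List[float]

# Generic CLIP-style contrastive loss (an assumption), NOT the paper's ITCL.


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit length, guarding against the zero vector."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def cross_entropy_row(logits: Vector, target: int) -> float:
    """Cross-entropy of one row of logits against a target index (log-sum-exp form)."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def image_text_infonce(img: List[Vector], txt: List[Vector], temperature: float = 0.07) -> float:
    """Symmetric InfoNCE: image i and text i form the positive pair;
    every other image-text pairing in the batch acts as a negative."""
    img = [l2_normalize(v) for v in img]
    txt = [l2_normalize(v) for v in txt]
    n = len(img)
    # sim[i][j] = cosine similarity of image i and text j, divided by the temperature
    sim = [[sum(a * b for a, b in zip(img[i], txt[j])) / temperature for j in range(n)]
           for i in range(n)]
    img_to_txt = sum(cross_entropy_row(sim[i], i) for i in range(n)) / n
    txt_to_img = sum(cross_entropy_row([sim[i][j] for i in range(n)], j) for j in range(n)) / n
    return 0.5 * (img_to_txt + txt_to_img)
```

With two orthogonal image embeddings, the loss is near zero when each text is paired with its own image and roughly 14 when the two texts are swapped, which is the pressure that pulls matching image and text embeddings together.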

Contribution

It proposes a teacher-student semi-supervised learning (SSL) method with three alignment-preserving components, T-PatchMix, PosAug, and ITCL, that keep medical images and their positional text descriptions coherent during training.
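The teacher-student pattern can be illustrated with the common mean-teacher recipe: the teacher's weights are an exponential moving average (EMA) of the student's, and the teacher's confident predictions on unlabeled images become pseudo-labels for the student. This sketches that generic recipe, not Semi-MedRef's training code. `ema_update`, `pseudo_label`, the momentum of 0.99, and the 0.9 confidence threshold are hypothetical choices, and parameters are modeled as plain floats for brevity.

```python
from typing import Dict, List, Optional

Params = Dict[str, float]

# Hypothetical mean-teacher helpers; not taken from the paper.


def ema_update(teacher: Params, student: Params, momentum: float = 0.99) -> Params:
    """Mean-teacher update: teacher <- m * teacher + (1 - m) * student.
    The teacher receives no gradients; it only tracks the student."""
    return {name: momentum * teacher[name] + (1.0 - momentum) * student[name]
            for name in teacher}


def pseudo_label(fg_prob: List[List[float]], threshold: float = 0.9) -> List[List[Optional[int]]]:
    """Turn teacher foreground probabilities into a binary pseudo-mask.
    Pixels whose confidence max(p, 1 - p) is below the threshold become None,
    meaning they are ignored when the student's loss is computed."""
    return [[(1 if p >= 0.5 else 0) if max(p, 1.0 - p) >= threshold else None
             for p in row]
            for row in fg_prob]
```

A high momentum keeps the teacher stable, so its pseudo-labels change slowly even when individual student updates are noisy.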

Findings

01

Outperforms fully supervised and semi-supervised baselines on the QaTa-COV19 and MosMedData+ datasets.

02

Effectively maintains image-text alignment under strong augmentations.

03

Achieves consistent improvements across various label regimes.

Abstract

Medical referring image segmentation (MRIS) requires pixel-level masks aligned with textual descriptions of anatomical locations, which makes annotation costly and motivates learning in low-label regimes. Semi-supervised learning (SSL) can mitigate this burden by leveraging unlabeled data, but its success hinges on maintaining reliable image-text alignment under perturbations. Most existing SSL-based referring segmentation methods apply either independent per-modality perturbations or simplistic multi-modal ones (e.g., left-right flips) and do not fully address cross-modal alignment under strong augmentation. Meanwhile, CutMix, which is highly effective in single-modal SSL, remains underexplored in multi-modal settings because it tends to disrupt image-text coherence. We propose Semi-MedRef, a teacher-student SSL framework designed to explicitly maintain consistency between medical images and positional language through three…
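To see why positional text complicates augmentation, consider the simple joint perturbation the abstract mentions: a left-right flip. If the image and mask are mirrored, every "left" or "right" in the description must be swapped as well, or the text no longer refers to the masked region. The sketch below is a hypothetical illustration of that baseline idea, not PosAug or T-PatchMix; `swap_sides` and `flip_pair` are made-up names, and images are toy 2D lists.

```python
import re
from typing import List, Tuple

Grid = List[List[int]]

# Hypothetical joint flip for image, mask, and text; not the paper's method.
_SWAP = {"left": "right", "right": "left"}


def swap_sides(text: str) -> str:
    """Swap whole words 'left' and 'right', keeping the case of the first letter."""
    def repl(match: "re.Match[str]") -> str:
        word = match.group(0)
        swapped = _SWAP[word.lower()]
        return swapped.capitalize() if word[0].isupper() else swapped
    # \b prevents matches inside longer words such as "leftover"
    return re.sub(r"\b(left|right)\b", repl, text, flags=re.IGNORECASE)


def flip_pair(image: Grid, mask: Grid, text: str) -> Tuple[Grid, Grid, str]:
    """Mirror image and mask horizontally and rewrite the text so that it
    still describes the same (now mirrored) region."""
    return ([row[::-1] for row in image], [row[::-1] for row in mask], swap_sides(text))


img, msk, txt = flip_pair([[1, 2, 3]], [[0, 0, 1]], "Lesion in the left lower lobe")
print(txt)  # Lesion in the right lower lobe
```

A region-mixing augmentation such as CutMix has no equally simple textual counterpart: the pasted patch may carry a lesion the sentence does not mention, which is the coherence problem the abstract points to.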
