Semi-MedRef: Semi-Supervised Medical Referring Image Segmentation with Cross-Modal Alignment
Yuchen Li, Zhen Zhao, Yi Liu, and Luping Zhou

TL;DR
Semi-MedRef is a semi-supervised framework for medical referring image segmentation that preserves image-text alignment under strong augmentation, using alignment-aware augmentation and contrastive learning to improve performance when labeled data is limited.
Contribution
The paper proposes a teacher-student SSL method with alignment-preserving components, including T-PatchMix, PosAug, and ITCL, to improve image-text coherence in medical referring segmentation.
Findings
Outperforms both fully supervised and semi-supervised baselines on the QaTa-COV19 and MosMedData+ datasets.
Effectively maintains image-text alignment under strong augmentations.
Achieves consistent improvements across various label regimes.
Abstract
Medical referring image segmentation (MRIS) requires pixel-level masks aligned with textual descriptions of anatomical locations, making annotation costly in low-label regimes. Semi-supervised learning (SSL) can mitigate this burden by leveraging unlabeled data, but its success hinges on maintaining reliable image-text alignment under perturbations. Most existing SSL-based referred segmentation methods use either independent or simplistic multi-modal perturbations (e.g., left-right flips), without fully addressing cross-modal alignment under strong augmentation, while CutMix, highly effective in single-modal SSL, remains underexplored in multi-modal settings due to its tendency to disrupt image-text coherence. We propose Semi-MedRef, a teacher-student SSL framework designed to explicitly maintain consistency between medical images and positional language through three…
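The abstract cites left-right flips as an example of a simple multi-modal perturbation. The sketch below illustrates, under stated assumptions, what keeping image and text aligned under such a flip could look like: the image and mask are mirrored, and the words "left" and "right" in the description are swapped. It is not the paper's method (T-PatchMix, PosAug, and ITCL are not reproduced here), and the function names `flip_image_lr`, `flip_text_lr`, and `aligned_flip` are hypothetical.

```python
import re

# Hypothetical helpers for illustration only; not taken from the paper.
_SWAP = {"left": "right", "right": "left"}


def flip_image_lr(image):
    """Mirror a 2D array (given as a list of rows) horizontally."""
    return [list(reversed(row)) for row in image]


def flip_text_lr(text):
    """Swap the words 'left' and 'right' so the text matches a mirrored image."""
    def repl(match):
        word = match.group(0)
        swapped = _SWAP[word.lower()]
        # Preserve a leading capital letter if the original word had one.
        return swapped.capitalize() if word[0].isupper() else swapped

    return re.sub(r"\b(left|right)\b", repl, text, flags=re.IGNORECASE)


def aligned_flip(image, mask, text):
    """Apply the same horizontal flip to image and mask, and update the text."""
    return flip_image_lr(image), flip_image_lr(mask), flip_text_lr(text)


# Toy example: a 2x2 image whose bright pixels sit in the left column.
image = [[1, 0], [1, 0]]
mask = [[1, 0], [1, 0]]
text = "Infected area in the upper left lung."
flipped_image, flipped_mask, flipped_text = aligned_flip(image, mask, text)
```

Flipping only the image would leave the description pointing at the wrong side, which is the kind of cross-modal inconsistency the abstract warns about for stronger augmentations such as CutMix.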
