TL;DR
TCSA-UDA introduces a text-driven framework for unsupervised domain adaptation in medical image segmentation, aligning visual features with semantic class descriptions to improve cross-modality performance.
Contribution
It proposes a novel vision-language covariance cosine loss and a prototype alignment module to enhance semantic and modality-invariant feature learning.
Findings
Reduces the impact of domain shift across imaging modalities (such as CT and MRI) in medical image segmentation.
Outperforms state-of-the-art UDA methods on multiple benchmarks.
Effectively leverages textual descriptions for cross-modal alignment.
Abstract
Unsupervised domain adaptation (UDA) for medical image segmentation remains a significant challenge due to substantial domain shifts across imaging modalities, such as CT and MRI. While recent vision-language representation learning methods have shown promise, their potential in UDA segmentation tasks remains underexplored. To address this gap, we propose TCSA-UDA, a Text-driven Cross-Semantic Alignment framework that leverages domain-invariant textual class descriptions to guide visual representation learning. Our approach introduces a vision-language covariance cosine loss to directly align image encoder features with inter-class textual semantic relations, encouraging semantically meaningful and modality-invariant feature representations. Additionally, we incorporate a prototype alignment module that aligns class-wise pixel-level feature distributions across domains using high-level…
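The abstract does not give the formula for the vision-language covariance cosine loss, so the following is only a rough sketch of one way such a loss could be set up, not the authors' implementation. It assumes that each class has a visual prototype (for example, the mean encoder feature over that class's pixels) and a text embedding of its class description, that "inter-class relations" can be represented as a class-by-class covariance-style matrix, and that the loss is one minus the cosine similarity between the two flattened matrices. The function names `class_relation_matrix` and `covariance_cosine_loss` are hypothetical.

```python
import numpy as np

def class_relation_matrix(embeddings):
    """Return a C x C inter-class relation matrix from C x D embeddings.

    Assumed form: center each feature dimension across classes, then take
    the scaled Gram matrix of the centered class vectors.
    """
    centered = embeddings - embeddings.mean(axis=0, keepdims=True)
    return centered @ centered.T / embeddings.shape[1]

def covariance_cosine_loss(visual_prototypes, text_embeddings, eps=1e-8):
    """One minus the cosine similarity between the flattened relation matrices.

    visual_prototypes: C x D_v array, one visual feature vector per class.
    text_embeddings:   C x D_t array, one text embedding per class.
    D_v and D_t may differ because only the C x C matrices are compared.
    """
    s_v = class_relation_matrix(visual_prototypes).ravel()
    s_t = class_relation_matrix(text_embeddings).ravel()
    cos = s_v @ s_t / (np.linalg.norm(s_v) * np.linalg.norm(s_t) + eps)
    return 1.0 - cos

# Illustration with random stand-ins for 5 classes (not real encoder outputs).
rng = np.random.default_rng(0)
visual = rng.normal(size=(5, 16))
text = rng.normal(size=(5, 8))
loss = covariance_cosine_loss(visual, text)
```

Under these assumptions the loss approaches zero when the visual prototypes reproduce the same inter-class structure as the text embeddings, regardless of overall scale, which is one plausible reading of aligning image features with "inter-class textual semantic relations". In a real pipeline the text embeddings would come from a text encoder applied to the class descriptions, and the loss would be computed on differentiable tensors rather than NumPy arrays.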
