CA-GCL: Cross-Anatomy Global-Local Contrastive Learning for Robust 3D Medical Image Understanding
Hanwen Zhang, Yao Liu, Die Dai, Jiaye Yang, Qiao Liu, Yutong Xie, Peng Wang

TL;DR
CA-GCL introduces a global-local contrastive learning framework that enhances 3D medical image understanding by improving anatomical representation separation and robustness against descriptive variations.
Contribution
The paper proposes a contrastive learning method that combines global and local objectives with clinical-aware text augmentation, aiming to counter representation collapse in the textual embedding space and to improve robustness to prompt variation in 3D medical imaging.
Findings
Outperforms existing VLP methods in zero-shot abnormality detection.
Achieves better cross-dataset generalization.
Reduces performance variance across prompt templates.
Abstract
Fine-grained Vision-Language Pre-training (FVLP) demonstrates significant potential in 3D medical image understanding by aligning anatomy-level visual representations with corresponding textual descriptions. However, existing FVLP paradigms often suffer from severe representation collapse in the textual embedding space, where text embeddings of distinct anatomical structures become highly clustered and indistinguishable. This distributional degeneracy renders the model hypersensitive to prompt variations, hindering reliable clinical deployment. To address these challenges, we propose a novel Cross-Anatomy Global-Local Contrastive Learning framework (CA-GCL). CA-GCL introduces a global contrastive objective that enforces separation between anatomical categories in the latent space, effectively counteracting the aggregation tendency induced by local alignment. Furthermore, we incorporate…
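To make the collapse described in the abstract concrete, the following is a minimal sketch and not the authors' implementation. It assumes collapse can be quantified as the mean cosine similarity between text embeddings of different anatomies, and it uses a generic InfoNCE loss as a stand-in for a cross-anatomy contrastive objective. The function names, the temperature of 0.07, and the toy embeddings are all hypothetical; the paper's actual loss is not given in this excerpt.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length (eps guards against zero rows)."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norms, eps)


def mean_offdiag_cosine(emb):
    """Average cosine similarity between distinct rows.

    Values near 1.0 mean the rows are almost indistinguishable,
    which is the collapse symptom described in the abstract.
    """
    z = l2_normalize(emb)
    sim = z @ z.T
    n = sim.shape[0]
    return (sim.sum() - np.trace(sim)) / (n * (n - 1))


def info_nce(queries, keys, temperature=0.07):
    """Generic InfoNCE: row i of `keys` is the positive for row i of
    `queries`; every other row acts as a negative (here, another anatomy).
    This is an assumed stand-in, not the loss from the paper.
    """
    q = l2_normalize(queries)
    k = l2_normalize(keys)
    logits = (q @ k.T) / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))


# Toy data (hypothetical): four anatomies, 8-dimensional embeddings.
rng = np.random.default_rng(0)
collapsed = np.ones((4, 8)) + 0.01 * rng.standard_normal((4, 8))
separated = np.eye(4, 8)  # mutually orthogonal rows

collapse_score = mean_offdiag_cosine(collapsed)
separation_score = mean_offdiag_cosine(separated)
loss_collapsed = info_nce(collapsed, collapsed)
loss_separated = info_nce(separated, separated)
```

In this toy setting the collapsed embeddings score close to 1.0 on the similarity measure and incur a loss near log(4), while the orthogonal embeddings score 0 and incur a loss near zero. This illustrates why an objective that treats other anatomies as negatives pushes categories apart.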
