Improving Medical Visual Representation Learning with Pathological-level Cross-Modal Alignment and Correlation Exploration
Jun Wang, Lixing Zhu, Xiaohan Yu, Abhir Bhalerao, and Yulan He

TL;DR
This paper introduces PLACE, a novel framework for medical visual representation learning that emphasizes pathological-level cross-modal alignment and correlation exploration, improving performance across multiple medical imaging tasks without extra annotations.
Contribution
The paper proposes a new pathological-level cross-modal alignment method and a correlation exploration proxy task, enhancing medical visual representations without relying on external disease annotations.
Findings
Achieves state-of-the-art results on classification tasks
Improves image-to-text retrieval performance
Enhances semantic segmentation and object detection accuracy
Abstract
Learning medical visual representations from image-report pairs through joint learning has garnered increasing research attention due to its potential to alleviate the data scarcity problem in the medical domain. The primary challenges stem from the lengthy reports that feature complex discourse relations and semantic pathologies. Previous works have predominantly focused on instance-wise or token-wise cross-modal alignment, often neglecting the importance of pathological-level consistency. This paper presents a novel framework PLACE that promotes the Pathological-Level Alignment and enriches the fine-grained details via Correlation Exploration without additional human annotations. Specifically, we propose a novel pathological-level cross-modal alignment (PCMA) approach to maximize the consistency of pathology observations from both images and reports. To facilitate this, a Visual…
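The abstract excerpt above is cut off before it describes how PCMA is computed, so the following is only a rough illustration of the general idea of cross-modal alignment, not the paper's method. It is a minimal sketch of a generic symmetric contrastive (InfoNCE-style) loss, assuming we already have one embedding per pathology observation from the image side and one from the report side, paired by index. The names `cosine`, `log_softmax`, `alignment_loss`, and the `temperature` value are all hypothetical choices for this example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def log_softmax(row):
    """Numerically stable log-softmax over a list of scores."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]


def alignment_loss(image_feats, report_feats, temperature=0.1):
    """Symmetric contrastive loss over paired embeddings.

    Assumption: image_feats[k] and report_feats[k] describe the same
    observation; every other pairing is treated as a negative.
    """
    n = len(image_feats)
    # Temperature-scaled similarity matrix: rows are images, columns are reports.
    sim = [[cosine(i, r) / temperature for r in report_feats] for i in image_feats]
    sim_t = [list(col) for col in zip(*sim)]
    # Image-to-report and report-to-image cross-entropy on the diagonal.
    i2r = -sum(log_softmax(sim[k])[k] for k in range(n)) / n
    r2i = -sum(log_softmax(sim_t[k])[k] for k in range(n)) / n
    return 0.5 * (i2r + r2i)


# Toy check: correctly paired embeddings should score a lower loss
# than the same embeddings with the pairing swapped.
aligned = alignment_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = alignment_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

In this toy setup the loss is close to zero when each image embedding matches its own report embedding and grows large when the pairing is swapped, which is the behaviour an alignment objective of this kind is meant to reward.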
Taxonomy
Methods: Softmax · Attention Is All You Need
