Advancing Medical Radiograph Representation Learning: A Hybrid Pre-training Paradigm with Multilevel Semantic Granularity
Hanqi Jiang, Xixuan Hao, Yuzhou Huang, Chong Ma, Jiaxun Zhang, Yi Pan, and Ruimao Zhang

TL;DR
This paper presents a hybrid pre-training framework for radiograph representation learning that leverages multilevel semantic granularity and hierarchical text-image alignment to improve medical image understanding.
Contribution
The proposed HybridMED framework introduces hierarchical alignment and dual proxy tasks, enhancing radiograph representation learning with minimal additional parameters.
Findings
Improved performance on the MIMIC-CXR dataset.
Effective knowledge distillation between branches.
Hierarchical alignment enhances semantic understanding.
Abstract
This paper introduces an innovative approach to Medical Vision-Language Pre-training (Med-VLP) in the specialized context of radiograph representation learning. While conventional methods frequently merge textual annotations into unified reports, we acknowledge the intrinsic hierarchical relationship between the findings and impression sections in radiograph datasets. To establish a targeted correspondence between images and texts, we propose a novel HybridMED framework to align global-level visual representations with impression and token-level visual representations with findings. Moreover, our framework incorporates a generation decoder that employs two proxy tasks, responsible for generating the impression from (1) images, via a captioning branch, and (2) findings, through a summarization branch. Additionally, knowledge distillation is leveraged to facilitate the training…
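The two alignment levels described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the summary above does not state the exact losses, so the code assumes a symmetric InfoNCE-style contrastive loss for the global (image to impression) level and a max-over-patches cosine score for the token (patch to findings) level. All function names, the temperature value, and the toy vectors are hypothetical and chosen only for illustration.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    """Cosine similarity; assumes both vectors are non-zero."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))


def log_softmax_at(logits, index):
    """Numerically stable log-softmax evaluated at one position."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return logits[index] - log_z


def global_alignment_loss(image_globals, impression_embs, temperature=0.07):
    """Assumed symmetric contrastive loss between global image embeddings
    and impression embeddings; pair k in one list matches pair k in the other."""
    n = len(image_globals)
    sims = [[cosine(img, txt) / temperature for txt in impression_embs]
            for img in image_globals]
    # Image-to-text direction: each row should peak on its diagonal entry.
    loss_i2t = -sum(log_softmax_at(sims[k], k) for k in range(n)) / n
    # Text-to-image direction: each column should peak on its diagonal entry.
    loss_t2i = -sum(
        log_softmax_at([sims[r][k] for r in range(n)], k) for k in range(n)
    ) / n
    return 0.5 * (loss_i2t + loss_t2i)


def token_alignment_score(patch_tokens, finding_tokens):
    """Assumed token-level score: each findings token is matched to its
    most similar image patch, and the best similarities are averaged."""
    best = [max(cosine(w, p) for p in patch_tokens) for w in finding_tokens]
    return sum(best) / len(best)


# Toy batch of two studies with 2-dimensional embeddings.
images = [[1.0, 0.0], [0.0, 1.0]]
matched_impressions = [[1.0, 0.0], [0.0, 1.0]]
shuffled_impressions = [[0.0, 1.0], [1.0, 0.0]]

matched_loss = global_alignment_loss(images, matched_impressions)
shuffled_loss = global_alignment_loss(images, shuffled_impressions)
token_score = token_alignment_score([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0]])
```

In this toy setup, correctly paired images and impressions yield a much lower global loss than shuffled pairs, which is the behavior a contrastive alignment objective is meant to reward. In a real system the embeddings would come from trained image and text encoders rather than hand-written vectors.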
Taxonomy
Topics: Radiology practices and education · Radiomics and Machine Learning in Medical Imaging · Topic Modeling
Methods: Knowledge Distillation · ALIGN
