Towards Scalable Language-Image Pre-training for 3D Medical Imaging
Chenhui Zhao, Yiwei Lyu, Asadur Chowdury, Edward Harake, Akhil Kondepudi, Akshay Rao, Xinhai Hou, Honglak Lee, Todd Hollon

TL;DR
This paper introduces HLIP, a hierarchical attention-based framework for scalable language-image pre-training directly on uncurated 3D medical imaging studies, achieving state-of-the-art results and demonstrating strong generalizability.
Contribution
The paper presents a novel hierarchical attention mechanism tailored for uncurated 3D medical data, enabling scalable pre-training without manual curation.
Findings
Improves balanced accuracy by 10.5% on the Pub-Brain-5 benchmark.
Improves macro AUC by 8.3% on the CQ500 benchmark and by 1.7% on the RSNA benchmark, both for head CT.
When pre-trained on CT-RATE, improves macro AUC by 4.3% on Rad-ChestCT, indicating that the approach generalizes.
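For readers unfamiliar with the two metrics reported above, the following is a minimal sketch of their standard definitions: balanced accuracy as the mean of per-class recall, and macro AUC as the unweighted mean of per-label AUC. The function names and the toy inputs are illustrative assumptions; this is not the paper's evaluation code.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so rare classes count as much as common ones."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    return sum(correct[c] / total[c] for c in total) / len(total)


def binary_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. Quadratic in the number of samples,
    which is fine for a sketch but not for large test sets.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def macro_auc(label_rows, score_rows):
    """Unweighted mean of per-label AUC; rows are studies, columns are labels."""
    n_labels = len(label_rows[0])
    per_label = [
        binary_auc([r[j] for r in label_rows], [r[j] for r in score_rows])
        for j in range(n_labels)
    ]
    return sum(per_label) / n_labels


# Toy data (hypothetical): a classifier that always predicts the majority class.
print(balanced_accuracy([0, 0, 0, 1], [0, 0, 0, 0]))
```

Balanced accuracy penalizes the always-majority classifier in the toy call, where plain accuracy would reward it, which is why it suits imbalanced diagnostic labels.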
Abstract
The scalability of current language-image pre-training for 3D medical imaging, such as CT and MRI, is constrained by the need for radiologists to manually curate raw clinical studies. In this work, we pioneer pre-training directly on uncurated studies, which both aligns more closely with the radiologist's workflow and provides a natural path to scalability. However, the unique structure of such data presents new challenges for existing model architectures, which were originally designed for 2D slices or single 3D scans. To address this, we introduce a novel hierarchical attention mechanism inspired by the intrinsic hierarchy of radiology data: slice, scan, and study. We denote our framework as Hierarchical attention for Language-Image Pre-training (HLIP). Trained on 220K studies with 3.13 million scans for brain MRI and 240K studies with 1.44 million scans for head CT, HLIP achieves…
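The slice, scan, and study hierarchy described in the abstract can be illustrated with attention masks. The abstract does not spell out the mechanism, so the sketch below rests on an assumption: each token carries a (scan, slice) tag, and attention at a given level is restricted to tokens that share the same slice, the same scan, or the same study. The names `build_tokens` and `attention_mask` are hypothetical, and this should not be read as the HLIP implementation.

```python
def build_tokens(slices_per_scan, tokens_per_slice):
    """Return one (scan_id, slice_id) tag per token for a single study.

    slices_per_scan lists the slice count of each scan, so scans of
    different lengths (as in uncurated studies) are supported.
    """
    tags = []
    for scan_id, n_slices in enumerate(slices_per_scan):
        for slice_id in range(n_slices):
            tags.extend([(scan_id, slice_id)] * tokens_per_slice)
    return tags


def attention_mask(tags, level):
    """Boolean matrix: mask[i][j] is True if token i may attend to token j."""
    if level == "slice":
        allowed = lambda a, b: a == b          # same scan and same slice
    elif level == "scan":
        allowed = lambda a, b: a[0] == b[0]    # same scan, any slice
    elif level == "study":
        allowed = lambda a, b: True            # every token in the study
    else:
        raise ValueError(f"unknown level: {level!r}")
    return [[allowed(a, b) for b in tags] for a in tags]


# A toy study (assumed sizes): two scans with 2 and 3 slices, 4 tokens per slice.
tags = build_tokens([2, 3], tokens_per_slice=4)
for level in ("slice", "scan", "study"):
    mask = attention_mask(tags, level)
    # Number of tokens the first token can attend to at this level.
    print(level, sum(mask[0]))
```

Under this assumption, the narrower levels attend over far fewer token pairs than full study-level attention, which is one plausible reason a hierarchy helps when a study contains many scans.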
Taxonomy
Topics: Multimodal Machine Learning Applications · Artificial Intelligence in Healthcare and Education · Advanced Neural Network Applications
