Unlocking the Power of Spatial and Temporal Information in Medical Multimodal Pre-training
Jinxia Yang, Bing Su, Wayne Xin Zhao, Ji-Rong Wen

TL;DR
Med-ST introduces a novel framework that leverages spatial multi-view and temporal information in medical imaging to improve fine-grained alignment and temporal understanding, enhancing performance across multiple tasks.
Contribution
The paper presents Med-ST, a pre-training framework that exploits multi-view spatial and temporal data in medical images through fine-grained alignment and cycle consistency methods.
Findings
Improved performance on temporal classification tasks
Effective integration of multi-view spatial features
Enhanced alignment between images and reports
Abstract
Medical vision-language pre-training methods mainly leverage the correspondence between paired medical images and radiological reports. Although multi-view spatial images and temporal sequences of image-report pairs are available in off-the-shelf multi-modal medical datasets, most existing methods have not thoroughly tapped into such extensive supervision signals. In this paper, we introduce the Med-ST framework for fine-grained spatial and temporal modeling to exploit information from multiple spatial views of chest radiographs and temporal historical records. For spatial modeling, Med-ST employs the Mixture of View Expert (MoVE) architecture to integrate different visual features from both frontal and lateral views. To achieve a more comprehensive alignment, Med-ST not only establishes the global alignment between whole images and texts but also introduces modality-weighted local…
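The abstract mentions a global alignment between whole images and texts but, as truncated here, does not say how it is computed. A common choice in vision-language pre-training is a symmetric contrastive (InfoNCE-style) loss over a batch of paired embeddings. The sketch below illustrates that general technique only; it is not confirmed as Med-ST's objective. The function names, the temperature value, and the toy embeddings are all assumptions made for illustration.

```python
import math

def l2_normalize(v):
    # Scale a vector to unit length so dot products become cosine similarities.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def log_softmax(row):
    # Numerically stable log-softmax over one row of logits.
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]

def symmetric_contrastive_loss(image_embs, text_embs, temperature=0.07):
    # Hypothetical helper: image_embs[k] and text_embs[k] are assumed to be
    # the embeddings of the k-th image-report pair in the batch.
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    n = len(imgs)

    # logits[i][j] = cosine similarity of image i and text j, scaled by temperature.
    logits = [
        [sum(a * b for a, b in zip(img, txt)) / temperature for txt in txts]
        for img in imgs
    ]

    # Image-to-text direction: each image should pick out its own report.
    i2t = -sum(log_softmax(logits[k])[k] for k in range(n)) / n

    # Text-to-image direction: each report should pick out its own image.
    columns = [[logits[r][c] for r in range(n)] for c in range(n)]
    t2i = -sum(log_softmax(columns[k])[k] for k in range(n)) / n

    return 0.5 * (i2t + t2i)

# Toy usage: correctly paired embeddings should score a lower loss than shuffled ones.
matched = symmetric_contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
shuffled = symmetric_contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

The loss is small when each image embedding is closest to its own report embedding and grows when pairs are mismatched, which is the behavior a global alignment term is meant to encourage.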
Taxonomy
Topics: Biomedical Text Mining and Ontologies · Semantic Web and Ontologies · Intelligent Tutoring Systems and Adaptive Learning
