Clinically-aligned Multi-modal Chest X-ray Classification
Phillip Sloan, Edwin Simpson, Majid Mirmehdi

TL;DR
This paper presents CaMCheX, a multimodal transformer framework that integrates multi-view chest X-ray images with structured clinical data to improve diagnostic classification, outperforming existing methods on the MIMIC-CXR and CXR-LT benchmarks.
Contribution
Introduces CaMCheX, a novel multimodal transformer architecture that aligns multi-view X-ray images with structured clinical data for enhanced classification accuracy.
Findings
Outperforms state-of-the-art methods on the MIMIC-CXR and CXR-LT datasets.
Demonstrates the effectiveness of multimodal clinical alignment.
Highlights the importance of multi-view and clinical data integration.
Abstract
Radiology is essential to modern healthcare, yet rising demand and staffing shortages continue to pose major challenges. Recent advances in artificial intelligence have the potential to support radiologists and help address these challenges. Given its widespread use and clinical importance, chest X-ray classification is well suited to augment radiologists' workflows. However, most existing approaches rely solely on single-view, image-level inputs, ignoring the structured clinical information and multi-image studies available at the time of reporting. In this work, we introduce CaMCheX, a multimodal transformer-based framework that aligns multi-view chest X-ray studies with structured clinical data to better reflect how clinicians make diagnostic decisions. Our architecture employs view-specific ConvNeXt encoders for frontal and lateral chest radiographs, whose features are fused with…
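The abstract says CaMCheX uses view-specific ConvNeXt encoders whose features are fused with clinical data, but the excerpt ends before the fusion step is described. The pure-Python sketch below is only an illustration of one common transformer-style approach, not the authors' method. Each modality's feature vector (frontal, lateral, clinical) is projected into a shared space and tagged with a modality embedding. Self-attention then mixes these tokens, they are mean-pooled, and the result is mapped to one sigmoid probability per label. Everything in it is an assumption: `FusionSketch`, its dimensions, the single attention layer, the multi-label sigmoid head, and the random weights that stand in for trained ConvNeXt features and projections are all hypothetical.

```python
import math
import random


def matvec(W, x):
    """Multiply a matrix W (list of rows) by a vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def rand_matrix(rows, cols, rng):
    """Small random weights standing in for trained parameters (hypothetical)."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over a list of token vectors."""
    d = len(Wq)
    q = [matvec(Wq, t) for t in tokens]
    k = [matvec(Wk, t) for t in tokens]
    v = [matvec(Wv, t) for t in tokens]
    out = []
    for qi in q:
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        weights = softmax(scores)
        out.append([sum(w * vj[c] for w, vj in zip(weights, v)) for c in range(d)])
    return out


class FusionSketch:
    """Hypothetical token-level fusion of frontal, lateral and clinical features."""

    def __init__(self, img_dim, clin_dim, d_model, n_labels, seed=0):
        rng = random.Random(seed)
        # Per-modality projections into a shared d_model space. In CaMCheX the image
        # features would come from view-specific ConvNeXt encoders; here they are inputs.
        self.proj = {
            "frontal": rand_matrix(d_model, img_dim, rng),
            "lateral": rand_matrix(d_model, img_dim, rng),
            "clinical": rand_matrix(d_model, clin_dim, rng),
        }
        # Modality-type embeddings let attention distinguish the token sources.
        self.type_emb = {
            name: [rng.gauss(0.0, 0.1) for _ in range(d_model)] for name in self.proj
        }
        self.Wq = rand_matrix(d_model, d_model, rng)
        self.Wk = rand_matrix(d_model, d_model, rng)
        self.Wv = rand_matrix(d_model, d_model, rng)
        self.head = rand_matrix(n_labels, d_model, rng)

    def forward(self, inputs):
        """Map {modality: feature vector} to one probability per label.

        Modalities absent from `inputs` (e.g. a missing lateral view) are skipped.
        """
        tokens = []
        for name, feats in inputs.items():
            projected = matvec(self.proj[name], feats)
            tokens.append([p + e for p, e in zip(projected, self.type_emb[name])])
        attended = self_attention(tokens, self.Wq, self.Wk, self.Wv)
        pooled = [sum(col) / len(attended) for col in zip(*attended)]  # mean over tokens
        logits = matvec(self.head, pooled)
        # Independent sigmoids: multi-label output, one probability per finding.
        return [1.0 / (1.0 + math.exp(-z)) for z in logits]


model = FusionSketch(img_dim=8, clin_dim=4, d_model=6, n_labels=3)
probs = model.forward({
    "frontal": [0.5] * 8,
    "lateral": [0.1] * 8,
    "clinical": [0.3, 1.0, 0.0, 0.7],  # hypothetical normalised clinical values
})
print(probs)
```

Because each view is an optional token in this sketch, a study without a lateral image still produces a prediction. The excerpt does not say whether CaMCheX handles missing views this way.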
Taxonomy
Topics: COVID-19 diagnosis using AI · Artificial Intelligence in Healthcare and Education · Domain Adaptation and Few-Shot Learning
