Cross-Modal Causal Intervention for Medical Report Generation
Weixing Chen, Yang Liu, Ce Wang, Jiarui Zhu, Guanbin Li, Cheng-Lin Liu, Liang Lin

TL;DR
This paper introduces a two-stage cross-modal causal learning framework for radiology report generation that reduces visual-linguistic biases and image noise to improve the accuracy of reports generated from medical images.
Contribution
The proposed CMCRL framework combines a radiology-specific pre-training strategy (RadCARE) with causal intervention modules applied during fine-tuning (VLCI) to enhance cross-modal alignment and mitigate biases in radiology report generation.
Findings
Outperforms state-of-the-art methods on IU-Xray and MIMIC-CXR datasets.
Demonstrates effectiveness of causal intervention modules in reducing bias.
Shows robustness of the framework with ablation studies.
Abstract
Radiology Report Generation (RRG) is essential for computer-aided diagnosis and medication guidance, and can relieve the heavy burden on radiologists by automatically generating the corresponding radiology report for a given radiology image. However, generating accurate lesion descriptions remains challenging due to spurious correlations from visual-linguistic biases and inherent limitations of radiological imaging, such as low resolution and noise interference. To address these issues, we propose a two-stage framework named Cross-Modal Causal Representation Learning (CMCRL), consisting of the Radiological Cross-modal Alignment and Reconstruction Enhanced (RadCARE) pre-training and the Visual-Linguistic Causal Intervention (VLCI) fine-tuning. In the pre-training stage, RadCARE introduces a degradation-aware masked image restoration strategy tailored for radiological images,…
Taxonomy
Topics: Multimodal Machine Learning Applications · Natural Language Processing Techniques · Topic Modeling
