Medical Report Generation: A Hierarchical Task Structure-Based Cross-Modal Causal Intervention Framework
Yucheng Song, Yifan Ge, Junhao Li, Zhining Liao, Zhifang Liao

TL;DR
This paper introduces HTSC-CIF, a hierarchical framework for medical report generation that jointly addresses domain knowledge understanding, cross-modal alignment, and bias reduction, and is reported to outperform existing methods.
Contribution
The paper proposes a novel hierarchical task decomposition framework that jointly tackles three key challenges in medical report generation using cross-modal causal intervention.
Findings
HTSC-CIF outperforms state-of-the-art methods in medical report generation.
The framework effectively reduces cross-modal biases and spurious correlations.
Experimental results demonstrate improved interpretability and accuracy.
Abstract
Medical Report Generation (MRG) is a key part of modern medical diagnostics: it automatically generates reports from radiological images to reduce radiologists' workload. However, reliable MRG models for lesion description face three main challenges: insufficient understanding of domain knowledge, poor alignment between text and visual entity embeddings, and spurious correlations arising from cross-modal biases. Previous work addresses these challenges only one at a time, whereas this paper tackles all three through a novel hierarchical task decomposition approach, the HTSC-CIF framework. HTSC-CIF maps the three challenges to low-, mid-, and high-level tasks: 1) Low-level: align medical entity features with spatial locations to enhance the visual encoder's domain knowledge; 2) Mid-level: use Prefix Language Modeling (text) and Masked Image Modeling (images) to boost cross-modal alignment via mutual guidance; 3)…
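The abstract names Prefix Language Modeling and Masked Image Modeling for the mid-level task without giving implementation details. As a rough illustration of what those two objectives generally involve, the sketch below builds a prefix-LM attention mask and a random image-patch mask. It is a generic sketch, not the paper's implementation: the function names, the 50% mask ratio, and the sequence sizes are all assumptions chosen for the example.

```python
import random


def prefix_lm_mask(prefix_len, total_len):
    """Hypothetical helper: attention mask for prefix language modeling.

    mask[i][j] is True when position i may attend to position j.
    Prefix positions see the whole prefix (bidirectional); positions
    after the prefix see the prefix plus earlier positions (causal).
    """
    return [
        [j < prefix_len or j <= i for j in range(total_len)]
        for i in range(total_len)
    ]


def random_patch_mask(num_patches, mask_ratio, seed=0):
    """Hypothetical helper: choose which image patches to hide for
    masked image modeling. Returns a list of booleans, True = masked."""
    rng = random.Random(seed)
    num_masked = int(num_patches * mask_ratio)
    masked = set(rng.sample(range(num_patches), num_masked))
    return [i in masked for i in range(num_patches)]


if __name__ == "__main__":
    # Assumed sizes: a 2-token prefix in a 4-token report, 16 image patches.
    attn = prefix_lm_mask(prefix_len=2, total_len=4)
    for row in attn:
        print(["x" if allowed else "." for allowed in row])

    patches = random_patch_mask(num_patches=16, mask_ratio=0.5)
    print(sum(patches), "of", len(patches), "patches masked")
```

In a typical setup the model would be trained to predict the report tokens after the prefix and to reconstruct the masked patches; how HTSC-CIF couples the two objectives through "mutual guidance" is not specified in the abstract.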
