Enhancing Fine-Grained Spatial Grounding in 3D CT Report Generation via Discriminative Guidance
Chenyu Wang, Weicheng Dai, Han Liu, Wenchao Li, Kayhan Batmanghelich

TL;DR
This paper introduces DCP-PD, a framework that improves fine-grained spatial grounding in 3D CT report generation by guiding models with discriminative cues, reaching a macro F1 of 0.603 on CT-RATE and an 89% relative improvement in out-of-distribution performance on Rad-ChestCT.
Contribution
The authors propose a novel discriminative cue-prompting framework with prompt dropout to improve fine-grained spatial grounding in radiology report generation.
Findings
State-of-the-art macro F1 score of 0.603 on CT-RATE
89% relative improvement in out-of-distribution performance on Rad-ChestCT
Fine-grained spatial localization remains challenging despite high benchmark scores
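The two headline metrics above can be made concrete with a small sketch. It assumes macro F1 is the unweighted mean of per-abnormality F1 over binary labels, and that "relative improvement" means (new - old) / old. The summary does not spell out either definition, and the function names here are hypothetical, not the authors' evaluation code.

```python
from typing import Sequence


def macro_f1(y_true: Sequence[Sequence[int]], y_pred: Sequence[Sequence[int]]) -> float:
    """Unweighted mean of per-label F1 for multi-label binary predictions.

    Rows are scans, columns are abnormality labels (assumed layout).
    A label with no true and no predicted positives contributes F1 = 0.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    num_labels = len(y_true[0])
    f1_scores = []
    for j in range(num_labels):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 0 and p[j] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 0)
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / num_labels


def relative_improvement(baseline: float, improved: float) -> float:
    """Relative gain of `improved` over `baseline`, e.g. 0.89 for an 89% gain."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (improved - baseline) / baseline
```

Because macro averaging weights every label equally, rare abnormalities count as much as common ones, which is one reason it is a common choice for imbalanced clinical label sets.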
Abstract
Vision-language models (VLMs) for radiology report generation (RRG) can produce long-form chest CT reports from volumetric scans and show strong potential to improve radiology workflow efficiency and consistency. However, existing methods face two key limitations: (i) training supervision is often coarse, aligning a whole CT volume with a full free-text report without explicit alignment for fine-grained attributes or pathology locations; and (ii) evaluation is typically holistic (lexical overlap, entity matching, or LLM-as-a-judge scores) and not diagnostic for spatial grounding. We propose Discriminative Cue-Prompting with Prompt Dropout (DCP-PD), a plug-and-play framework that distills fine-grained cues from free-text reports and uses them to guide report generation while mitigating shortcut reliance via prompt dropout. DCP-PD achieves state-of-the-art performance on CT-RATE,…
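As a rough illustration of the prompt-dropout idea named in the abstract, the sketch below withholds the cue prompt for a random fraction of training examples, so that a generator cannot rely on cues always being present. This is an assumption about the general mechanism only: the abstract does not say how DCP-PD implements dropout, and the function names, the prompt wording, and the `drop_prob` parameter are all hypothetical.

```python
import random
from typing import List


def apply_prompt_dropout(cues: List[str], drop_prob: float, rng: random.Random) -> List[str]:
    """Return a copy of `cues`, or an empty list with probability `drop_prob`.

    Hypothetical whole-prompt dropout; the paper may drop cues differently.
    """
    if not 0.0 <= drop_prob <= 1.0:
        raise ValueError("drop_prob must be in [0, 1]")
    if rng.random() < drop_prob:
        return []
    return list(cues)


def build_prompt(cues: List[str]) -> str:
    """Compose a text prompt; the instruction wording is illustrative only."""
    base = "Generate a chest CT report."
    if not cues:
        return base
    return base + " Cues: " + "; ".join(cues)


# Usage: during training, sample dropout per example before building the prompt.
rng = random.Random(0)
example_cues = ["nodule in right upper lobe", "pleural effusion on the left"]
training_prompt = build_prompt(apply_prompt_dropout(example_cues, 0.3, rng))
```

At inference time one would typically pass the cues through unchanged (`drop_prob = 0.0`), so dropout acts only as a training-time regularizer.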
