Enhancing Fine-Grained Spatial Grounding in 3D CT Report Generation via Discriminative Guidance

Chenyu Wang; Weicheng Dai; Han Liu; Wenchao Li; Kayhan Batmanghelich

arXiv:2604.10437·cs.CV·April 14, 2026

TL;DR

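This paper introduces DCP-PD (Discriminative Cue-Prompting with Prompt Dropout), a plug-and-play framework that improves fine-grained spatial grounding in 3D CT report generation by guiding the model with discriminative cues distilled from free-text reports. It reports a state-of-the-art macro F1 score of 0.603 on CT-RATE and an 89% relative improvement in out-of-distribution performance on Rad-ChestCT.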

Contribution

The authors propose a novel discriminative cue-prompting framework with prompt dropout to improve fine-grained spatial grounding in radiology report generation.

Findings

01. State-of-the-art macro F1 score of 0.603 on CT-RATE

02. 89% relative improvement in out-of-distribution performance on Rad-ChestCT

03. Fine-grained spatial localization remains challenging despite high benchmark scores
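
For readers unfamiliar with the two metrics quoted above, the following is a minimal sketch of how a macro F1 score and a relative improvement are conventionally computed. It is not the paper's evaluation code: the function names, the multi-label binary layout (one row per report, one column per finding), and the toy inputs are assumptions made for illustration, and this page does not say how the authors extract labels from generated reports.

```python
from typing import Sequence


def macro_f1(y_true: Sequence[Sequence[int]], y_pred: Sequence[Sequence[int]]) -> float:
    """Unweighted mean of per-label F1 over binary multi-label rows.

    Assumed layout (hypothetical): each row is one report, each column one finding.
    """
    n_labels = len(y_true[0])
    per_label = []
    for j in range(n_labels):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 0 and p[j] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 0)
        denom = 2 * tp + fp + fn
        # Convention chosen here: a label with no positives and no predictions scores 0.
        per_label.append(2 * tp / denom if denom else 0.0)
    return sum(per_label) / n_labels


def relative_improvement(baseline: float, new: float) -> float:
    """(new - baseline) / baseline; 0.89 corresponds to an 89% relative gain."""
    return (new - baseline) / baseline


# Toy labels, not data from the paper.
toy_true = [[1, 0], [1, 1], [0, 1]]
toy_pred = [[1, 0], [0, 1], [0, 0]]
toy_score = macro_f1(toy_true, toy_pred)
```

Because macro averaging weights every label equally, rare findings count as much as common ones. An 89% relative improvement means the new score is 1.89 times the baseline score, whatever that baseline is.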

Abstract

Vision-language models (VLMs) for radiology report generation (RRG) can produce long-form chest CT reports from volumetric scans and show strong potential to improve radiology workflow efficiency and consistency. However, existing methods face two key limitations: (i) training supervision is often coarse, aligning a whole CT volume with a full free-text report without explicit alignment for fine-grained attributes or pathology locations; and (ii) evaluation is typically holistic (lexical overlap, entity matching, or LLM-as-a-judge scores) and not diagnostic for spatial grounding. We propose Discriminative Cue-Prompting with Prompt Dropout (DCP-PD), a plug-and-play framework that distills fine-grained cues from free-text reports and uses them to guide report generation while mitigating shortcut reliance via prompt dropout. DCP-PD achieves state-of-the-art performance on CT-RATE,…
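
The idea of prompt dropout named in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say whether cues are dropped one by one or as a whole prompt, so the per-cue dropout below, the `drop_prob` parameter, the prompt template, and the example cue strings are all assumptions.

```python
import random
from typing import List

# Hypothetical instruction text; the paper's actual prompt is not given on this page.
INSTRUCTION = "Generate the chest CT report."


def apply_prompt_dropout(cues: List[str], drop_prob: float, rng: random.Random) -> List[str]:
    """Drop each cue independently with probability drop_prob (assumed scheme)."""
    return [cue for cue in cues if rng.random() >= drop_prob]


def build_prompt(cues: List[str], instruction: str = INSTRUCTION) -> str:
    """Append surviving cues to the instruction; fall back to the bare instruction."""
    if not cues:
        return instruction
    return instruction + "\nCues: " + "; ".join(cues)


# Made-up cues standing in for findings distilled from a free-text report.
example_cues = ["nodule in right upper lobe", "left pleural effusion"]
rng = random.Random(0)
training_prompt = build_prompt(apply_prompt_dropout(example_cues, drop_prob=0.3, rng=rng))
```

Under this assumed scheme, the model sometimes sees only part of the cue set, or none of it, during training. The intent is that it cannot rely on the cues as a shortcut and still has to read the CT volume, which is the shortcut reliance the abstract says prompt dropout mitigates.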
