CWCD: Category-Wise Contrastive Decoding for Structured Medical Report Generation
Shantam Srivastava, Mahesh Bhosale, David Doermann, Mingchen Gao

TL;DR
This paper introduces CWCD, a modular contrastive decoding framework that improves structured radiology report generation by focusing on category-specific visual prompts, outperforming baseline methods.
Contribution
The paper proposes a novel category-wise contrastive decoding approach with category-specific prompts to enhance radiology report generation accuracy.
Findings
CWCD outperforms baseline methods in clinical and language metrics.
Ablation studies show the effectiveness of each architectural component.
CWCD reduces spurious pathology co-occurrences in reports.
Abstract
Interpreting chest X-rays is inherently challenging due to the overlap between anatomical structures and the subtle presentation of many clinically significant pathologies, making accurate diagnosis time-consuming even for experienced radiologists. Recent radiology-focused foundation models, such as LLaVA-Rad and Maira-2, have positioned multi-modal large language models (MLLMs) at the forefront of automated radiology report generation (RRG). However, despite these advances, current foundation models generate reports in a single forward pass. This decoding strategy diminishes attention to visual tokens and increases reliance on language priors as generation proceeds, which in turn introduces spurious pathology co-occurrences in the generated reports. To mitigate these limitations, we propose Category-Wise Contrastive Decoding (CWCD), a novel and modular framework designed to enhance…
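The abstract is truncated before it describes how CWCD works, so the following is only a minimal sketch of generic contrastive decoding, not the authors' implementation. It assumes two sets of next-token logits: one from a forward pass conditioned on a category-specific visual prompt, and one from a pass without that prompt, which stands in for the language prior. The function names, the `alpha` and `beta` parameters, and the toy three-token vocabulary are all hypothetical and chosen for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def contrastive_logits(with_prompt, without_prompt, alpha=1.0, beta=0.1):
    """Generic contrastive decoding step (illustrative assumption, not CWCD itself).

    with_prompt:    logits from the pass that sees the category-specific prompt
    without_prompt: logits from the pass without it (approximates language priors)
    alpha:          contrast strength (hypothetical parameter)
    beta:           plausibility cutoff relative to the top prompted probability
    """
    probs = softmax(with_prompt)
    cutoff = beta * max(probs)
    out = []
    for lw, lo, p in zip(with_prompt, without_prompt, probs):
        if p < cutoff:
            # Tokens that are implausible under the prompted pass are masked out,
            # so the contrast cannot promote them.
            out.append(float("-inf"))
        else:
            # Amplify evidence from the prompted pass, subtract the prior.
            out.append((1 + alpha) * lw - alpha * lo)
    return out


def greedy_pick(logits):
    """Index of the highest-scoring token."""
    return max(range(len(logits)), key=lambda i: logits[i])


# Toy vocabulary (hypothetical): index 0 = "effusion", 1 = "cardiomegaly", 2 = "normal".
prompted = [2.0, 2.2, -5.0]   # visual evidence slightly favors index 1
prior = [1.0, 3.0, -5.0]      # language prior strongly favors index 1

scores = contrastive_logits(prompted, prior)
chosen = greedy_pick(scores)
```

In this toy case, plain greedy decoding on `prompted` would select index 1, while the contrastive scores select index 0, because most of the support for index 1 comes from the prior rather than from the prompted pass. This is the kind of effect the abstract attributes to reducing reliance on language priors, but how CWCD builds its prompts and contrasts is not stated in the text above.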
