CWCD: Category-Wise Contrastive Decoding for Structured Medical Report Generation

Shantam Srivastava; Mahesh Bhosale; David Doermann; Mingchen Gao

arXiv:2604.10410 · cs.AI · April 17, 2026

TL;DR

This paper introduces CWCD, a modular contrastive decoding framework for structured radiology report generation. It uses category-specific visual prompts and outperforms baseline methods on clinical and language metrics.

Contribution

The paper proposes a novel category-wise contrastive decoding approach with category-specific prompts to enhance radiology report generation accuracy.

Findings

01. CWCD outperforms baseline methods in clinical and language metrics.

02. Ablation studies show the effectiveness of each architectural component.

03. CWCD reduces spurious pathology co-occurrences in reports.

Abstract

Interpreting chest X-rays is inherently challenging due to the overlap between anatomical structures and the subtle presentation of many clinically significant pathologies, making accurate diagnosis time-consuming even for experienced radiologists. Recent radiology-focused foundation models, such as LLaVA-Rad and Maira-2, have positioned multi-modal large language models (MLLMs) at the forefront of automated radiology report generation (RRG). However, despite these advances, current foundation models generate reports in a single forward pass. This decoding strategy diminishes attention to visual tokens and increases reliance on language priors as generation proceeds, which in turn introduces spurious pathology co-occurrences in the generated reports. To mitigate these limitations, we propose Category-Wise Contrastive Decoding (CWCD), a novel and modular framework designed to enhance…
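The abstract is cut off before it describes how CWCD itself works, so the sketch below is not the authors' method. It only illustrates the general idea of contrastive decoding that the abstract builds on: next-token scores computed with the image are contrasted against scores computed without it, so that tokens favored mainly by language priors are down-weighted. The function names, the `alpha` and `beta` parameters, and the toy logits are all assumptions made for illustration, and the plausibility cutoff is borrowed from the general contrastive decoding literature.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def contrastive_scores(logits_with_image, logits_without_image,
                       alpha=1.0, beta=0.1):
    """Generic contrastive decoding step (hypothetical helper, not from the paper).

    alpha: strength of the contrast (alpha=0 recovers ordinary decoding).
    beta:  plausibility cutoff; tokens whose probability under the
           image-conditioned model is below beta * max probability are masked.
    """
    lp_full = log_softmax(logits_with_image)      # model sees the image
    lp_prior = log_softmax(logits_without_image)  # language prior only
    cutoff = max(lp_full) + math.log(beta)
    scores = []
    for full, prior in zip(lp_full, lp_prior):
        if full < cutoff:
            # Implausible under the full model: never select it.
            scores.append(float("-inf"))
        else:
            # Reward tokens the image supports more than the prior does.
            scores.append((1 + alpha) * full - alpha * prior)
    return scores


def pick_token(scores):
    """Greedy choice: index of the highest-scoring token."""
    return max(range(len(scores)), key=scores.__getitem__)


# Toy vocabulary of three tokens (made-up numbers).
# Token 0 is strongly favored by the language prior alone;
# token 1 is supported almost equally once the image is seen.
with_image = [2.0, 1.9, -3.0]
without_image = [2.5, 0.0, -3.0]

greedy_choice = pick_token(with_image)
contrastive_choice = pick_token(contrastive_scores(with_image, without_image))
```

In this toy case ordinary greedy decoding picks token 0, while the contrastive score picks token 1, because token 0's advantage comes mostly from the prior. How CWCD applies contrast per category is not recoverable from the truncated abstract.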
