S2D-ALIGN: Shallow-to-Deep Auxiliary Learning for Anatomically-Grounded Radiology Report Generation

Jiechao Gao; Chang Liu; Yuangang Li

arXiv:2511.11066·cs.CV·November 17, 2025

TL;DR

This paper introduces S2D-Align, a novel training paradigm for radiology report generation that progressively grounds reports in anatomical details using auxiliary signals, leading to improved alignment and report quality.

Contribution

S2D-Align employs a shallow-to-deep auxiliary learning strategy with a memory-based adapter to enhance anatomically-grounded alignment in radiology report generation.

Findings

01. Achieves state-of-the-art results on MIMIC-CXR and IU X-Ray datasets.

02. Demonstrates the effectiveness of multi-stage, auxiliary-guided alignment.

03. Validates the approach through comprehensive ablation studies.

Abstract

Radiology Report Generation (RRG) aims to automatically generate diagnostic reports from radiology images. To achieve this, existing methods have leveraged the powerful cross-modal generation capabilities of Multimodal Large Language Models (MLLMs), primarily focusing on optimizing cross-modal alignment between radiographs and reports through Supervised Fine-Tuning (SFT). However, by only performing instance-level alignment with the image-text pairs, the standard SFT paradigm fails to establish anatomically-grounded alignment, where the templated nature of reports often leads to sub-optimal generation quality. To address this, we propose S2D-Align, a novel SFT paradigm that establishes anatomically-grounded alignment by leveraging auxiliary signals of varying granularities. S2D-Align implements a shallow-to-deep strategy, progressively enriching the alignment process:…

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Domain Adaptation and Few-Shot Learning