AMRG: Extend Vision Language Models for Automatic Mammography Report Generation

Nak-Jun Sung; Donghyun Lee; Bo Hwa Choi; Chae Jung Park

arXiv:2508.09225·eess.IV·August 14, 2025

TL;DR

This paper introduces AMRG, an end-to-end framework that uses vision-language models to generate mammography reports automatically. It targets the challenges of multiview, high-resolution images and unstructured radiologic language, and it establishes the first reproducible benchmark for the task.

Contribution

AMRG is presented as the first reproducible, scalable framework for mammography report generation that adapts large VLMs through parameter-efficient fine-tuning. The work also systematically evaluates multiple models and hyperparameters.

Findings

01

Achieved ROUGE-L of 0.5691 and BI-RADS accuracy of 0.5582.

02

Established the first benchmark for mammography report generation.

03

Demonstrated improved diagnostic consistency and reduced hallucinations.
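
The two headline metrics above can be illustrated with a minimal sketch. It assumes ROUGE-L is the F1 score derived from the longest common subsequence (LCS) of whitespace-separated, lowercased tokens, and that BI-RADS accuracy is the plain fraction of reports whose predicted category matches the reference. The function names are hypothetical, and the paper's actual tokenization and scoring tools may differ.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference, candidate):
    """ROUGE-L F1 between a reference report and a generated report.

    Assumption: simple lowercase whitespace tokenization, no stemming.
    """
    ref = reference.lower().split()
    cand = candidate.lower().split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


def label_accuracy(true_labels, predicted_labels):
    """Fraction of positions where the predicted category equals the reference."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        return 0.0
    matches = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return matches / len(true_labels)
```

For example, `rouge_l_f1("the mass is benign", "the mass appears benign")` shares three of four tokens in order, so precision and recall are both 0.75. A corpus-level score such as the reported 0.5691 would typically be an average of per-report values, although the page does not state the aggregation used.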

Abstract

Mammography report generation is a critical yet underexplored task in medical AI, characterized by challenges such as multiview image reasoning, high-resolution visual cues, and unstructured radiologic language. In this work, we introduce AMRG (Automatic Mammography Report Generation), the first end-to-end framework for generating narrative mammography reports using large vision-language models (VLMs). Building upon MedGemma-4B-it, a domain-specialized, instruction-tuned VLM, we employ a parameter-efficient fine-tuning (PEFT) strategy via Low-Rank Adaptation (LoRA), enabling lightweight adaptation with minimal computational overhead. We train and evaluate AMRG on DMID, a publicly available dataset of paired high-resolution mammograms and diagnostic reports. This work establishes the first reproducible benchmark for mammography report generation, addressing a longstanding gap in multimodal…
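
The LoRA idea mentioned in the abstract can be sketched in a few lines. This is a minimal illustration and not the authors' implementation. It assumes the standard LoRA formulation, in which a frozen weight matrix W is augmented with a trainable low-rank product scaled by alpha / rank, so the layer computes W x + (alpha / rank) B (A x). The class name, the rank and alpha values, and the initialization scale are hypothetical. A real fine-tuning run would apply an adapter like this to the attention or projection layers of the VLM through a deep learning framework.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class LoRALinear:
    """A frozen linear layer plus a trainable low-rank update (illustrative only)."""

    def __init__(self, weight, rank, alpha, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight          # frozen pretrained weights, shape d_out x d_in
        self.scale = alpha / rank     # standard LoRA scaling factor
        # A starts with small random values, B starts at zero, so the adapted
        # layer initially reproduces the frozen layer exactly.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]

    def forward(self, x):
        base = matvec(self.weight, x)
        update = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * u for b, u in zip(base, update)]

    def trainable_parameters(self):
        """Only A and B are trained: rank * (d_in + d_out) values."""
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])
```

The saving comes from the parameter count: a d_out x d_in layer has d_out * d_in weights, while the adapter trains only rank * (d_in + d_out). For a 4096 x 4096 layer with rank 8, that is 65,536 trainable values instead of roughly 16.8 million, which is why the abstract describes the adaptation as lightweight.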
