AMRG: Extend Vision Language Models for Automatic Mammography Report Generation
Nak-Jun Sung, Donghyun Lee, Bo Hwa Choi, Chae Jung Park

TL;DR
This paper introduces AMRG, an end-to-end vision-language model framework for automatic mammography report generation, addressing key challenges with high-resolution images and unstructured language, and establishing a new benchmark in clinical AI.
Contribution
AMRG is the first reproducible, scalable framework for mammography report generation using large VLMs with efficient fine-tuning, and it systematically evaluates multiple models and hyperparameters.
Findings
Achieved ROUGE-L of 0.5691 and BI-RADS accuracy of 0.5582.
Established the first benchmark for mammography report generation.
Demonstrated improved diagnostic consistency and reduced hallucinations.
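For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how a ROUGE-L score and a BI-RADS accuracy could be computed. It assumes whitespace tokenization, a balanced F-measure over the longest common subsequence (LCS), and exact-match comparison of BI-RADS category labels. The function names and the example strings are hypothetical, and the paper's own evaluation code may differ in tokenization and weighting.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(reference, candidate):
    """Balanced LCS-based F-measure (assumed variant, whitespace tokens)."""
    ref = reference.lower().split()
    cand = candidate.lower().split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


def bi_rads_accuracy(true_labels, predicted_labels):
    """Fraction of reports whose predicted BI-RADS category matches exactly."""
    if len(true_labels) != len(predicted_labels) or not true_labels:
        raise ValueError("label lists must be non-empty and of equal length")
    hits = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return hits / len(true_labels)


if __name__ == "__main__":
    # Hypothetical report snippets, not taken from the DMID dataset.
    print(rouge_l("no suspicious mass seen", "no mass seen"))
    print(bi_rads_accuracy(["2", "4", "1"], ["2", "3", "1"]))
```

In the first call the LCS is the three tokens "no mass seen", so precision is 3/3 and recall is 3/4.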
Abstract
Mammography report generation is a critical yet underexplored task in medical AI, characterized by challenges such as multiview image reasoning, high-resolution visual cues, and unstructured radiologic language. In this work, we introduce AMRG (Automatic Mammography Report Generation), the first end-to-end framework for generating narrative mammography reports using large vision-language models (VLMs). Building upon MedGemma-4B-it, a domain-specialized, instruction-tuned VLM, we employ a parameter-efficient fine-tuning (PEFT) strategy via Low-Rank Adaptation (LoRA), enabling lightweight adaptation with minimal computational overhead. We train and evaluate AMRG on DMID, a publicly available dataset of paired high-resolution mammograms and diagnostic reports. This work establishes the first reproducible benchmark for mammography report generation, addressing a longstanding gap in multimodal…
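The LoRA idea mentioned in the abstract can be illustrated with a small, dependency-free sketch. It shows the general mechanism: a frozen weight matrix W is augmented with a trainable low-rank product B·A scaled by alpha/r. It is not the authors' training code. The dimensions, rank, and alpha below are hypothetical values chosen for illustration, and the summary above does not state the configuration AMRG actually uses.

```python
import random


def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


def lora_forward(W, A, B, x, alpha, r):
    """Compute W x + (alpha / r) * B (A x); only A and B would be trained."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]


def param_counts(d_out, d_in, r):
    """Return (full fine-tuning parameters, LoRA parameters) for one layer."""
    return d_out * d_in, r * (d_in + d_out)


rng = random.Random(0)
d_out, d_in, r, alpha = 6, 8, 2, 4  # hypothetical toy sizes

W = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]  # frozen
A = [[rng.gauss(0, 0.01) for _ in range(d_in)] for _ in range(r)]   # trainable
B = [[0.0] * r for _ in range(d_out)]                               # trainable, zero-initialized
x = [rng.gauss(0, 1) for _ in range(d_in)]

if __name__ == "__main__":
    full, lora = param_counts(d_out, d_in, r)
    print("trainable parameters, full vs LoRA:", full, lora)
    # With B initialized to zero, the adapted layer starts out identical
    # to the frozen base layer.
    print(lora_forward(W, A, B, x, alpha, r) == matvec(W, x))
```

Because only A and B are updated, the trainable parameter count grows with r·(d_in + d_out) rather than d_in·d_out, which is where the reduced computational overhead comes from.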
