Ultrasound Report Generation with Multimodal Large Language Models for Standardized Texts

Peixuan Ge; Tongkun Su; Faqin Lv; Baoliang Zhao; Peng Zhang; Chi Hong Wong; Liang Yao; Yu Sun; Zenan Wang; Pak Kin Wong; Ying Hu

arXiv:2505.08838·eess.IV·May 20, 2025

TL;DR

This paper introduces a unified multimodal large language model framework for automated ultrasound report generation across multiple organs and languages, improving accuracy and consistency over previous methods.

Contribution

The study presents a novel unified framework that integrates fragment-based multilingual training and modular text alignment for ultrasound report generation, enhancing scalability and clinical accuracy.

Findings

1. Achieved relative gains of about 2% in BLEU scores over the previous state-of-the-art method, KMVE.
2. Significantly reduced errors such as missing or incorrect content.
3. Demonstrated effective multi-organ and multilingual (English-Chinese) report generation in a unified system.

Abstract

Ultrasound (US) report generation is a challenging task due to the variability of US images, operator dependence, and the need for standardized text. Unlike X-ray and CT, US imaging lacks consistent datasets, making automation difficult. In this study, we propose a unified framework for multi-organ and multilingual US report generation, integrating fragment-based multilingual training and leveraging the standardized nature of US reports. By aligning modular text fragments with diverse imaging data and curating a bilingual English-Chinese dataset, the method achieves consistent and clinically accurate text generation across organ sites and languages. Fine-tuning with selective unfreezing of the vision transformer (ViT) further improves text-image alignment. Compared to the previous state-of-the-art KMVE method, our approach achieves relative gains of about 2% in BLEU scores,…
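
To make the BLEU comparison in the abstract concrete, the following is a minimal sketch of how a sentence-level BLEU score and a relative gain over a baseline could be computed. It is not the authors' evaluation code: the function names `ngram_counts`, `sentence_bleu`, and `relative_gain` are hypothetical, the tokenization is plain whitespace splitting, no smoothing is applied, and the scores 0.50 and 0.51 in the usage lines are illustrative values rather than results from the paper.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams of order n in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, reference, max_n=4):
    """Unsmoothed, single-reference BLEU on whitespace tokens (illustrative)."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = ngram_counts(cand, n)
        ref_counts = ngram_counts(ref, n)
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        if overlap == 0 or total == 0:
            # Without smoothing, a single empty n-gram order zeroes the score.
            return 0.0
        log_precision_sum += math.log(overlap / total)
    # Brevity penalty: penalize candidates shorter than the reference.
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    # Geometric mean of the n-gram precisions, scaled by the brevity penalty.
    return bp * math.exp(log_precision_sum / max_n)


def relative_gain(new_score, baseline_score):
    """Relative improvement of new_score over baseline_score."""
    return (new_score - baseline_score) / baseline_score


# Hypothetical report sentences, for illustration only.
reference = "the liver is normal in size and shape"
print(sentence_bleu(reference, reference))
print(sentence_bleu("the liver is normal in size", reference))

# Illustrative scores: a baseline of 0.50 and a new score of 0.51
# correspond to a relative gain of about 2%.
print(relative_gain(0.51, 0.50))
```

A relative gain of about 2% differs from a gain of two absolute BLEU points: in the illustrative numbers above, the absolute difference is only 0.01.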

Taxonomy

Topics: Topic Modeling

Methods: Attention Is All You Need · Layer Normalization · Softmax · Linear Layer · Residual Connection · Multi-Head Attention · Dense Connections · Vision Transformer