Ultrasound Report Generation with Multimodal Large Language Models for Standardized Texts
Peixuan Ge, Tongkun Su, Faqin Lv, Baoliang Zhao, Peng Zhang, Chi Hong Wong, Liang Yao, Yu Sun, Zenan Wang, Pak Kin Wong, Ying Hu

TL;DR
This paper introduces a unified multimodal large language model framework for automated ultrasound report generation across multiple organs and languages, improving accuracy and consistency over previous methods.
Contribution
The study presents a novel unified framework that integrates fragment-based multilingual training and modular text alignment for ultrasound report generation, enhancing scalability and clinical accuracy.
Findings
Achieved relative gains of about 2% in BLEU scores over the previous state-of-the-art method, KMVE.
Significantly reduced errors such as missing or incorrect content.
Demonstrated effective multi-organ and multilingual report generation in a unified system.
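To make the BLEU finding concrete, the following is a minimal sketch of a simplified single-reference BLEU score and of how a relative gain between two scores is computed. It is not the paper's evaluation code: the helper names (`ngrams`, `sentence_bleu`, `relative_gain`), the absence of smoothing, and the example report sentences are all assumptions made for illustration.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(reference, candidate, max_n=4):
    """Simplified single-reference BLEU: clipped n-gram precision plus brevity penalty.

    No smoothing is applied, so the score is 0.0 if any n-gram order has no match.
    """
    if not candidate:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        ref_counts = ngrams(reference, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference.
        matched = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
        if total == 0 or matched == 0:
            return 0.0
        log_precisions.append(math.log(matched / total))
    # Brevity penalty: penalize candidates shorter than the reference.
    c, r = len(candidate), len(reference)
    brevity_penalty = 1.0 if c >= r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(sum(log_precisions) / max_n)


def relative_gain(new_score, baseline_score):
    """Relative improvement of new_score over baseline_score, as a fraction."""
    return (new_score - baseline_score) / baseline_score


# Hypothetical report sentences, invented for illustration only.
reference = "the liver is normal in size and shape with homogeneous echotexture".split()
candidate = "the liver is normal in size with homogeneous echotexture".split()
```

Under these assumptions, `sentence_bleu(reference, candidate)` scores the generated sentence against the reference, and `relative_gain(0.51, 0.50)` evaluates to roughly 0.02, which is what a relative gain of about 2% means for two hypothetical BLEU scores. Published results are normally computed at corpus level with an established toolkit, so this sketch only illustrates the arithmetic.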
Abstract
Ultrasound (US) report generation is a challenging task due to the variability of US images, operator dependence, and the need for standardized text. Unlike X-ray and CT, US imaging lacks consistent datasets, making automation difficult. In this study, we propose a unified framework for multi-organ and multilingual US report generation, integrating fragment-based multilingual training and leveraging the standardized nature of US reports. By aligning modular text fragments with diverse imaging data and curating a bilingual English-Chinese dataset, the method achieves consistent and clinically accurate text generation across organ sites and languages. Fine-tuning with selective unfreezing of the vision transformer (ViT) further improves text-image alignment. Compared to the previous state-of-the-art KMVE method, our approach achieves relative gains of about 2% in BLEU scores,…
Taxonomy
Topics: Topic Modeling
Methods: Attention Is All You Need · Layer Normalization · Softmax · Linear Layer · Residual Connection · Multi-Head Attention · Dense Connections · Vision Transformer
