MMSummary: Multimodal Summary Generation for Fetal Ultrasound Video
Xiaoqing Guo, Qianhui Men, and J. Alison Noble

TL;DR
MMSummary is an automated multimodal system that generates comprehensive summaries of fetal ultrasound videos through keyframe detection, captioning, and biometric measurement, reducing scanning time by approximately 31.5% and aiding clinical workflow.
Contribution
The paper introduces the first automated multimodal summary generation system for fetal ultrasound videos, integrating keyframe detection, captioning, and biometric analysis in a three-stage pipeline.
Findings
Reduces scanning time by approximately 31.5%.
Provides comprehensive summaries of fetal ultrasound examinations.
Automates keyframe selection, captioning, and biometric measurement.
Abstract
We present the first automated multimodal summary generation system, MMSummary, for medical imaging video, particularly with a focus on fetal ultrasound analysis. Imitating the examination process performed by a human sonographer, MMSummary is designed as a three-stage pipeline, progressing from keyframe detection to keyframe captioning and finally anatomy segmentation and measurement. In the keyframe detection stage, an innovative automated workflow is proposed to progressively select a concise set of keyframes, preserving sufficient video information without redundancy. Subsequently, we adapt a large language model to generate meaningful captions for fetal ultrasound keyframes in the keyframe captioning stage. If a keyframe is captioned as fetal biometry, the segmentation and measurement stage estimates biometric parameters by segmenting the region of interest according to the textual…
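The control flow of the three-stage pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: every name here (`KeyframeSummary`, `summarize_video`, and the `toy_*` helpers) is hypothetical, the stage functions are trivial stand-ins for the real models, and passing the caption into the measurement stage is an assumption based on the abstract's statement that segmentation is guided by text.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence


@dataclass
class KeyframeSummary:
    """One entry of the multimodal summary (hypothetical structure)."""
    frame_index: int
    caption: str
    measurements: Dict[str, float] = field(default_factory=dict)


def summarize_video(
    frames: Sequence[str],
    detect_keyframes: Callable[[Sequence[str]], List[int]],
    caption_frame: Callable[[str], str],
    is_biometry_caption: Callable[[str], bool],
    measure_biometry: Callable[[str, str], Dict[str, float]],
) -> List[KeyframeSummary]:
    """Run detection, then captioning, then conditional measurement."""
    summary = []
    # Stage 1: select a concise, non-redundant set of keyframes.
    for idx in detect_keyframes(frames):
        # Stage 2: caption each selected keyframe.
        caption = caption_frame(frames[idx])
        item = KeyframeSummary(frame_index=idx, caption=caption)
        # Stage 3: measure only when the caption indicates fetal biometry.
        if is_biometry_caption(caption):
            item.measurements = measure_biometry(frames[idx], caption)
        summary.append(item)
    return summary


# Toy stand-ins so the sketch runs; real stages would be learned models.
def toy_detect(frames: Sequence[str]) -> List[int]:
    # Keep the first frame of each run of identical frames.
    return [i for i, f in enumerate(frames) if i == 0 or f != frames[i - 1]]


def toy_caption(frame: str) -> str:
    # Here a "frame" is already a text label, so the caption is the label.
    return frame


def toy_is_biometry(caption: str) -> bool:
    return "biometry" in caption.lower()


def toy_measure(frame: str, caption: str) -> Dict[str, float]:
    # Dummy value only; not a real measurement.
    return {"placeholder": 0.0}


if __name__ == "__main__":
    video = ["heart", "heart", "head biometry", "head biometry", "spine"]
    for entry in summarize_video(
        video, toy_detect, toy_caption, toy_is_biometry, toy_measure
    ):
        print(entry)
```

Passing the stages in as callables keeps the sketch independent of any particular detector, captioning model, or segmentation network.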
Taxonomy
Topics: Topic Modeling
Methods: Sparse Evolutionary Training · Focus
