MedGen: Unlocking Medical Video Generation by Scaling Granularly-annotated Medical Videos

Rongsheng Wang, Junying Chen, Ke Ji, Zhenyang Cai, Shunian Chen, Yunjin Yang, and Benyou Wang

arXiv:2507.05675 · cs.CV · July 9, 2025

TL;DR

The paper introduces MedVideoCap-55K, a large-scale, caption-rich dataset of medical videos, and MedGen, a video generation model trained on it, with the aim of improving the realism and medical accuracy of generated medical videos.

Contribution

The paper presents MedGen, a medical video generation model trained on MedVideoCap-55K, which the authors describe as the first large-scale, diverse, and caption-rich dataset for medical video generation, comprising over 55,000 curated clips.

Findings

01. MedGen outperforms existing open-source models in visual quality.

02. MedGen rivals commercial systems in medical accuracy.

03. The dataset enables better training of medical video generation models.

Abstract

Recent advances in video generation have shown remarkable progress in open-domain settings, yet medical video generation remains largely underexplored. Medical videos are critical for applications such as clinical training, education, and simulation, requiring not only high visual fidelity but also strict medical accuracy. However, current models often produce unrealistic or erroneous content when applied to medical prompts, largely due to the lack of large-scale, high-quality datasets tailored to the medical domain. To address this gap, we introduce MedVideoCap-55K, the first large-scale, diverse, and caption-rich dataset for medical video generation. It comprises over 55,000 curated clips spanning real-world medical scenarios, providing a strong foundation for training generalist medical video generation models. Built upon this dataset, we develop MedGen, which achieves leading…

Code & Models

Datasets

FreedomIntelligence/MedVideoCap-55K
dataset · 238 downloads

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Human Motion and Animation