SurGen: Text-Guided Diffusion Model for Surgical Video Generation
Joseph Cho, Samuel Schmidgall, Cyril Zakka, Mrudang Mathur, Dhamanpreet Kaur, Rohan Shad, William Hiesinger

TL;DR
SurGen is a novel text-guided diffusion model that generates high-resolution, long-duration surgical videos, enhancing surgical education through realistic and diverse simulation environments.
Contribution
The paper introduces SurGen, the first diffusion-based model specifically designed for surgical video synthesis with improved resolution and duration capabilities.
Findings
SurGen produces the highest-resolution and longest-duration videos among existing surgical video generation models.
Generated videos show high temporal coherence and visual fidelity.
Alignment between the generated videos and their text prompts is assessed with a deep learning classifier trained on surgical data.
Abstract
Diffusion-based video generation models have made significant strides, producing outputs with improved visual fidelity, temporal coherence, and user control. These advancements hold great promise for improving surgical education by enabling more realistic, diverse, and interactive simulation environments. In this study, we introduce SurGen, a text-guided diffusion model tailored for surgical video synthesis. SurGen produces videos with the highest resolution and longest duration among existing surgical video generation models. We validate the visual and temporal quality of the outputs using standard image and video generation metrics. Additionally, we assess their alignment to the corresponding text prompts through a deep learning classifier trained on surgical data. Our results demonstrate the potential of diffusion models to serve as valuable educational tools for surgical trainees.
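The classifier-based alignment check described in the abstract can be illustrated with a minimal sketch. It assumes alignment is scored as the fraction of generated videos whose classifier-predicted label matches the label implied by the text prompt; the abstract does not specify the exact scoring rule, so this is only one plausible reading. The function name `text_alignment_accuracy`, the toy classifier, and the label strings are all hypothetical stand-ins, not the authors' implementation.

```python
from typing import Callable, Sequence


def text_alignment_accuracy(
    videos: Sequence[object],
    prompt_labels: Sequence[str],
    classify: Callable[[object], str],
) -> float:
    """Fraction of videos whose predicted label equals the prompt's label.

    `classify` stands in for a classifier trained on surgical data
    (hypothetical interface: one video in, one label out).
    """
    if len(videos) != len(prompt_labels):
        raise ValueError("videos and prompt_labels must have the same length")
    if not videos:
        raise ValueError("at least one video is required")
    matches = sum(
        1 for video, label in zip(videos, prompt_labels)
        if classify(video) == label
    )
    return matches / len(videos)


# Toy stand-in classifier: reads the label from a string prefix.
# A real classifier would operate on video frames instead.
def toy_classifier(video: str) -> str:
    return video.split(":")[0]


# Hypothetical labels and clips, for illustration only.
videos = ["dissection:clip0", "clipping:clip1", "dissection:clip2", "clipping:clip3"]
prompt_labels = ["dissection", "clipping", "clipping", "clipping"]

accuracy = text_alignment_accuracy(videos, prompt_labels, toy_classifier)
print(f"text alignment accuracy: {accuracy:.2f}")
```

In this toy run three of the four clips are classified as the label their prompt asked for, so the score is 0.75. Passing the classifier in as a callable keeps the scoring rule separate from whatever model is actually used.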
Taxonomy
Topics: Image Retrieval and Classification Techniques · Video Analysis and Summarization · Generative Adversarial Networks and Image Synthesis
Methods: Diffusion
