SurGen: Text-Guided Diffusion Model for Surgical Video Generation

Joseph Cho; Samuel Schmidgall; Cyril Zakka; Mrudang Mathur; Dhamanpreet Kaur; Rohan Shad; William Hiesinger

arXiv:2408.14028 · cs.CV · September 26, 2024 · 3 cites

TL;DR

SurGen is a novel text-guided diffusion model that generates high-resolution, long-duration surgical videos, enhancing surgical education through realistic and diverse simulation environments.

Contribution

The paper introduces SurGen, the first diffusion-based model specifically designed for surgical video synthesis with improved resolution and duration capabilities.

Findings

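01. SurGen produces the highest-resolution and longest-duration videos among existing surgical video generation models.

02. Visual and temporal quality of the generated videos is validated with standard image and video generation metrics.

03. Alignment between generated videos and their text prompts is assessed with a deep learning classifier trained on surgical data.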

Abstract

Diffusion-based video generation models have made significant strides, producing outputs with improved visual fidelity, temporal coherence, and user control. These advancements hold great promise for improving surgical education by enabling more realistic, diverse, and interactive simulation environments. In this study, we introduce SurGen, a text-guided diffusion model tailored for surgical video synthesis. SurGen produces videos with the highest resolution and longest duration among existing surgical video generation models. We validate the visual and temporal quality of the outputs using standard image and video generation metrics. Additionally, we assess their alignment to the corresponding text prompts through a deep learning classifier trained on surgical data. Our results demonstrate the potential of diffusion models to serve as valuable educational tools for surgical trainees.
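The classifier-based alignment check described in the abstract can be illustrated with a minimal sketch. It assumes that each text prompt maps to one categorical label and that a separately trained classifier has already assigned one predicted label to each generated video. The function names and the placeholder labels below are hypothetical and are not taken from the paper; the paper's actual classifier, label set, and scoring protocol are not described in this listing.

```python
from collections import Counter


def alignment_accuracy(prompt_labels, predicted_labels):
    """Fraction of generated videos whose predicted label matches the prompt label.

    Assumption: one label per prompt and one classifier prediction per video.
    """
    if len(prompt_labels) != len(predicted_labels):
        raise ValueError("prompt_labels and predicted_labels must have equal length")
    if not prompt_labels:
        raise ValueError("at least one labeled video is required")
    matches = sum(p == q for p, q in zip(prompt_labels, predicted_labels))
    return matches / len(prompt_labels)


def per_label_accuracy(prompt_labels, predicted_labels):
    """Accuracy broken down by prompt label, to expose weakly aligned prompts."""
    if len(prompt_labels) != len(predicted_labels):
        raise ValueError("prompt_labels and predicted_labels must have equal length")
    totals = Counter(prompt_labels)
    hits = Counter(p for p, q in zip(prompt_labels, predicted_labels) if p == q)
    return {label: hits[label] / totals[label] for label in totals}


if __name__ == "__main__":
    # Placeholder labels for illustration only; not real results.
    prompts = ["label_a", "label_a", "label_b", "label_b"]
    predictions = ["label_a", "label_b", "label_b", "label_b"]
    print(alignment_accuracy(prompts, predictions))
    print(per_label_accuracy(prompts, predictions))
```

A per-label breakdown is included because a single overall score can hide prompts for which the generated videos are consistently misclassified.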

Taxonomy

Topics: Image Retrieval and Classification Techniques · Video Analysis and Summarization · Generative Adversarial Networks and Image Synthesis

Methods: Diffusion