Interactive Generation of Laparoscopic Videos with Diffusion Models

Ivan Iliash (1), Simeon Allmendinger (2), Felix Meissen (1), Niklas Kühl (2), Daniel Rückert (1) ((1) Technical University of Munich, (2) University of Bayreuth)

arXiv:2406.06537 · eess.IV · June 12, 2024

TL;DR

This paper uses diffusion models to interactively generate realistic laparoscopic images and videos, guided by a text description of the surgical action and by segmentation masks of the tool positions, with the aim of bringing photorealistic synthetic data to surgical training.

Contribution

The paper combines diffusion models with a zero-shot video diffusion method for surgical video generation. Text specifies the surgical action and segmentation masks give spatial guidance for the tools, which improves realism and control in the synthetic laparoscopic videos.

Findings

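01. The generated images reach an FID of 38.097 (lower is better), which the authors take as evidence of high visual fidelity.

02. An F1-score of 0.71 indicates effective spatial control of the tools.

03. The approach is validated on the publicly available Cholec dataset family, using a surgical action recognition model for evaluation.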
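This page does not say how the reported F1-score was computed. As a purely illustrative sketch, assuming a pixel-wise comparison between a predicted binary tool mask and a reference mask, F1 can be derived from true positives, false positives, and false negatives. The function names `f1_score` and `mask_f1` are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when the ratio is undefined."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mask_f1(pred: Sequence[Sequence[int]], target: Sequence[Sequence[int]]) -> float:
    """Pixel-wise F1 between two binary masks of equal shape (assumed setup)."""
    tp = fp = fn = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                tp += 1  # tool pixel predicted and present
            elif p and not t:
                fp += 1  # tool pixel predicted but absent
            elif t and not p:
                fn += 1  # tool pixel present but missed
    return f1_score(tp, fp, fn)


if __name__ == "__main__":
    predicted = [[1, 1], [0, 0]]
    reference = [[1, 0], [1, 0]]
    print(mask_f1(predicted, reference))
```

Because F1 is the harmonic mean of precision and recall, a score is only high when the predicted tool region is both mostly correct and mostly complete.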

Abstract

Generative AI in general, and synthetic visual data generation in particular, hold much promise for benefiting surgical training by providing photorealism to simulation environments. Current training methods primarily rely on reading materials and observing live surgeries, which can be time-consuming and impractical. In this work, we take a significant step towards improving the training process. Specifically, we use diffusion models in combination with a zero-shot video diffusion method to interactively generate realistic laparoscopic images and videos by specifying a surgical action through text and guiding the generation with tool positions through segmentation masks. We demonstrate the performance of our approach using the publicly available Cholec dataset family and evaluate the fidelity and factual correctness of our generated images using a surgical action recognition model as…

Taxonomy

Topics: Human Motion and Animation · 3D Modeling in Geospatial Applications

Methods: Diffusion