Endora: Video Generation Models as Endoscopy Simulators
Chenxin Li, Hengyu Liu, Yifan Liu, Brandon Y. Feng, Wuyang Li, Xinyu Liu, Zhen Chen, Jing Shao, Yixuan Yuan

TL;DR
Endora introduces a novel generative model for creating realistic clinical endoscopy videos, combining a spatial-temporal transformer with vision priors, and establishes a new benchmark for endoscopy simulation.
Contribution
The paper presents the first endoscopy video generation model integrating a spatial-temporal transformer and vision priors, along with a public benchmark for evaluation.
Findings
Outperforms existing methods in visual quality
Enables downstream video analysis tasks
Supports 3D scene generation with multi-view consistency
Abstract
Generative models hold promise for revolutionizing medical education, robot-assisted surgery, and data augmentation for machine learning. Despite progress in generating 2D medical images, the complex domain of clinical video generation has remained largely untapped. This paper introduces Endora, an innovative approach to generating medical videos that simulate clinical endoscopy scenes. We present a novel generative model design that integrates a meticulously crafted spatial-temporal video transformer with advanced 2D vision foundation model priors, explicitly modeling spatial-temporal dynamics during video generation. We also pioneer the first public benchmark for endoscopy simulation with video generation models, adapting existing state-of-the-art methods for this endeavor. Endora demonstrates exceptional visual quality in generating endoscopy videos, surpassing state-of-the-art methods…
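The abstract does not detail how the transformer models spatial-temporal dynamics. One common way to do this in video transformers is to factorize attention: patches first attend within their own frame (spatial), then each patch position attends across frames (temporal). The sketch below illustrates that factorization in plain Python. It is an assumption for illustration, not Endora's implementation. The identity Q/K/V projections, the single head, the residual connections, and the names `attend` and `spatial_temporal_block` are hypothetical simplifications. The 2D vision foundation model priors are not modeled.

```python
import math
from typing import List

Vector = List[float]
Frame = List[Vector]   # one frame: N patch embeddings
Video = List[Frame]    # T frames


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(tokens: List[Vector]) -> List[Vector]:
    """Single-head scaled dot-product self-attention.

    Hypothetical simplification: Q, K and V are the tokens themselves
    (identity projections) instead of learned linear maps.
    """
    dim = len(tokens[0])
    scale = 1.0 / math.sqrt(dim)
    out = []
    for q in tokens:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(dim)])
    return out


def add(xs: List[Vector], ys: List[Vector]) -> List[Vector]:
    # Element-wise residual connection.
    return [[a + b for a, b in zip(x, y)] for x, y in zip(xs, ys)]


def spatial_temporal_block(video: Video) -> Video:
    """One factorized block (assumed design): spatial attention, then temporal attention."""
    # Spatial step: patches attend only to other patches in the same frame.
    spatial = [add(frame, attend(frame)) for frame in video]

    # Temporal step: each patch position attends to itself across all frames.
    num_frames, num_patches = len(spatial), len(spatial[0])
    out: Video = [[[] for _ in range(num_patches)] for _ in range(num_frames)]
    for n in range(num_patches):
        track = [spatial[t][n] for t in range(num_frames)]
        mixed = add(track, attend(track))
        for t in range(num_frames):
            out[t][n] = mixed[t]
    return out


if __name__ == "__main__":
    # Toy input: T=2 frames, N=3 patches, D=2 features per patch.
    video = [
        [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]],
        [[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]],
    ]
    result = spatial_temporal_block(video)
    print(len(result), len(result[0]), len(result[0][0]))  # 2 3 2
```

Factorizing attention this way keeps the cost of each step proportional to N² per frame plus T² per patch position. Full joint attention over all T·N tokens would instead cost (T·N)². That cost gap is the usual motivation for this design in video transformers.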
Taxonomy
Topics: Augmented Reality Applications · Surgical Simulation and Training
