EndoGen: Conditional Autoregressive Endoscopic Video Generation
Xinyu Liu, Hengyu Liu, Cheng Wang, Tianming Liu, Yixuan Yuan

TL;DR
EndoGen introduces a novel conditional autoregressive framework for endoscopic video generation that uses grid-based spatiotemporal patterning and semantic-aware token masking to produce high-quality, diverse videos that aid medical diagnosis.
Contribution
This paper presents the first conditional endoscopic video generation model. It is built on a new Spatiotemporal Grid-Frame Patterning (SGP) strategy and a Semantic-Aware Token Masking (SAT) mechanism, advancing dynamic medical imaging synthesis.
Findings
Generated videos are high-quality and conditionally guided.
Improves downstream polyp segmentation performance.
Effective in modeling complex spatiotemporal dependencies.
Abstract
Endoscopic video generation is crucial for advancing medical imaging and enhancing diagnostic capabilities. However, prior efforts in this field have either focused on static images, which lack the dynamic context required for practical applications, or relied on unconditional generation, which fails to provide meaningful references for clinicians. In this paper, we therefore propose the first conditional endoscopic video generation framework, namely EndoGen. Specifically, we build an autoregressive model with a tailored Spatiotemporal Grid-Frame Patterning (SGP) strategy. SGP reformulates multi-frame generation as grid-based image generation, which effectively capitalizes on the inherent global dependency modeling capabilities of autoregressive architectures. Furthermore, we propose a Semantic-Aware Token Masking (SAT) mechanism, which enhances the model's…
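The grid reformulation behind SGP can be illustrated with a minimal sketch. The sketch makes three assumptions, none of which come from the paper:

- Each frame has already been tokenized into an h × w grid of discrete token ids, for example by a VQ-style tokenizer.
- T = rows × cols frames are tiled row-major into one large "image".
- The autoregressive model reads that image in raster order.

The function names `frames_to_grid`, `grid_to_frames`, and `raster_sequence` are hypothetical. This is not the authors' implementation.

```python
from typing import List

# Hypothetical representation: one frame is an h x w grid of token ids.
Frame = List[List[int]]


def frames_to_grid(frames: List[Frame], rows: int, cols: int) -> Frame:
    """Tile rows*cols frames (row-major) into one (rows*h) x (cols*w) token grid."""
    if len(frames) != rows * cols:
        raise ValueError("expected exactly rows * cols frames")
    h, w = len(frames[0]), len(frames[0][0])
    grid: Frame = []
    for r in range(rows):            # each row of frames in the grid
        for y in range(h):           # each token row inside those frames
            line: List[int] = []
            for c in range(cols):    # concatenate the same token row across frames
                line.extend(frames[r * cols + c][y])
            grid.append(line)
    return grid


def grid_to_frames(grid: Frame, rows: int, cols: int) -> List[Frame]:
    """Invert frames_to_grid: cut the token grid back into rows*cols frames."""
    h, w = len(grid) // rows, len(grid[0]) // cols
    return [
        [grid[r * h + y][c * w:(c + 1) * w] for y in range(h)]
        for r in range(rows)
        for c in range(cols)
    ]


def raster_sequence(grid: Frame) -> List[int]:
    """Flatten the grid in raster order: the token sequence an AR model would predict."""
    return [tok for row in grid for tok in row]


if __name__ == "__main__":
    # Four toy 2x2 frames; frame i holds tokens 10*i + {0, 1, 2, 3}.
    frames = [[[10 * i, 10 * i + 1], [10 * i + 2, 10 * i + 3]] for i in range(4)]
    grid = frames_to_grid(frames, rows=2, cols=2)
    for row in grid:
        print(row)
    print(raster_sequence(grid))
```

Under these assumptions, every token in the flattened sequence can condition on all earlier tokens, including tokens from other frames. In this sense, a standard image-style autoregressive model can capture cross-frame (temporal) dependencies without a separate temporal module. The toy layout, and whether the real model uses raster order at all, are illustrative choices only.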
Taxonomy
Topics: Gastrointestinal Bleeding Diagnosis and Treatment
