Ophora: A Large-Scale Data-Driven Text-Guided Ophthalmic Surgical Video Generation Model
Wei Li, Ming Hu, Guoan Wang, Lihao Liu, Kaijing Zhou, Junzhi Ning, Xin Guo, Zongyuan Ge, Lixu Gu, Junjun He

TL;DR
Ophora is a large-scale, data-driven model capable of generating realistic ophthalmic surgical videos from natural language instructions, addressing data scarcity and privacy issues in medical video analysis.
Contribution
The paper introduces Ophora, a novel model for text-guided ophthalmic surgical video generation. It is supported by Ophora-160K, a dataset of over 160K video-instruction pairs, and by a Progressive Video-Instruction Tuning scheme.
Findings
Ophora generates realistic surgical videos based on surgeon instructions.
The model's outputs are validated both by ophthalmologists and by quantitative metrics.
Ophora enhances downstream ophthalmic workflow understanding.
Abstract
In ophthalmic surgery, developing an AI system that can interpret surgical videos and predict subsequent operations requires many ophthalmic surgical videos with high-quality annotations, which are difficult to collect because of privacy concerns and the labor annotation demands. Text-guided video generation (T2V) has emerged as a promising way to overcome this issue by generating ophthalmic surgical videos from surgeon instructions. In this paper, we present Ophora, a pioneering model that can generate ophthalmic surgical videos following natural language instructions. To construct Ophora, we first propose a Comprehensive Data Curation pipeline that converts narrative ophthalmic surgical videos into a large-scale, high-quality dataset of over 160K video-instruction pairs, Ophora-160K. Then, we propose a Progressive Video-Instruction Tuning scheme to transfer rich…
Taxonomy
Topics: Multimodal Machine Learning Applications · Machine Learning in Healthcare · Surgical Simulation and Training
