FFA Sora, video generation as fundus fluorescein angiography simulator
Xinyuan Wu, Lili Wang, Ruoyu Chen, Bowen Liu, Weiyi Zhang, Xi Yang, Yifan Feng, Mingguang He, Danli Shi

TL;DR
FFA Sora is a novel text-to-video model that generates dynamic fundus fluorescein angiography videos from reports, aiding diagnosis and education while preserving patient privacy.
Contribution
The paper introduces FFA Sora, a new model that combines a Wavelet-Flow VAE with a diffusion transformer to generate realistic FFA videos from text reports, addressing privacy and educational needs.
Findings
Achieved objective metrics: FVD=329.78, LPIPS=0.48, VQAScore=0.61.
Generated videos showed acceptable alignment with textual prompts.
Demonstrated strong privacy-preserving retrieval performance.
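For readers unfamiliar with FVD, the standard definition is sketched below. FVD is the Fréchet distance between two Gaussians fitted to feature embeddings of real videos (mean $\mu_r$, covariance $\Sigma_r$) and generated videos ($\mu_g$, $\Sigma_g$). Lower values indicate that the generated distribution is closer to the real one. This summary does not state which feature extractor the authors used; a pretrained I3D network is the common choice, but treat that as an assumption here.

```latex
\mathrm{FVD} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\left( \Sigma_r \Sigma_g \right)^{1/2} \right)
```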
Abstract
Fundus fluorescein angiography (FFA) is critical for diagnosing retinal vascular diseases, but beginners often struggle with image interpretation. This study develops FFA Sora, a text-to-video model that converts FFA reports into dynamic videos via a Wavelet-Flow Variational Autoencoder (WF-VAE) and a diffusion transformer (DiT). Trained on an anonymized dataset, FFA Sora accurately simulates disease features from the input text, as confirmed by objective metrics: Fréchet Video Distance (FVD) = 329.78, Learned Perceptual Image Patch Similarity (LPIPS) = 0.48, and Visual Question Answering Score (VQAScore) = 0.61. Specific evaluations showed acceptable alignment between the generated videos and textual prompts, with a BERTScore of 0.35. Additionally, the model demonstrated strong privacy-preserving performance in retrieval evaluations, achieving an average Recall@K of 0.073. Human…
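The privacy result rests on a retrieval test. Each generated video is used as a query against the real videos, and we check whether the real video it was derived from appears among the top K matches. A low Recall@K, such as the reported 0.073, means generated videos rarely lead back to a specific patient's recording. The sketch below shows a minimal version of that computation. The abstract does not specify the embedding model, the similarity measure, or which K values were averaged. The precomputed similarity matrix, the function names, and averaging over a set of K cut-offs are therefore all hypothetical.

```python
from typing import Sequence


def recall_at_k(similarity: Sequence[Sequence[float]], targets: Sequence[int], k: int) -> float:
    """Fraction of queries whose true source appears among the k most similar candidates.

    similarity[i][j]: score between query i (e.g. a generated video embedding) and
    candidate j (e.g. a real video embedding); higher means more similar.
    targets[i]: index of the real candidate that query i was generated from.
    """
    if len(similarity) != len(targets):
        raise ValueError("need exactly one target index per query")
    hits = 0
    for scores, target in zip(similarity, targets):
        # Rank candidate indices by descending similarity and keep the first k.
        top_k = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]
        if target in top_k:
            hits += 1
    return hits / len(targets)


def mean_recall(similarity, targets, ks=(1, 5, 10)) -> float:
    """Average Recall@K over several cut-offs (hypothetical choice of ks)."""
    return sum(recall_at_k(similarity, targets, k) for k in ks) / len(ks)


# Toy example: 3 generated queries against 3 real candidates.
sim = [[0.9, 0.1, 0.3],
       [0.2, 0.4, 0.8],
       [0.5, 0.6, 0.1]]
targets = [0, 0, 2]
print(recall_at_k(sim, targets, 1))           # only query 0 ranks its source first
print(mean_recall(sim, targets, ks=(1, 2, 3)))
```

In this toy setup only the first query retrieves its source at rank 1, so Recall@1 is 1/3. From a privacy standpoint, values near chance level (K divided by the number of candidates) are the desirable outcome.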
Taxonomy
Topics: Retinal Imaging and Analysis
Methods: Diffusion
