FFA Sora, video generation as fundus fluorescein angiography simulator

Xinyuan Wu; Lili Wang; Ruoyu Chen; Bowen Liu; Weiyi Zhang; Xi Yang; Yifan Feng; Mingguang He; Danli Shi

arXiv:2412.17346 · cs.CV · December 24, 2024

Open Access

TL;DR

FFA Sora is a novel text-to-video model that generates dynamic fundus fluorescein angiography (FFA) videos from FFA reports, with the aim of aiding diagnosis and education while preserving patient privacy.

Contribution

The paper introduces FFA Sora, a model that combines a Wavelet-Flow Variational Autoencoder (WF-VAE) with a diffusion transformer (DiT) to generate realistic FFA videos from text reports, addressing both patient-privacy concerns and educational needs.
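The "wavelet" in WF-VAE refers to decomposing the input into wavelet sub-bands before encoding. This summary does not detail the WF-VAE internals, so the block below is only an illustration of the underlying idea: a single-level, orthonormal 2D Haar transform applied to one frame, splitting it into a low-frequency band (LL) and three detail bands. As far as we understand, WF-VAE works on whole videos (including the time axis) and uses several decomposition levels; the function names `haar2d` and `inverse_haar2d` are hypothetical and not taken from the FFA Sora or WF-VAE code.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def haar2d(frame: Matrix) -> Tuple[Matrix, Matrix, Matrix, Matrix]:
    """One level of an orthonormal 2D Haar transform on a single frame.

    Returns (LL, HL, LH, HH), each half the height and width of the input.
    Hypothetical helper for illustration; requires even height and width.
    """
    h, w = len(frame), len(frame[0])
    if h % 2 or w % 2:
        raise ValueError("frame height and width must be even")
    ll: Matrix = []
    hl: Matrix = []
    lh: Matrix = []
    hh: Matrix = []
    for i in range(0, h, 2):
        row_ll, row_hl, row_lh, row_hh = [], [], [], []
        for j in range(0, w, 2):
            # The 2x2 block:  a b
            #                 c d
            a, b = frame[i][j], frame[i][j + 1]
            c, d = frame[i + 1][j], frame[i + 1][j + 1]
            row_ll.append((a + b + c + d) / 2)  # low-pass (scaled local average)
            row_hl.append((a - b + c - d) / 2)  # difference across columns
            row_lh.append((a + b - c - d) / 2)  # difference across rows
            row_hh.append((a - b - c + d) / 2)  # diagonal difference
        ll.append(row_ll)
        hl.append(row_hl)
        lh.append(row_lh)
        hh.append(row_hh)
    return ll, hl, lh, hh


def inverse_haar2d(ll: Matrix, hl: Matrix, lh: Matrix, hh: Matrix) -> Matrix:
    """Exactly reconstructs the frame from the four sub-bands."""
    h, w = len(ll), len(ll[0])
    out = [[0.0] * (2 * w) for _ in range(2 * h)]
    for i in range(h):
        for j in range(w):
            s, x, y, z = ll[i][j], hl[i][j], lh[i][j], hh[i][j]
            out[2 * i][2 * j] = (s + x + y + z) / 2
            out[2 * i][2 * j + 1] = (s - x + y - z) / 2
            out[2 * i + 1][2 * j] = (s + x - y - z) / 2
            out[2 * i + 1][2 * j + 1] = (s - x - y + z) / 2
    return out


bands = haar2d([[1.0, 2.0], [3.0, 4.0]])
print(bands)  # ([[5.0]], [[-1.0]], [[-2.0]], [[0.0]])
```

Because the transform is orthonormal and invertible, no information is lost; most of a smooth frame's energy ends up in the compact LL band, which is the property wavelet-based video encoders exploit.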

Findings

01. Achieved objective metrics of FVD = 329.78, LPIPS = 0.48, and VQAScore = 0.61.

02. Generated videos showed acceptable alignment with the textual prompts (BERTScore = 0.35).

03. Demonstrated strong privacy-preserving performance in retrieval evaluations (average Recall@K = 0.073).
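For reference, FVD (the first metric in the findings) is commonly computed as the Fréchet distance between two Gaussians fitted to feature embeddings of real and generated videos. In the original FVD proposal the features come from a pretrained I3D video classifier; this summary does not state which extractor FFA Sora used, so treat the symbols below as the generic definition rather than the paper's exact setup. Here \(\mu_r, \Sigma_r\) are the feature mean and covariance of the real videos and \(\mu_g, \Sigma_g\) those of the generated videos:

```latex
\mathrm{FVD} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
```

Lower is better: the value is 0 only when both sets of features share the same mean and covariance, so FVD = 329.78 is meaningful mainly in comparison with other models evaluated on the same data and feature extractor.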

Abstract

Fundus fluorescein angiography (FFA) is critical for diagnosing retinal vascular diseases, but beginners often struggle to interpret the images. This study develops FFA Sora, a text-to-video model that converts FFA reports into dynamic videos using a Wavelet-Flow Variational Autoencoder (WF-VAE) and a diffusion transformer (DiT). Trained on an anonymized dataset, FFA Sora accurately simulates disease features described in the input text, as confirmed by objective metrics: Fréchet Video Distance (FVD) = 329.78, Learned Perceptual Image Patch Similarity (LPIPS) = 0.48, and Visual Question Answering Score (VQAScore) = 0.61. Further evaluations showed acceptable alignment between the generated videos and the textual prompts, with a BERTScore of 0.35. Additionally, the model demonstrated strong privacy-preserving performance in retrieval evaluations, achieving an average Recall@K of 0.073. Human…
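The abstract reports an average Recall@K of 0.073 as evidence of privacy preservation but does not describe the retrieval protocol. One common way to set up such a check, and the assumption behind the sketch below, is: embed every generated video and every real training video, rank the real videos by cosine similarity to each generated one, and count a "hit" when the real video whose report served as the prompt appears in the top K. A low Recall@K then suggests the model is not simply reproducing identifiable training videos. The embeddings, the `source_of` mapping, the helper names (`cosine`, `recall_at_k`, `mean_recall`), and the default K values are all hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def recall_at_k(generated: Dict[str, List[float]],
                real: Dict[str, List[float]],
                source_of: Dict[str, str],
                k: int) -> float:
    """Fraction of generated videos whose own source video is among the k
    most similar real videos. Lower values suggest less memorization."""
    hits = 0
    for gid, g_emb in generated.items():
        # Rank all real videos by similarity to this generated video.
        ranked = sorted(real, key=lambda rid: cosine(g_emb, real[rid]), reverse=True)
        if source_of[gid] in ranked[:k]:
            hits += 1
    return hits / len(generated)


def mean_recall(generated: Dict[str, List[float]],
                real: Dict[str, List[float]],
                source_of: Dict[str, str],
                ks: Sequence[int] = (1, 5, 10)) -> float:
    """Averages Recall@K over several K values (assumed aggregation)."""
    return sum(recall_at_k(generated, real, source_of, k) for k in ks) / len(ks)
```

Averaging over several K values is one plausible reading of "average Recall@K"; averaging over folds or prompt subsets would be equally consistent with the summary.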


Taxonomy

Topics: Retinal Imaging and Analysis

Methods: Diffusion