A Pragmatic Note on Evaluating Generative Models with Fréchet Inception Distance for Retinal Image Synthesis

Yuli Wu; Fucheng Liu; Rüveyda Yilmaz; Henning Konermann; Peter Walter; Johannes Stegmaier

arXiv:2502.17160 · cs.CV · February 23, 2026

TL;DR

This paper critically examines the use of Fréchet Inception Distance (FID), a popular evaluation metric for generative models, in retinal imaging, highlighting its limitations and advocating for task-specific assessments in biomedical image synthesis.

Contribution

The paper reveals the limitations of FID in biomedical retinal image synthesis and emphasizes the importance of downstream task evaluation for more meaningful assessment.

Findings

1. FID can misalign with task-specific performance in retinal image synthesis.

2. Metrics like FID may not reliably reflect how useful synthetic biomedical images are for downstream tasks.

3. Task-based evaluation provides more relevant insights than FID alone.

Abstract

Fréchet Inception Distance (FID), computed with an ImageNet-pretrained Inception-v3 network, is widely used as a state-of-the-art evaluation metric for generative models. It assumes that feature vectors from Inception-v3 follow a multivariate Gaussian distribution and calculates the 2-Wasserstein distance based on their means and covariances. While FID effectively measures how closely synthetic data match real data in many image synthesis tasks, the primary goal of biomedical generative models is often to enrich training datasets, ideally with corresponding annotations. For this purpose, the gold standard for evaluating generative models is to incorporate synthetic data into downstream task training, such as classification and segmentation, and pragmatically assess the resulting performance. In this paper, we examine cases from retinal imaging modalities, including color fundus photography and…
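For two Gaussians N(mu_r, Sigma_r) and N(mu_g, Sigma_g) fitted to real and synthetic features, the squared 2-Wasserstein distance described above has the closed form ||mu_r - mu_g||^2 + Tr(Sigma_r + Sigma_g - 2 (Sigma_r Sigma_g)^(1/2)), and this value is what FID reports. The following is a minimal NumPy sketch of that formula, assuming the feature vectors (e.g. Inception-v3 pooling outputs) have already been extracted; the feature extraction step is omitted, and the function names are hypothetical, not taken from the paper or from any FID library. Many implementations compute the matrix square root with scipy.linalg.sqrtm; this sketch instead uses a symmetric eigendecomposition so that it depends only on NumPy.

```python
import numpy as np


def _sqrtm_psd(mat: np.ndarray) -> np.ndarray:
    """Square root of a symmetric positive semi-definite matrix via eigendecomposition."""
    eigvals, eigvecs = np.linalg.eigh(mat)
    eigvals = np.clip(eigvals, 0.0, None)  # drop tiny negative values caused by round-off
    # V diag(sqrt(lambda)) V^T
    return (eigvecs * np.sqrt(eigvals)) @ eigvecs.T


def frechet_distance(feats_real: np.ndarray, feats_fake: np.ndarray) -> float:
    """Squared 2-Wasserstein (Frechet) distance between Gaussians fitted to two feature sets.

    Hypothetical helper: feats_* are arrays of shape (n_samples, n_features),
    assumed to be features already extracted by a network such as Inception-v3.
    """
    mu_r, mu_g = feats_real.mean(axis=0), feats_fake.mean(axis=0)
    # atleast_2d guards against np.cov squeezing a single feature to a scalar
    sigma_r = np.atleast_2d(np.cov(feats_real, rowvar=False))
    sigma_g = np.atleast_2d(np.cov(feats_fake, rowvar=False))

    # Tr((Sigma_r Sigma_g)^(1/2)) equals Tr((S Sigma_g S)^(1/2)) with S = Sigma_r^(1/2),
    # because both products share the same eigenvalues; S Sigma_g S is symmetric PSD,
    # so eigh can be used safely.
    s = _sqrtm_psd(sigma_r)
    cross = _sqrtm_psd(s @ sigma_g @ s)

    mean_term = float(np.sum((mu_r - mu_g) ** 2))
    cov_term = float(np.trace(sigma_r) + np.trace(sigma_g) - 2.0 * np.trace(cross))
    return mean_term + cov_term
```

As a sanity check, shifting every feature vector by a constant leaves both covariances unchanged, so the distance reduces to the squared norm of the shift. Note that this number only compares feature statistics; as the abstract argues, it does not by itself show whether the synthetic images help a downstream classifier or segmentation model.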

Taxonomy

Topics: Retinal Imaging and Analysis · Visual perception and processing mechanisms · Visual Attention and Saliency Detection

Methods: Dense Connections · Average Pooling · Label Smoothing · Max Pooling · Auxiliary Classifier · Softmax · Dropout · 1x1 Convolution · Convolution · Inception-v3 Module