Establishing an Evaluation Metric to Quantify Climate Change Image Realism
Sharon Zhou, Alexandra Luccioni, Gautier Cosne, Michael S. Bernstein, Yoshua Bengio

TL;DR
This paper proposes and assesses automated and human evaluation methods for climate change-related images generated by conditional models, finding that FID with Inception-V3 embeddings correlates well with human judgment.
Contribution
It introduces adapted evaluation metrics for climate change image realism and compares their effectiveness against human assessments.
Findings
FID with Inception-V3 embeddings correlates best with human judgment
Automated metrics can partially bridge the gap with human evaluation
The study advances evaluation methods for climate change visualizations
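One common way to check how well an automated metric tracks human judgment is a rank correlation between per-model metric scores and per-model human realism scores. The sketch below uses Spearman's rank correlation as an illustration; the summary above does not say which statistic the authors used, and the function names and the toy numbers in the usage note are hypothetical, not values from the paper.

```python
def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))
```

For example, with made-up scores for four models, spearman([12.0, 30.5, 18.2, 45.1], [0.9, 0.4, 0.7, 0.2]) returns -1.0. Because a lower FID means generated images are closer to the reference set, a strongly negative correlation with human realism scores is the desirable outcome for an FID-style metric.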
Abstract
With success on controlled tasks, generative models are being increasingly applied to humanitarian applications [1,2]. In this paper, we focus on the evaluation of a conditional generative model that illustrates the consequences of climate change-induced flooding to encourage public interest and awareness on the issue. Because metrics for comparing the realism of different modes in a conditional generative model do not exist, we propose several automated and human-based methods for evaluation. To do this, we adapt several existing metrics, and assess the automated metrics against gold standard human evaluation. We find that using Fréchet Inception Distance (FID) with embeddings from an intermediary Inception-V3 layer that precedes the auxiliary classifier produces results most correlated with human realism. While insufficient alone to establish a human-correlated automatic evaluation…
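As a rough illustration of the metric named in the abstract, FID fits a Gaussian to each set of image embeddings and measures the Fréchet distance between the two Gaussians: the squared distance between the means plus Tr(S_a + S_b - 2 (S_a S_b)^(1/2)), where S_a and S_b are the covariances. The sketch below assumes the embeddings have already been extracted (for instance from the intermediate Inception-V3 layer the abstract mentions) and are passed in as arrays of shape (n_samples, dim). The function name and the NumPy-only approach are choices made for this example, not the authors' implementation.

```python
import numpy as np


def frechet_distance(feats_a, feats_b):
    """Fréchet distance between Gaussians fitted to two embedding sets.

    feats_a, feats_b: array-likes of shape (n_samples, dim), with
    n_samples >= 2 so that a covariance can be estimated.
    """
    feats_a = np.asarray(feats_a, dtype=np.float64)
    feats_b = np.asarray(feats_b, dtype=np.float64)

    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    # atleast_2d keeps the code valid when dim == 1.
    sigma_a = np.atleast_2d(np.cov(feats_a, rowvar=False))
    sigma_b = np.atleast_2d(np.cov(feats_b, rowvar=False))

    # Tr((sigma_a sigma_b)^(1/2)) equals the sum of the square roots of
    # the eigenvalues of sigma_a @ sigma_b. Those eigenvalues are real
    # and non-negative in exact arithmetic, so tiny negative or
    # imaginary parts from round-off are discarded.
    eigvals = np.linalg.eigvals(sigma_a @ sigma_b)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    diff = mu_a - mu_b
    return float(
        diff @ diff + np.trace(sigma_a) + np.trace(sigma_b) - 2.0 * trace_sqrt
    )
```

Two identical embedding sets give a distance of (numerically) zero, and shifting every embedding by a constant vector leaves the covariances unchanged, so the distance reduces to the squared length of the shift. Estimates are noisy when the number of samples is small relative to the embedding dimension.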
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Flood Risk Assessment and Management · Anomaly Detection Techniques and Applications
Methods: Average Pooling · 1x1 Convolution · RMSProp · Inception-v3 Module · Max Pooling · Softmax · Convolution · Dropout · Dense Connections · Label Smoothing
