Benchmarking Counterfactual Image Generation
Thomas Melistas, Nikos Spyrou, Nefeli Gkouti, Pedro Sanchez, Athanasios Vlontzos, Yannis Panagakis, Giorgos Papanastasiou, Sotirios A. Tsaftaris

TL;DR
This paper introduces a comprehensive benchmarking framework for counterfactual image generation, comparing various models across datasets and causal graphs, and providing a user-friendly Python package for future research.
Contribution
It presents a unified benchmarking framework and extends existing models to new datasets and causal graphs, highlighting Hierarchical VAEs as particularly effective.
Findings
Hierarchical VAEs outperform other models on most datasets and metrics.
The framework enables fair comparison of counterfactual image generation methods.
The Python package facilitates community extension and evaluation.
Abstract
Generative AI has revolutionised visual content editing, empowering users to effortlessly modify images and videos. However, not all edits are equal. To perform realistic edits in domains such as natural images or medical imaging, modifications must respect the causal relationships inherent to the data generation process. Such image editing falls into the counterfactual image generation regime. Evaluating counterfactual image generation is substantially complex: not only does it lack observable ground truths, but it also requires adherence to causal constraints. Although several counterfactual image generation methods and evaluation metrics exist, a comprehensive comparison within a unified setting is lacking. We present a comparison framework to thoroughly benchmark counterfactual image generation methods. We integrate all models that have been used for the task at hand and expand them to novel…
Taxonomy
Topics: Computer Graphics and Visualization Techniques
