Aligned Datasets Improve Detection of Latent Diffusion-Generated Images
Anirudh Sundara Rajan, Utkarsh Ojha, Jedidiah Schloesser, Yong Jae Lee

TL;DR
This paper introduces a simple method to improve fake image detection by creating aligned real/fake datasets through LDM autoencoder reconstructions, leading to more robust detectors that focus on model artifacts.
Contribution
It proposes a novel dataset alignment technique using LDM autoencoder reconstructions, enhancing detector robustness against spurious correlations.
Findings
Aligned datasets improve detection accuracy.
Reconstruction-based fake images focus detectors on model artifacts.
Method reduces reliance on computationally expensive denoising.
Abstract
As latent diffusion models (LDMs) democratize image generation capabilities, there is a growing need to detect fake images. A good detector should focus on the generative model's fingerprints while ignoring image properties such as semantic content, resolution, file format, etc. Fake image detectors are usually built in a data-driven way, where a model is trained to separate real from fake images. Existing works primarily investigate network architecture choices and training recipes. In this work, we argue that in addition to these algorithmic choices, we also require a well-aligned dataset of real/fake images to train a robust detector. For the family of LDMs, we propose a very simple way to achieve this: we reconstruct all the real images using the LDM's autoencoder, without any denoising operation. We then train a model to separate these real images from their reconstructions. The…
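The pairing step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' code: `toy_reconstruct` is a hypothetical stand-in for an LDM autoencoder round trip (encode, then decode, with no denoising), and images are plain nested lists so the example runs without any dependencies. In practice the reconstruction function would wrap a real LDM autoencoder.

```python
from typing import Callable, List, Tuple

# Toy grayscale image: rows of pixel values in [0, 1] (assumption for illustration).
Image = List[List[float]]


def toy_reconstruct(image: Image, levels: int = 8) -> Image:
    """Hypothetical stand-in for an LDM autoencoder round trip.

    Quantizes pixel values so the output keeps the same content and size
    as the input but carries a small systematic artifact.
    """
    scale = levels - 1
    return [[round(p * scale) / scale for p in row] for row in image]


def build_aligned_dataset(
    real_images: List[Image],
    reconstruct: Callable[[Image], Image],
) -> List[Tuple[Image, int]]:
    """Pair every real image (label 0) with its reconstruction (label 1).

    Both members of a pair share semantic content and resolution, so a
    classifier trained on the pairs can only separate them by the
    artifacts that the reconstruction step introduces.
    """
    dataset: List[Tuple[Image, int]] = []
    for image in real_images:
        dataset.append((image, 0))               # real
        dataset.append((reconstruct(image), 1))  # reconstructed ("fake")
    return dataset


if __name__ == "__main__":
    reals = [[[0.0, 0.3], [0.6, 1.0]], [[0.1, 0.2], [0.8, 0.9]]]
    for image, label in build_aligned_dataset(reals, toy_reconstruct):
        print(label, image)
```

Because each fake is derived from a specific real image, the two classes match in content, size, and count by construction, which is the alignment property the abstract argues for.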
Peer Reviews
Decision: ICLR 2025 Poster
1. The paper proposes a simple and efficient way to collect data for fake image detection. 2. The paper provides an analysis of common pitfalls of prior works, such as image resolution or compression artifacts.
1. It is unclear whether the proposed method would generalize well to architectures other than latent diffusion models. For example, would it generalize well to models such as VQ-VAE [1], VQ-GAN [2], or more recent models with very different autoencoder architectures? The paper mentions that it does not work well on models with vastly different architectures (e.g., FLUX.1-dev), but it is not clear how much architectural change the proposed method is robust to. 2. The proposed me
1. The paper is well-written and has a very clear research motivation. 2. Dataset alignment is a critical problem in fake image detection, since misalignment might cause the model to learn spurious correlations. 3. The authors provide reasonable evidence to validate their findings and observations. The comparison of downsampling before and after resizing (Table 5) is especially insightful to me.
- A major limitation of this work is the generality of images generated by Latent Diffusion Models (LDM). Does the detector primarily learn LDM-specific fake patterns, rather than a broader range of patterns such as those generated by GANs? Or does it learn only method-specific fake patterns unique to a particular LDM instance, failing to generalize across the LDM family? A more robust evaluation of this generality would be beneficial. - This paper defines "fake image detection" as "entire imag
1. The paper proposes an interesting framework for generating fake images by using only the LDM's autoencoder, rather than the full generative process, to train the detector. Experiments show this approach achieves better performance compared to using complete generated images, which is a noteworthy finding. 2. The paper reveals the impact of scaling factors on detection methods, contributing valuable insights to the field.
1. Lack of technical contribution: Although the findings are interesting, the authors did not deeply explore the reasons behind the observed phenomena or how they could further benefit existing detection methods. They only conducted shallow comparative experiments using their generated data. Overall, the contribution feels insufficient. 2. Limited comparison methods: The authors aim to prove their dataset has better generalization and performance than others, but they only compared it with two
Taxonomy
Topics: Digital Media Forensic Detection · Generative Adversarial Networks and Image Synthesis · Misinformation and Its Impacts
Methods: Focus · Diffusion
