Images Speak Louder Than Scores: Failure Mode Escape for Enhancing Generative Quality

Jie Shao; Ke Zhu; Minghao Fu; Guo-hua Wang; Jianxin Wu

arXiv:2508.09598 · cs.CV · August 14, 2025


TL;DR

This paper introduces FaME, a training-free method that uses image quality assessment to identify and steer away from low-quality generations, improving perceptual quality in diffusion models without affecting FID scores.

Contribution

FaME is a novel, training-free approach that enhances perceptual quality by leveraging failure mode detection and negative guidance during sampling.

Findings

01. FaME improves the visual quality of generated images on ImageNet.

02. FaME maintains FID scores while enhancing perceptual quality.

03. The authors demonstrate a potential extension to text-to-image generation.

Abstract

Diffusion models have achieved remarkable progress in class-to-image generation. However, we observe that despite impressive FID scores, state-of-the-art models often generate distorted or low-quality images, especially in certain classes. This gap arises because FID evaluates global distribution alignment, while ignoring the perceptual quality of individual samples. We further examine the role of CFG, a common technique used to enhance generation quality. While effective in improving metrics and suppressing outliers, CFG can introduce distribution shift and visual artifacts due to its misalignment with both training objectives and user expectations. In this work, we propose FaME, a training-free and inference-efficient method for improving perceptual quality. FaME uses an image quality assessment model to identify low-quality generations and stores their sampling trajectories. These…
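
The abstract is cut off before it explains how the stored trajectories are used, so the following is only a rough sketch of the ideas named above, not the authors' implementation. `cfg_combine` is the standard classifier-free guidance rule. `flag_failures` and `steer_away` are hypothetical helpers: the first assumes an IQA model that returns one score per image (higher is better), and the second assumes that "negative guidance" means extrapolating the noise prediction away from a prediction associated with a stored failure trajectory. The names, the threshold, and the scales are all illustrative assumptions.

```python
import numpy as np


def cfg_combine(eps_uncond, eps_cond, scale):
    """Standard classifier-free guidance: move from the unconditional
    prediction toward (and past) the conditional one by `scale`."""
    return eps_uncond + scale * (eps_cond - eps_uncond)


def flag_failures(iqa_scores, threshold):
    """Hypothetical failure detection: indices of samples whose IQA score
    falls below `threshold` (assumes higher score = better quality)."""
    return [i for i, s in enumerate(iqa_scores) if s < threshold]


def steer_away(eps_guided, eps_failure, neg_scale):
    """Hypothetical negative guidance: push the guided prediction away
    from a prediction tied to a stored low-quality trajectory.
    This is an assumed form, not the formula from the paper."""
    return eps_guided + neg_scale * (eps_guided - eps_failure)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    eps_u = rng.normal(size=(4,))      # unconditional noise prediction
    eps_c = rng.normal(size=(4,))      # class-conditional noise prediction
    eps_bad = rng.normal(size=(4,))    # prediction from a stored failure case

    guided = cfg_combine(eps_u, eps_c, scale=1.5)
    escaped = steer_away(guided, eps_bad, neg_scale=0.3)

    # Pretend IQA scores for four generated images (made-up values).
    bad_ids = flag_failures([0.9, 0.2, 0.5, 0.1], threshold=0.4)
    print(bad_ids, escaped.shape)
```

With `neg_scale=0` the sketch reduces to plain CFG, which makes it easy to compare the two settings on the same seeds.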
