One Pass Is Not Enough: Recursive Latent Refinement for Generative Models

Mehdi Esmaeilzadeh; Alexia Jolicoeur-Martineau; Chirag Vashist; Ke Li

arXiv:2605.15309·cs.CV·May 18, 2026

TL;DR

This paper introduces RTM, a recursive latent refinement method that improves both quality and diversity in image generation by explicitly targeting mode coverage, which FID alone does not capture because it conflates fidelity with coverage.

Contribution

The paper proposes RTM, an iterative refinement process integrated with IMLE, to improve diversity and coverage in generative models, outperforming current state-of-the-art methods.
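The two ingredients named above can be illustrated with a small NumPy sketch. It is not the paper's architecture: the residual update rule, the weight matrices `W` and `U`, the step count, and the function names are all hypothetical choices made for illustration. The `imle_loss` function follows the general IMLE idea of matching every real example to its nearest generated sample, so that no data point is left without a nearby sample.

```python
import numpy as np

def single_pass(z, W):
    # Baseline: one mapping from noise z to a latent code w.
    return np.tanh(z @ W)

def recursive_refine(z, W, U, steps=4):
    # Hypothetical refinement: start from the single-pass code, then apply
    # the same weight-tied residual update several times, conditioned on z.
    w = single_pass(z, W)
    for _ in range(steps):
        w = w + 0.5 * np.tanh(w @ U + z @ W)
    return w

def imle_loss(real, fake):
    # IMLE-style objective: for each real example, pick the nearest
    # generated sample and average the squared distances.
    d2 = ((real[:, None, :] - fake[None, :, :]) ** 2).sum(axis=-1)
    nearest = d2.argmin(axis=1)
    loss = d2[np.arange(len(real)), nearest].mean()
    return loss, nearest

rng = np.random.default_rng(0)
z = rng.normal(size=(8, 4))            # noise inputs
W = rng.normal(size=(4, 4)) * 0.5      # assumed mapping weights
U = rng.normal(size=(4, 4)) * 0.5      # assumed refinement weights

w_once = single_pass(z, W)
w_refined = recursive_refine(z, W, U, steps=4)

real_pts = rng.normal(size=(5, 4))     # stand-in for real data features
loss, nearest = imle_loss(real_pts, w_refined)
print(w_once.shape, w_refined.shape, float(loss))
```

Because the loss is driven by the real examples rather than by the samples, a generator that ignores part of the data is penalized for every uncovered point, which is why this kind of objective is associated with mode coverage.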

Findings

01. RTM improves both quality and diversity across multiple datasets.

02. RTM achieves the highest precision and recall among current approaches.

03. RTM enhances StyleGAN2 on various benchmarks.

Abstract

Despite remarkable progress, image generation is far from solved. The dominant metric, FID, conflates sample fidelity with mode coverage and is close to being saturated. Yet a model can still exhibit mode collapse while achieving a low FID, since a handful of sharp, near-duplicate images can outscore a model that faithfully covers the full data distribution. We argue that precision and recall are essential complements to FID, and that because FID is already saturated, the more meaningful goal is to improve diversity and coverage. Achieving high recall requires a model that explicitly prioritizes mode coverage, unlike most generative models, which optimize sample fidelity. We introduce RTM, which replaces the single-pass latent mapping in style-based generators with an iterative refinement process, and show that this consistently improves both quality and diversity. Integrated with…
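To make the distinction between precision and recall concrete, here is a minimal sketch of a k-nearest-neighbour estimate in the style of the improved precision and recall metric of Kynkäänniemi et al. Whether the paper uses exactly this variant is an assumption, and the 2-D toy data, the value of `k`, and all function names are illustrative. Precision asks how many generated samples land on the estimated real manifold; recall asks how many real samples land on the estimated generated manifold.

```python
import numpy as np

def knn_radii(points, k):
    # Distance from each point to its k-th nearest neighbour.
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    # After sorting, column 0 is the point itself (distance 0).
    return np.sort(d, axis=1)[:, k]

def coverage(queries, support, k=3):
    # Fraction of queries inside at least one k-NN ball around a support point.
    radii = knn_radii(support, k)
    d = np.linalg.norm(queries[:, None, :] - support[None, :, :], axis=-1)
    return float(np.mean(np.any(d <= radii[None, :], axis=1)))

def precision_recall(real, fake, k=3):
    precision = coverage(fake, real, k)  # are fakes on the real manifold?
    recall = coverage(real, fake, k)     # are reals on the fake manifold?
    return precision, recall

rng = np.random.default_rng(0)

# Toy "real" features: two well-separated modes.
real = np.concatenate([
    rng.normal(loc=[-5.0, 0.0], scale=0.5, size=(200, 2)),
    rng.normal(loc=[5.0, 0.0], scale=0.5, size=(200, 2)),
])

# A collapsed generator that only ever produces the first mode.
collapsed = rng.normal(loc=[-5.0, 0.0], scale=0.5, size=(400, 2))

# A generator that covers both modes.
covering = np.concatenate([
    rng.normal(loc=[-5.0, 0.0], scale=0.5, size=(200, 2)),
    rng.normal(loc=[5.0, 0.0], scale=0.5, size=(200, 2)),
])

p_c, r_c = precision_recall(real, collapsed)
p_f, r_f = precision_recall(real, covering)
print("collapsed: precision=%.2f recall=%.2f" % (p_c, r_c))
print("covering:  precision=%.2f recall=%.2f" % (p_f, r_f))
```

In this toy setup the collapsed generator still produces realistic samples, so its precision stays high, but it cannot cover the second mode, so its recall is capped at one half. That gap is the failure mode the abstract argues a single fidelity-weighted score can hide.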
