Self-Corrected Image Generation with Explainable Latent Rewards

Yinyi Luo; Hrishikesh Gokhale; Marios Savvides; Jindong Wang; Shengfeng He

arXiv:2603.24965 · cs.CV · March 27, 2026

TL;DR

This paper introduces xLARD, a self-correcting image generation framework that leverages explainable latent rewards and multimodal language models to improve alignment and fidelity in text-to-image synthesis.

Contribution

The paper proposes a self-correcting method that pairs explainable latent rewards with a lightweight corrector to improve image generation quality.

Findings

01 Improves semantic alignment in generated images

02 Enhances visual fidelity while preserving generative priors

03 Demonstrates effectiveness across diverse tasks

Abstract

Despite significant progress in text-to-image generation, aligning outputs with complex prompts remains challenging, particularly for fine-grained semantics and spatial relations. This difficulty stems from the feed-forward nature of generation, which requires anticipating alignment without fully understanding the output. In contrast, evaluating generated images is more tractable. Motivated by this asymmetry, we propose xLARD, a self-correcting framework that uses multimodal large language models to guide generation through Explainable LAtent RewarDs. xLARD introduces a lightweight corrector that refines latent representations based on structured feedback from model-generated references. A key component is a differentiable mapping from latent edits to interpretable reward signals, enabling continuous latent-level guidance from non-differentiable image-level evaluations. This mechanism…
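The idea of steering a latent with a differentiable reward derived from a non-differentiable evaluation can be illustrated with a toy sketch. This is not the xLARD implementation: the evaluator, the linear surrogate, and all names below (`evaluate`, `fit_surrogate`, `correct`) are hypothetical stand-ins chosen for illustration. The sketch assumes the evaluator returns an integer score, fits a per-coordinate linear surrogate to scores of randomly perturbed latents, and then follows the surrogate's gradient to edit the latent.

```python
import random


def evaluate(latent):
    """Hypothetical non-differentiable evaluator (assumption, not from the paper):
    returns an integer count of latent coordinates that are positive."""
    return sum(1 for v in latent if v > 0)


def fit_surrogate(latent, n_samples=2000, sigma=1.0, seed=0):
    """Fit a linear surrogate reward r(delta) ~ sum_i w_i * delta_i.

    Each weight is the univariate regression slope of the evaluator score on
    one coordinate of a random Gaussian latent edit. The weights are directly
    readable: w_i estimates how much editing coordinate i changes the score.
    """
    rng = random.Random(seed)
    dim = len(latent)
    deltas, scores = [], []
    for _ in range(n_samples):
        d = [rng.gauss(0.0, sigma) for _ in range(dim)]
        deltas.append(d)
        scores.append(evaluate([z + e for z, e in zip(latent, d)]))
    mean_s = sum(scores) / n_samples
    weights = []
    for i in range(dim):
        col = [d[i] for d in deltas]
        mean_d = sum(col) / n_samples
        cov = sum((c - mean_d) * (s - mean_s) for c, s in zip(col, scores))
        var = sum((c - mean_d) ** 2 for c in col)
        weights.append(cov / var)
    return weights


def correct(latent, weights, steps=40, lr=0.25):
    """Gradient ascent on the surrogate reward with respect to the latent edit.

    For a linear surrogate the gradient with respect to delta is just the
    weight vector, so each step adds lr * weights to the edit.
    """
    delta = [0.0] * len(latent)
    for _ in range(steps):
        delta = [d + lr * w for d, w in zip(delta, weights)]
    corrected = [z + d for z, d in zip(latent, delta)]
    return corrected, delta


if __name__ == "__main__":
    z0 = [-1.0, -0.5, -0.8, -0.3]  # toy latent; every coordinate fails the check
    w = fit_surrogate(z0)
    z1, delta = correct(z0, w)
    # Per-coordinate contribution of the edit to the surrogate reward.
    contributions = [wi * di for wi, di in zip(w, delta)]
    print("score before:", evaluate(z0))
    print("score after: ", evaluate(z1))
    print("contributions:", contributions)
```

The per-coordinate contributions are what make this toy reward "explainable": each term attributes part of the predicted score change to one latent edit. A real system would replace the counting evaluator with feedback from a multimodal model and the linear fit with a learned mapping.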

Taxonomy

Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Topic Modeling