Improving Text-to-Image Generation with Intrinsic Self-Confidence Rewards

Seungwook Kim; Minsu Cho

arXiv:2603.00918 · cs.CV · May 12, 2026


TL;DR

This paper introduces SOLACE, a post-training reinforcement learning framework that uses the model's own reconstruction confidence as an intrinsic reward to improve text-to-image generation quality.

Contribution

The paper presents a self-confidence-based reward mechanism that removes the need for external reward models, annotators, or preference data when post-training text-to-image models.

Findings

01. Improves compositional generation, text rendering, and text-image alignment.

02. Achieves these improvements without external reward supervision.

03. Combines with external rewards for further gains.

Abstract

Text-to-image generation powers content creation across design, media, and data augmentation. Post-training of text-to-image generative models is a promising path to improve human preference alignment, factuality, and aesthetics. We introduce SOLACE (Self-Originating LAtent Confidence Estimation), a post-training framework that replaces external reward supervision with an internal self-confidence signal: we re-noise the model's own outputs and measure how accurately it recovers the injected noise, treating low reconstruction error as high self-confidence. SOLACE converts this intrinsic signal into scalar rewards for reinforcement learning, requiring no external reward models, annotators, or preference data. By reinforcing high-confidence generations, SOLACE delivers consistent gains in compositional generation, text rendering, and text-image alignment. Integrating SOLACE with external…
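
The re-noise-and-recover signal described in the abstract can be illustrated with a minimal sketch. It assumes a DDPM-style forward process, x_t = sqrt(a) * x0 + sqrt(1 - a) * eps, which the abstract does not specify. The function names, the noise levels, and the two toy predictors are hypothetical stand-ins for a real generative model, not the authors' implementation.

```python
import math
import random


def renoise(x0, alpha_bar, rng):
    """Re-noise a clean latent (assumed DDPM-style forward process)."""
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    x_t = [a * x + b * e for x, e in zip(x0, eps)]
    return x_t, eps


def self_confidence_reward(x0, predict_noise, alpha_bars, rng):
    """Scalar reward: negative mean squared noise-recovery error.

    Low reconstruction error is treated as high self-confidence,
    so a better noise estimate yields a higher (less negative) reward.
    """
    total = 0.0
    for alpha_bar in alpha_bars:
        x_t, eps = renoise(x0, alpha_bar, rng)
        eps_hat = predict_noise(x_t, alpha_bar)
        total += sum((p - e) ** 2 for p, e in zip(eps_hat, eps)) / len(eps)
    return -total / len(alpha_bars)


def make_oracle(x0):
    """Hypothetical 'fully confident' predictor that knows the clean latent."""
    def predict(x_t, alpha_bar):
        a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
        return [(xt - a * x) / b for xt, x in zip(x_t, x0)]
    return predict


def zero_predictor(x_t, alpha_bar):
    """Hypothetical uninformed predictor that always guesses zero noise."""
    return [0.0 for _ in x_t]


rng = random.Random(0)
latent = [0.3, -1.2, 0.8, 0.1]   # toy stand-in for a generated latent
levels = [0.9, 0.5, 0.1]         # assumed noise levels (alpha-bar values)

reward_oracle = self_confidence_reward(latent, make_oracle(latent), levels, rng)
reward_zero = self_confidence_reward(latent, zero_predictor, levels, rng)
print(reward_oracle > reward_zero)
```

In this toy setup the predictor that recovers the injected noise exactly earns a reward near zero, while the uninformed predictor earns a strictly negative one. A reinforcement learning step would then favor the samples with higher reward, which is the sense in which the abstract speaks of reinforcing high-confidence generations.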
