Improving Text-to-Image Generation with Intrinsic Self-Confidence Rewards
Seungwook Kim, Minsu Cho

TL;DR
This paper introduces SOLACE, a post-training reinforcement learning framework that uses the model's own reconstruction confidence as an intrinsic reward to improve text-to-image generation quality.
Contribution
The paper presents a self-confidence-based reward mechanism that removes the need for external reward models or annotations when post-training text-to-image generative models.
Findings
Enhances compositional generation, text rendering, and text-image alignment.
Achieves improvements without external reward supervision.
Combines well with external rewards for further gains.
Abstract
Text-to-image generation powers content creation across design, media, and data augmentation. Post-training of text-to-image generative models is a promising path to improve human preference alignment, factuality, and aesthetics. We introduce SOLACE (Self-Originating LAtent Confidence Estimation), a post-training framework that replaces external reward supervision with an internal self-confidence signal: we re-noise the model's own outputs and measure how accurately it recovers the injected noise, treating low reconstruction error as high self-confidence. SOLACE converts this intrinsic signal into scalar rewards for reinforcement learning, requiring no external reward models, annotators, or preference data. By reinforcing high-confidence generations, SOLACE delivers consistent gains in compositional generation, text rendering, and text-image alignment. Integrating SOLACE with external…
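
The self-confidence signal described in the abstract (re-noise an output, ask the model to recover the injected noise, and treat low error as high confidence) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The interpolation-style forward process, the noise levels, the averaging over levels, and the names `self_confidence_reward`, `predict_noise`, and `toy_predict_noise` are all assumptions made for illustration; the toy predictor merely stands in for a real generative model.

```python
import numpy as np

def self_confidence_reward(latents, predict_noise, noise_levels, rng):
    """Return one scalar reward per sample: the negative mean noise-recovery error.

    latents:       array of shape (batch, ...) holding the model's own outputs.
    predict_noise: callable (noisy, sigma) -> predicted noise; stands in for the
                   generative model (hypothetical interface).
    noise_levels:  iterable of sigma values in (0, 1) (assumed schedule).
    """
    batch = latents.shape[0]
    errors = []
    for sigma in noise_levels:
        eps = rng.standard_normal(latents.shape)
        # Assumed forward process: linear interpolation between sample and noise.
        noisy = (1.0 - sigma) * latents + sigma * eps
        eps_hat = predict_noise(noisy, sigma)
        # Per-sample mean squared error between predicted and injected noise.
        per_sample = ((eps_hat - eps) ** 2).reshape(batch, -1).mean(axis=1)
        errors.append(per_sample)
    # Low reconstruction error means high self-confidence, hence the sign flip.
    return -np.mean(errors, axis=0)

def toy_predict_noise(noisy, sigma):
    # Toy stand-in for a model that believes every clean sample is all zeros,
    # so it attributes the entire noisy input to the injected noise.
    return noisy / sigma

rng = np.random.default_rng(0)
# Sample 0 matches the toy model's belief (all zeros); sample 1 does not.
latents = np.stack([np.zeros((4, 4)), np.ones((4, 4))])
rewards = self_confidence_reward(latents, toy_predict_noise, [0.5, 0.8], rng)
print(rewards)  # sample 0 receives the higher reward
```

In this toy setup the sample that agrees with the predictor's belief receives a reward near zero, while the other receives a clearly negative reward. Scalar rewards of this kind are what a reinforcement learning post-training loop would then consume; the abstract does not specify that loop, so it is left out here.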
