Saliency Guided Optimization of Diffusion Latents
Xiwen Wang, Jizhe Zhou, Xuekang Zhu, Cheng Li, Mao Li

TL;DR
This paper introduces SGOOL, a saliency-guided optimization method for diffusion models that enhances image quality and prompt alignment by focusing on salient regions, mimicking human visual attention, and enabling efficient, parameter-free fine-tuning.
Contribution
SGOOL is a novel, saliency-guided optimization approach that directly optimizes diffusion latents, improving alignment and quality without retraining additional models.
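To make "directly optimizes diffusion latents" concrete, here is a minimal toy sketch, not the authors' implementation. It assumes a fixed linear map `W` as a stand-in for a frozen decoder, a per-pixel `saliency` weight vector, and a saliency-weighted squared error as a stand-in for the alignment objective; all of these names and choices are hypothetical. The point it illustrates is that gradient steps update only the latent `z`, while the model parameters stay untouched.

```python
# Toy illustration only: W, saliency, and the squared-error loss are
# hypothetical stand-ins, not the objective or model used by SGOOL.

def decode(W, z):
    """Frozen 'decoder': maps a latent vector z to pixel values via W."""
    return [sum(w * zj for w, zj in zip(row, z)) for row in W]

def saliency_weighted_loss(image, target, saliency):
    """Squared error where each pixel is weighted by its saliency."""
    return sum(s * (x - t) ** 2 for x, t, s in zip(image, target, saliency))

def optimize_latent(W, z, target, saliency, lr=0.05, steps=200):
    """Gradient descent on the latent z only; W is never modified."""
    z = list(z)
    for _ in range(steps):
        image = decode(W, z)
        # d(loss)/d(z_j) = sum_i 2 * s_i * (image_i - target_i) * W[i][j]
        grad = [
            sum(2 * s * (x - t) * row[j]
                for x, t, s, row in zip(image, target, saliency, W))
            for j in range(len(z))
        ]
        z = [zj - lr * g for zj, g in zip(z, grad)]
    return z

# Four "pixels", two latent dimensions; the first two pixels are salient.
W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]
target = [1.0, 2.0, 0.0, 0.0]
saliency = [1.0, 1.0, 0.1, 0.1]

z0 = [0.0, 0.0]
z_opt = optimize_latent(W, z0, target, saliency)
loss_before = saliency_weighted_loss(decode(W, z0), target, saliency)
loss_after = saliency_weighted_loss(decode(W, z_opt), target, saliency)
```

Because the salient pixels carry larger weights, the optimized latent fits them more closely than the low-saliency pixels, which is the intuition behind focusing optimization on salient regions.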
Findings
SGOOL outperforms existing methods in image quality.
It achieves better prompt alignment according to metrics and human evaluation.
The method is parameter-efficient and memory-friendly.
Abstract
With the rapid advances in diffusion models, generating decent images from text prompts is no longer challenging. The key to text-to-image generation is how to optimize the results of a text-to-image generation model so that they are better aligned with human intentions or prompts. Existing optimization methods commonly treat the entire image uniformly and conduct global optimization. These methods overlook the fact that, when viewing an image, the human visual system naturally prioritizes salient areas and often neglects less salient or non-salient regions. That is, humans are likely to overlook optimizations made in non-salient areas. Consequently, although model retraining is conducted under the guidance of additional large multimodal models, existing methods, which perform uniform optimization, yield sub-optimal results. To address this alignment challenge effectively and…
Taxonomy
Topics: Semantic Web and Ontologies · Natural Language Processing Techniques
Methods: Softmax · Attention Is All You Need · Diffusion
