Self-Creative Text-to-Object Generation using Semantic-Aware Spatial Weighting
Yue Yu, Haibo Chen, Shuo Chen, Jian Yang, Jun Li

TL;DR
The paper introduces a Self-Creative Diffusion model for text-to-image generation that enhances creativity and semantic alignment through spatial weighting and a dual loss function.
Contribution
It proposes a novel framework with a learnable spatial weighting module and a visual-semantic mixing loss to improve creative and meaningful image synthesis.
Findings
Significantly improves creativity and diversity in generated images.
Enhances semantic alignment with textual descriptions.
Produces more visually coherent and surprising images.
Abstract
Instilling creativity in text-to-image (T2I) generation presents a significant challenge, as it requires synthesized images to exhibit not only visual novelty and surprise, but also artistic value. Current T2I models, however, are largely optimized for literal text-image alignment with their data distribution, and their noise prediction networks constrain generation to high-probability regions, so their outputs lack authentic creativity. To address this, we propose a Self-Creative Diffusion (SCDiff) model for meaningful T2I generation, featuring two core modules: a learnable spatial weighting (LSW) module and a visual-semantic mixing loss (VSML). The LSW module uses a parametric Kaiser-Bessel window to reinforce central image features, fostering novel and surprising generation. The VSML module introduces a dual loss function: a similarity loss constrains that…
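The abstract says the LSW module weights features with a Kaiser-Bessel window that emphasizes the image center. The sketch below illustrates that general idea only; it is not the authors' implementation. It assumes a separable 2D window (outer product of two 1D Kaiser windows), a fixed shape parameter `beta` where the paper describes a learnable one, and plain nested lists standing in for a feature map. The function names `bessel_i0`, `kaiser_window`, `spatial_weights`, and `apply_weights` are hypothetical.

```python
import math


def bessel_i0(x, terms=50):
    """Zeroth-order modified Bessel function of the first kind (power series)."""
    total = 0.0
    term = 1.0  # k = 0 term of sum_k ((x/2)^k / k!)^2
    half_sq = (x / 2.0) ** 2
    for k in range(1, terms + 1):
        total += term
        term *= half_sq / (k * k)  # ratio between consecutive series terms
    return total


def kaiser_window(length, beta):
    """1D Kaiser-Bessel window: peaks at the center, decays toward the edges."""
    if length == 1:
        return [1.0]
    denom = bessel_i0(beta)
    window = []
    for n in range(length):
        r = 2.0 * n / (length - 1) - 1.0  # position mapped to [-1, 1]
        arg = beta * math.sqrt(max(0.0, 1.0 - r * r))
        window.append(bessel_i0(arg) / denom)
    return window


def spatial_weights(height, width, beta):
    """Separable 2D weight map (an assumption; the paper may use another form)."""
    wy = kaiser_window(height, beta)
    wx = kaiser_window(width, beta)
    return [[a * b for b in wx] for a in wy]


def apply_weights(feature_map, weights):
    """Element-wise reweighting of a 2D feature map by the weight map."""
    return [
        [f * w for f, w in zip(f_row, w_row)]
        for f_row, w_row in zip(feature_map, weights)
    ]


if __name__ == "__main__":
    weights = spatial_weights(5, 5, beta=4.0)
    features = [[1.0] * 5 for _ in range(5)]
    for row in apply_weights(features, weights):
        print(" ".join(f"{v:.3f}" for v in row))
```

Larger `beta` values concentrate weight more tightly around the center, while `beta = 0` gives a uniform map. In a trainable model, `beta` would presumably be a parameter updated by gradient descent, which this fixed-value sketch does not attempt.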
