Imagine for Me: Creative Conceptual Blending of Real Images and Text via Blended Attention
Wonwoong Cho, Yanxia Zhang, Yan-Ying Chen, David I. Inouye

TL;DR
This paper introduces IT-Blender, a diffusion-based model that effectively blends real images and text to enhance human creativity, overcoming limitations of previous methods in detail preservation and disentanglement.
Contribution
IT-Blender is a novel diffusion model adapter that preserves image details and disentangles visual and textual inputs for improved conceptual blending.
Findings
IT-Blender significantly outperforms baseline methods in blending quality.
The model effectively preserves details of real images during blending.
It demonstrates potential to augment human creativity in visual design.
Abstract
Blending visual and textual concepts into a new visual concept is a unique and powerful human trait that can fuel creativity. In practice, however, cross-modal conceptual blending by humans is prone to cognitive biases, such as design fixation, that lead to local minima in the design space. In this paper, we propose IT-Blender, a text-to-image (T2I) diffusion adapter that automates the blending process to enhance human creativity. Prior work on cross-modal conceptual blending is limited either in encoding a real image without loss of detail or in disentangling the image and text inputs. To address these gaps, IT-Blender leverages pretrained diffusion models (SD and FLUX) to blend the latent representations of a clean reference image with those of the noisy generated image. Combined with our novel blended attention, IT-Blender encodes the real reference image without loss of details…
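The abstract does not say how blended attention combines the two streams. The sketch below is only an illustration under stated assumptions and is not IT-Blender's formulation. It assumes the noisy generated image supplies the queries. The output is then modeled as ordinary self-attention over the generated-image tokens, plus a second attention over the clean reference image's keys and values, scaled by a hypothetical blending weight `lam`. The names `attention`, `blended_attention`, and `lam` are all illustrative.

```python
import math
from typing import List

Matrix = List[List[float]]  # rows are tokens, columns are feature dims


def softmax(row: List[float]) -> List[float]:
    # Numerically stable softmax over one row of attention scores.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    # Scaled dot-product attention: each query row is a softmax-weighted
    # average of the value rows, weighted by query-key similarity.
    d = len(q[0])
    out = []
    for qi in q:
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        w = softmax(scores)
        out.append([sum(wj * vj[c] for wj, vj in zip(w, v)) for c in range(len(v[0]))])
    return out


def blended_attention(q_gen: Matrix, k_gen: Matrix, v_gen: Matrix,
                      k_ref: Matrix, v_ref: Matrix, lam: float = 0.5) -> Matrix:
    # HYPOTHETICAL blend (not the paper's definition): queries from the noisy
    # generated image attend to (1) generated-image tokens and (2) clean
    # reference-image tokens; the two results are summed with weight `lam`.
    self_out = attention(q_gen, k_gen, v_gen)
    ref_out = attention(q_gen, k_ref, v_ref)
    return [[s + lam * r for s, r in zip(s_row, r_row)]
            for s_row, r_row in zip(self_out, ref_out)]


if __name__ == "__main__":
    q = [[1.0, 0.0], [0.0, 1.0]]                 # generated-image queries
    k = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]     # generated-image keys
    v = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]     # generated-image values
    k_ref = [[0.5, 0.5]]                         # clean reference keys
    v_ref = [[10.0, 20.0]]                       # clean reference values
    print(blended_attention(q, k, v, k_ref, v_ref, lam=0.5))
```

With `lam=0` the function reduces to ordinary self-attention, and larger values inject more reference-image features into every generated token. A real adapter inside SD or FLUX would instead operate on batched, multi-head tensors within the denoiser's attention layers.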
Taxonomy
Topics: Aesthetic Perception and Analysis · Visual Attention and Saliency Detection · Generative Adversarial Networks and Image Synthesis
