RealCustom: Narrowing Real Text Word for Real-Time Open-Domain Text-to-Image Customization
Mengqi Huang, Zhendong Mao, Mingcong Liu, Qian He, Yongdong Zhang

TL;DR
RealCustom introduces a novel framework for real-time text-to-image customization that disentangles subject similarity from text controllability, enabling precise and efficient synthesis of images matching specific subjects and descriptions.
Contribution
It proposes a new 'train-inference' decoupled framework with adaptive scoring and mask guidance strategies to improve real-time customization in open-domain text-to-image synthesis.
Findings
Achieves superior similarity and controllability in generated images.
Demonstrates real-time performance in open-domain customization.
Outperforms existing methods at balancing subject similarity with text controllability.
Abstract
Text-to-image customization, which aims to synthesize text-driven images for the given subjects, has recently revolutionized content creation. Existing works follow the pseudo-word paradigm, i.e., represent the given subjects as pseudo-words and then compose them with the given text. However, the inherent entangled influence scope of pseudo-words with the given text results in a dual-optimum paradox, i.e., the similarity of the given subjects and the controllability of the given text could not be optimal simultaneously. We present RealCustom that, for the first time, disentangles similarity from controllability by precisely limiting subject influence to relevant parts only, achieved by gradually narrowing real text word from its general connotation to the specific subject and using its cross-attention to distinguish relevance. Specifically, RealCustom introduces a novel…
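The abstract's idea of limiting subject influence to relevant parts by means of a word's cross-attention can be illustrated with a toy example. The sketch below is not the paper's implementation: the function names, the `keep_fraction` parameter, and the top-fraction thresholding rule are all hypothetical, chosen only to show how an attention map might be turned into a binary mask that confines subject-conditioned values to the highest-attention positions.

```python
# Toy illustration only; names and the thresholding rule are assumptions,
# not taken from the RealCustom paper.

def top_fraction_mask(attn, keep_fraction=0.25):
    """Return a 0/1 mask marking the highest-scoring positions of a 2D map.

    Ties at the threshold are all kept, so slightly more than
    keep_fraction of the positions may be selected.
    """
    flat = [v for row in attn for v in row]
    k = max(1, round(len(flat) * keep_fraction))
    threshold = sorted(flat, reverse=True)[k - 1]
    return [[1 if v >= threshold else 0 for v in row] for row in attn]


def masked_blend(text_only, with_subject, mask):
    """Use subject-conditioned values inside the mask, text-only values elsewhere."""
    return [
        [s if m else t for t, s, m in zip(t_row, s_row, m_row)]
        for t_row, s_row, m_row in zip(text_only, with_subject, mask)
    ]


# Hypothetical 3x3 cross-attention map for one real text word.
attn = [
    [0.05, 0.10, 0.02],
    [0.08, 0.60, 0.40],
    [0.01, 0.30, 0.03],
]
mask = top_fraction_mask(attn, keep_fraction=1 / 3)

text_only = [[0.0] * 3 for _ in range(3)]      # stand-in for text-only features
with_subject = [[1.0] * 3 for _ in range(3)]   # stand-in for subject-conditioned features
blended = masked_blend(text_only, with_subject, mask)
```

In this toy setup the subject only affects the three positions where the word's attention is strongest, and the remaining positions stay under the control of the text alone.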
Taxonomy
Topics: Video Analysis and Summarization · Multimedia Communication and Technology · Computer Graphics and Visualization Techniques
