Layout-and-Retouch: A Dual-stage Framework for Improving Diversity in Personalized Image Generation
Kangyeol Kim, Wooseok Seo, Sehyun Nam, Bodam Kim, Suhyeon Jeong, Wonwoo Cho, Jaegul Choo, Youngjae Yu

TL;DR
The paper introduces Layout-and-Retouch, a dual-stage framework for personalized image generation that enhances diversity and identity preservation by combining diversified layout synthesis with feature retouching.
Contribution
A novel two-stage method that improves diversity and prompt fidelity in personalized text-to-image generation through layout generation and multi-source attention swapping.
Findings
Generates diverse images with high identity preservation.
Effectively balances prompt fidelity and diversity.
Handles complex text prompts successfully.
Abstract
Personalized text-to-image (P-T2I) generation aims to create new, text-guided images featuring a personalized subject, given only a few reference images. However, balancing the trade-off between prompt fidelity and identity preservation remains a critical challenge. To address this issue, we propose a novel P-T2I method called Layout-and-Retouch, which consists of two stages: 1) layout generation and 2) retouch. In the first stage, our step-blended inference exploits the inherent sample diversity of vanilla T2I models to produce diversified layout images while also enhancing prompt fidelity. In the second stage, multi-source attention swapping integrates the context image from the first stage with the reference image, taking the structure from the context image and extracting visual features from the reference image. This achieves high prompt fidelity while preserving identity…
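The two stages described in the abstract can be illustrated with a minimal, framework-free sketch. It is not the authors' implementation: the abstract does not say how steps are blended or which features supply the queries, so `step_blended_sample`, its `switch_step` parameter, and the query/key/value assignment in `multi_source_attention_swap` are hypothetical choices made for illustration. The attention itself is standard scaled dot-product attention with a softmax.

```python
import math
from typing import Callable, List, Sequence, Tuple, TypeVar

Matrix = List[List[float]]
State = TypeVar("State")


def step_blended_sample(
    x_init: State,
    vanilla_step: Callable[[State, int], State],
    personalized_step: Callable[[State, int], State],
    num_steps: int,
    switch_step: int,
) -> State:
    # Hypothetical schedule (assumption): the vanilla T2I model runs the first
    # `switch_step` denoising steps, where coarse layout is decided, and the
    # personalized model runs the remaining steps.
    x = x_init
    for i in range(num_steps):
        step_fn = vanilla_step if i < switch_step else personalized_step
        x = step_fn(x, i)
    return x


def softmax(scores: Sequence[float]) -> List[float]:
    # Numerically stable softmax: subtract the max score before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    # Scaled dot-product attention: each output row is a softmax-weighted
    # average of the value rows, with weights from q.k / sqrt(d).
    scale = 1.0 / math.sqrt(len(keys[0]))
    out: Matrix = []
    for q in queries:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


def multi_source_attention_swap(
    queries: Matrix,
    context_kv: Tuple[Matrix, Matrix],
    reference_kv: Tuple[Matrix, Matrix],
) -> Matrix:
    # Hypothetical reading of "multi-source" swapping (assumption): queries
    # carry the structure of the context (layout) image, while keys and values
    # are pooled from both the context image and the reference (identity)
    # image, so one softmax decides how much each query borrows from each.
    keys = context_kv[0] + reference_kv[0]
    values = context_kv[1] + reference_kv[1]
    return attention(queries, keys, values)
```

Pooling both sources' keys and values into a single softmax lets every query trade off context structure against reference appearance. As a sanity check, if the reference keys and values are an exact copy of the context's, the output reduces to ordinary attention over the context alone.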
Taxonomy
Topics: Personalized text-to-image generation
Methods: Softmax · Attention Is All You Need
