AnyStory: Towards Unified Single and Multiple Subject Personalization in Text-to-Image Generation
Junjie He, Yuxiang Tuo, Binghui Chen, Chongyang Zhong, Yifeng Geng, Liefeng Bo

TL;DR
AnyStory introduces a unified method for high-fidelity personalization in text-to-image generation, effectively handling both single and multiple subjects without compromising detail or accuracy.
Contribution
The paper presents a novel encode-then-route framework: subject references are encoded with ReferenceNet together with the CLIP vision encoder, and a decoupled router then places each subject precisely in the generated image.
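The summary does not say how the router decides where each subject goes, so the following is not AnyStory's implementation. It is a minimal sketch of one plausible reading of "routing", assuming the router scores every latent token against one embedding per subject plus a fixed background slot and normalizes those scores with a softmax. The names `route`, `hard_assignment`, `background_logit`, and `temperature` are hypothetical.

```python
import math


def softmax(xs):
    # Numerically stable softmax over a list of logits.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def route(latent_tokens, subject_embeddings, background_logit=0.0, temperature=1.0):
    """Hypothetical router: for each latent token, return a probability
    distribution over [background, subject_1, ..., subject_K]."""
    weights = []
    for h in latent_tokens:
        # Slot 0 is a constant background score; slots 1..K are token-subject affinities.
        logits = [background_logit] + [dot(h, e) / temperature for e in subject_embeddings]
        weights.append(softmax(logits))
    return weights


def hard_assignment(weights):
    # Most likely slot per token: 0 = background, k >= 1 = subject k.
    return [max(range(len(w)), key=w.__getitem__) for w in weights]


# Toy example: three 2-D latent tokens, two subjects.
latents = [[4.0, 0.0], [0.0, 4.0], [0.1, -0.1]]
subjects = [[1.0, 0.0], [0.0, 1.0]]
weights = route(latents, subjects, background_logit=1.0)
print(hard_assignment(weights))  # [1, 2, 0]
```

The background slot gives tokens that match no subject strongly somewhere to go, so they are not forced onto the nearest subject; that is a design assumption of this sketch, not a detail taken from the paper.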
Findings
Achieves high-fidelity personalization for single subjects.
Effectively handles multiple subjects without loss of detail.
Demonstrates superior alignment with text descriptions.
Abstract
Recently, large-scale generative models have demonstrated outstanding text-to-image generation capabilities. However, generating high-fidelity personalized images of specific subjects remains challenging, especially when multiple subjects are involved. In this paper, we propose AnyStory, a unified approach for personalized subject generation. AnyStory achieves high-fidelity personalization not only for single subjects but also for multiple subjects, without sacrificing subject fidelity. Specifically, AnyStory models the subject personalization problem in an "encode-then-route" manner. In the encoding step, AnyStory utilizes a universal and powerful image encoder, i.e., ReferenceNet, in conjunction with the CLIP vision encoder to achieve high-fidelity encoding of subject features. In the routing step, AnyStory utilizes a decoupled instance-aware subject router to accurately…
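The abstract is cut off before it explains how routed subject features are applied, so the following is only an illustration of why routing helps with multiple subjects. It is a minimal sketch, assuming the router yields a soft mask per subject over latent positions and that each subject's encoded features contribute an additive update at every position (for example, the output of cross-attention to that subject's tokens). The names `routed_injection`, `subject_updates`, and `masks` are hypothetical.

```python
def routed_injection(hidden, subject_updates, masks):
    """Hypothetical routed feature injection.

    hidden:          N latent vectors of dimension D.
    subject_updates: K lists of N vectors; subject_updates[k][i] is what
                     subject k would add at latent position i.
    masks:           K lists of N weights in [0, 1]; masks[k][i] is the
                     router's weight for subject k at position i.
    Returns hidden[i] + sum_k masks[k][i] * subject_updates[k][i].
    """
    out = []
    for i, h in enumerate(hidden):
        v = list(h)
        for updates, mask in zip(subject_updates, masks):
            w = mask[i]
            # Scale subject k's update by its routing weight at this position.
            v = [a + w * b for a, b in zip(v, updates[i])]
        out.append(v)
    return out


hidden = [[0.0, 0.0], [0.0, 0.0]]
updates = [
    [[1.0, 0.0], [1.0, 0.0]],  # subject 1 pushes feature dim 0 everywhere
    [[0.0, 1.0], [0.0, 1.0]],  # subject 2 pushes feature dim 1 everywhere
]
routed = routed_injection(hidden, updates, [[1.0, 0.0], [0.0, 1.0]])
unrouted = routed_injection(hidden, updates, [[1.0, 1.0], [1.0, 1.0]])
print(routed)    # [[1.0, 0.0], [0.0, 1.0]]
print(unrouted)  # [[1.0, 1.0], [1.0, 1.0]]
```

With routing, each position receives only its own subject's features; without it, both subjects bleed into every position, which is the kind of cross-subject mixing a per-instance router is meant to prevent.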
Taxonomy
Topics: Video Analysis and Summarization · Digital Humanities and Scholarship · Topic Modeling
Methods: Contrastive Language-Image Pre-training
