InstantFamily: Masked Attention for Zero-shot Multi-ID Image Generation
Chanran Kim, Jeongin Lee, Shichang Joung, Bongmo Kim, Yeul-Min Baek

TL;DR
InstantFamily introduces a novel masked cross-attention mechanism and multimodal embedding stack for zero-shot multi-ID image generation, effectively preserving identities and enabling controlled, cohesive multi-concept image synthesis.
Contribution
The paper presents a new masked cross-attention approach combined with multimodal embeddings for zero-shot multi-ID image generation, improving identity preservation and compositional control.
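The summary does not give the mechanism's equations. The sketch below is an illustration only. It assumes a common form of region-masked cross-attention in which every latent position may attend to the shared text tokens, but only to the identity tokens of the person whose face region covers that position. Features from one face therefore cannot leak into another face's region. The helper names (`build_id_mask`, `masked_cross_attention`) and the toy vectors are hypothetical, plain Python lists stand in for tensors, and this is not the authors' implementation.

```python
import math
from typing import List, Optional, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def build_id_mask(position_owner: Sequence[Optional[int]],
                  token_owner: Sequence[Optional[int]]) -> List[List[bool]]:
    """allowed[i][j] is True when latent position i may attend to token j.

    position_owner[i]: identity whose (hypothetical) face mask covers position i,
                       or None for background.
    token_owner[j]:    identity that produced token j, or None for shared tokens
                       (e.g. text), which every position may see.
    """
    return [[tok is None or tok == pos for tok in token_owner]
            for pos in position_owner]


def masked_cross_attention(queries: Sequence[Sequence[float]],
                           keys: Sequence[Sequence[float]],
                           values: Sequence[Sequence[float]],
                           allowed: Sequence[Sequence[bool]]) -> List[Vector]:
    """Scaled dot-product attention where disallowed (position, token) pairs
    get a score of -inf, so their softmax weight is exactly zero."""
    scale = 1.0 / math.sqrt(len(keys[0]))
    width = len(values[0])
    out: List[Vector] = []
    for q, row in zip(queries, allowed):
        scores = [dot(q, k) * scale if ok else -math.inf
                  for k, ok in zip(keys, row)]
        best = max(scores)
        if best == -math.inf:
            # No visible token: emit zeros (an arbitrary choice for this sketch).
            out.append([0.0] * width)
            continue
        weights = [math.exp(s - best) for s in scores]  # exp(-inf) == 0.0
        total = sum(weights)
        out.append([sum(w * v[c] for w, v in zip(weights, values)) / total
                    for c in range(width)])
    return out


# Toy example: positions 0-1 belong to person 0, position 2 to person 1,
# position 3 is background. Tokens: person 0, person 1, text.
mask = build_id_mask([0, 0, 1, None], [0, 1, None])
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
queries = [[0.0, 0.0]] * 4  # zero queries -> uniform weights over visible tokens
out = masked_cross_attention(queries, keys, values, mask)
```

Masking with `-inf` before the softmax, rather than zeroing weights afterwards, keeps each position's weights a proper probability distribution over the tokens it is allowed to see.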
Findings
Outperforms existing methods in multi-ID image generation
Achieves state-of-the-art results in identity preservation
Scales effectively to more identities than it was trained on
Abstract
In the field of personalized image generation, the ability to create images that preserve concepts has improved significantly. However, creating an image that naturally integrates multiple concepts into a cohesive and visually appealing composition can be challenging. This paper introduces "InstantFamily," an approach that employs a novel masked cross-attention mechanism and a multimodal embedding stack to achieve zero-shot multi-ID image generation. Our method effectively preserves ID by using global and local features from a pre-trained face recognition model integrated with text conditions. Additionally, our masked cross-attention mechanism enables precise control of multiple IDs and of the composition in the generated images. We demonstrate the effectiveness of InstantFamily through experiments showing its dominance in generating images with multi-ID, while resolving well-known multi-ID…
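The abstract says identity is preserved by combining global and local features from a pre-trained face recognition model with text conditions, but the summary does not say how they are combined. One plausible reading, assumed here, is a token-level stack. Each identity's global embedding and its local feature tokens are projected to the text-embedding width and appended after the text tokens, and the identity that produced each token is recorded. Everything below (`build_embedding_stack`, `project`, the weight matrices, the feature sizes) is hypothetical, with plain Python lists standing in for tensors and for the output of a real face recognition model.

```python
from typing import List, Optional, Sequence, Tuple

Vector = List[float]
Matrix = Sequence[Sequence[float]]  # rows = output dim, cols = input dim


def project(w: Matrix, x: Sequence[float]) -> Vector:
    """Linear projection y = W x (no bias) into the shared token width."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def build_embedding_stack(
        text_tokens: Sequence[Sequence[float]],
        identities: Sequence[Tuple[Sequence[float], Sequence[Sequence[float]]]],
        w_global: Matrix,
        w_local: Matrix) -> Tuple[List[Vector], List[Optional[int]]]:
    """Return (tokens, owners) for a hypothetical multimodal embedding stack.

    text_tokens: text embeddings, already at the shared width.
    identities:  one (global_feature, local_features) pair per person, assumed
                 to come from a pre-trained face recognition model.
    owners[j]:   None for text tokens, else the index of the identity that
                 produced token j.
    """
    tokens: List[Vector] = [list(t) for t in text_tokens]
    owners: List[Optional[int]] = [None] * len(text_tokens)
    for idx, (global_feat, local_feats) in enumerate(identities):
        tokens.append(project(w_global, global_feat))  # one global token
        owners.append(idx)
        for lf in local_feats:                          # several local tokens
            tokens.append(project(w_local, lf))
            owners.append(idx)
    return tokens, owners


# Toy example: two text tokens, two people with different numbers of local features.
text = [[0.1, 0.2], [0.3, 0.4]]
w_global = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # 3-dim global feature -> 2-dim token
w_local = [[2.0, 0.0], [0.0, 2.0]]             # 2-dim local feature -> 2-dim token
people = [([3.0, 4.0, 5.0], [[1.0, 1.0]]),
          ([6.0, 7.0, 8.0], [[0.5, 0.0], [0.0, 0.5]])]
tokens, owners = build_embedding_stack(text, people, w_global, w_local)
```

Keeping an owner label per token is a design choice that makes a per-identity region mask easy to build later. Shared text tokens carry `None` so every image position can still see them.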
Taxonomy
Topics: Image Processing Techniques and Applications · Advanced Image and Video Retrieval Techniques · Advanced Vision and Imaging
