Identity Decoupling for Multi-Subject Personalization of Text-to-Image Models
Sangwon Jang, Jaehyeong Jo, Kimin Lee, Sung Ju Hwang

TL;DR
MuDI is a new framework that effectively decouples multiple subjects' identities in text-to-image models, enabling high-quality multi-subject personalization without mixing identities, even for similar subjects.
Contribution
Introduces MuDI, a novel identity decoupling method using segmentation for improved multi-subject personalization in text-to-image models.
Findings
MuDI achieves twice the baselines' success rate at avoiding identity mixing.
MuDI is preferred over 70% of the time against the strongest baseline.
Experimental results demonstrate high-quality, multi-subject personalized images.
Abstract
Text-to-image diffusion models have shown remarkable success in generating personalized subjects based on a few reference images. However, current methods often fail when generating multiple subjects simultaneously, resulting in mixed identities with combined attributes from different subjects. In this work, we present MuDI, a novel framework that enables multi-subject personalization by effectively decoupling identities from multiple subjects. Our main idea is to utilize segmented subjects generated by a foundation model for segmentation (Segment Anything) for both training and inference, as a form of data augmentation for training and initialization for the generation process. Moreover, we further introduce a new metric to better evaluate the performance of our method on multi-subject personalization. Experimental results show that our MuDI can produce high-quality personalized images…
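The abstract's core idea, reusing segmented subjects as training-time data augmentation, can be illustrated with a small sketch. The code below is not the authors' implementation: it assumes the segmentation masks have already been produced (for example by Segment Anything) and simply pastes each masked subject into its own horizontal slot on a blank canvas so that the subjects never overlap. The function name `composite_segmented_subjects`, the slot layout, and the black background are all illustrative assumptions.

```python
import numpy as np


def composite_segmented_subjects(images, masks, canvas_hw=(64, 64), seed=0):
    """Paste masked subjects into separate slots of a blank canvas.

    Hypothetical helper, not from the paper.
    images: list of HxWx3 uint8 arrays (subject crops).
    masks:  list of HxW bool arrays (True where the subject is).
    Returns the composited canvas and one full-size mask per subject.
    """
    rng = np.random.default_rng(seed)
    H, W = canvas_hw
    canvas = np.zeros((H, W, 3), dtype=np.uint8)
    canvas_masks = []
    slot_w = W // len(images)  # one horizontal slot per subject, so no overlap

    for i, (img, mask) in enumerate(zip(images, masks)):
        h, w = mask.shape
        if h > H or w > slot_w:
            raise ValueError("subject crop does not fit in its slot")
        # Random placement inside the subject's own slot.
        top = int(rng.integers(0, H - h + 1))
        left = i * slot_w + int(rng.integers(0, slot_w - w + 1))
        region = canvas[top:top + h, left:left + w]  # view into the canvas
        region[mask] = img[mask]  # copy only the subject's pixels
        full_mask = np.zeros((H, W), dtype=bool)
        full_mask[top:top + h, left:left + w] = mask
        canvas_masks.append(full_mask)

    return canvas, canvas_masks


if __name__ == "__main__":
    # Two synthetic "subjects": a bright square and a dark block.
    a = np.full((8, 8, 3), 200, dtype=np.uint8)
    b = np.full((8, 8, 3), 50, dtype=np.uint8)
    mask_a = np.zeros((8, 8), dtype=bool)
    mask_a[2:6, 2:6] = True
    mask_b = np.ones((8, 8), dtype=bool)
    canvas, subject_masks = composite_segmented_subjects(
        [a, b], [mask_a, mask_b], canvas_hw=(32, 32)
    )
    print(canvas.shape, [int(m.sum()) for m in subject_masks])
```

Keeping a separate full-size mask per subject means each subject's pixels stay attributable to one identity in the composite, which is the property the decoupling idea relies on. How the composites are actually used during training and inference is specified in the paper, not here.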
Taxonomy
Topics: Image Retrieval and Classification Techniques · Topic Modeling
Methods: Diffusion
