TL;DR
ConceptPrism is a novel framework for disentangling target concepts from residual information in personalized diffusion models, improving fidelity and scalability in text-to-image generation.
Contribution
ConceptPrism jointly optimizes a target token and image-wise residual tokens, disentangling the target concept through cross-image comparison alone, without external guidance or manual intervention.
Findings
Achieves accurate concept disentanglement across diverse visual concepts.
Significantly improves performance over existing methods.
Demonstrates effectiveness in complex visual concept scenarios.
Abstract
Personalized text-to-image (T2I) generation has emerged as a key application for creating user-specific concepts from a few reference images. The core challenge is concept disentanglement: separating the target concept from irrelevant residual information. Lacking such disentanglement, capturing high-fidelity features often incorporates undesired attributes that conflict with user prompts, compromising the trade-off between concept fidelity and text alignment. While existing methods rely on manual guidance, they often fail to represent intricate visual details and lack scalability. We introduce ConceptPrism, a framework that extracts shared features exclusively through cross-image comparison without external information. We jointly optimize a target token and image-wise residual tokens via reconstruction and exclusion losses. By suppressing shared information in residual tokens, the…
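To make the joint optimization described in the abstract concrete, here is a minimal toy sketch in plain Python. It is not the authors' implementation: the abstract excerpt does not give the exact loss definitions, so everything here is an assumption made for illustration. Each reference image is stood in for by a small feature vector, the "reconstruction loss" is a squared error between `target + residual_i` and that vector (a real system would use a diffusion denoising loss), and the "exclusion loss" is assumed to be a penalty on the mean of the residual tokens, i.e. on whatever the residuals have in common. The names `fit_tokens`, `lam`, `shared`, and `offsets` are hypothetical.

```python
# Toy sketch (assumed losses, not the paper's): one shared target token plus
# one residual token per image, fitted by plain gradient descent.

def fit_tokens(features, lam=1.0, lr=0.2, steps=500):
    """Minimize  mean_i ||t + e_i - x_i||^2  +  lam * ||mean_i e_i||^2."""
    n, dim = len(features), len(features[0])
    target = [0.0] * dim                           # shared target token t
    residuals = [[0.0] * dim for _ in range(n)]    # image-wise residual tokens e_i
    for _ in range(steps):
        # Component shared by all residual tokens (what the exclusion term penalizes).
        mean_res = [sum(r[k] for r in residuals) / n for k in range(dim)]
        # Reconstruction error per image: t + e_i - x_i.
        errors = [
            [target[k] + residuals[i][k] - features[i][k] for k in range(dim)]
            for i in range(n)
        ]
        # Gradient of the reconstruction term with respect to the target token.
        grad_target = [2.0 * sum(errors[i][k] for i in range(n)) / n for k in range(dim)]
        for i in range(n):
            for k in range(dim):
                # Reconstruction gradient plus exclusion gradient for e_i.
                grad = 2.0 * errors[i][k] / n + 2.0 * lam * mean_res[k] / n
                residuals[i][k] -= lr * grad
        for k in range(dim):
            target[k] -= lr * grad_target[k]
    return target, residuals


# Hypothetical data: every "image" is a shared concept plus an image-specific offset.
shared = [1.0, -2.0]
offsets = [[0.5, 0.0], [-0.5, 0.3], [0.0, -0.3]]   # chosen to average to zero
features = [[s + o for s, o in zip(shared, off)] for off in offsets]

target, residuals = fit_tokens(features)
print("target token:", [round(v, 4) for v in target])
print("residual tokens:", [[round(v, 4) for v in r] for r in residuals])
```

In this toy setting the exclusion term is what makes the split well defined: with `lam=0.0` the reconstruction term alone cannot tell whether shared information belongs in the target token or in the residuals, so part of it ends up in the residual tokens.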
