CustomContrast: A Multilevel Contrastive Perspective For Subject-Driven Text-to-Image Customization
Nan Chen, Mengqi Huang, Zhuowei Chen, Yang Zheng, Lei Zhang, and Zhendong Mao

TL;DR
CustomContrast introduces a multilevel contrastive learning framework with a multimodal encoder to improve subject-driven text-to-image customization by better decoupling intrinsic subject attributes from irrelevant features.
Contribution
The paper proposes a novel multilevel contrastive learning approach with a multimodal feature injection encoder for enhanced subject customization in text-to-image generation.
Findings
Improves subject similarity in generated images.
Enhances text controllability for subject customization.
Decouples the subject's intrinsic attributes from irrelevant attributes (e.g., view, pose, and background).
Abstract
Subject-driven text-to-image (T2I) customization has drawn significant interest in academia and industry. This task enables pre-trained models to generate novel images based on unique subjects. Existing studies adopt a self-reconstructive perspective, focusing on capturing all details of a single image, which misconstrues that image's irrelevant attributes (e.g., view, pose, and background) as intrinsic attributes of the subject. This misconstruction leads to both overfitting and underfitting of the subject's irrelevant and intrinsic attributes, i.e., these attributes are over-represented or under-represented simultaneously, causing a trade-off between similarity and controllability. In this study, we argue that an ideal subject representation can be achieved by a cross-differential perspective, i.e., decoupling subject intrinsic attributes from irrelevant attributes via contrastive…
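To make the contrastive idea concrete, here is a minimal sketch of a generic InfoNCE-style loss. It is not the paper's multilevel objective, whose exact form is not given in this summary. The function names, the toy embeddings, and the temperature value are all illustrative assumptions. The intuition is that an embedding of the same subject under a different view or pose acts as the positive, while embeddings of other subjects act as negatives.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.07):
    """Generic InfoNCE loss for a single anchor (illustrative, not the paper's loss).

    anchor, positive: embeddings assumed to show the same subject in different
    views; negatives: embeddings assumed to show other subjects.
    The temperature default is an assumed, commonly used value.
    """
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    # Numerically stable log-sum-exp over all candidates.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    # Negative log-probability of picking the positive among all candidates.
    return log_sum - logits[0]


# Hypothetical 2-D embeddings, for illustration only.
same_subject_new_view = [0.9, 0.1]
other_subject = [0.0, 1.0]
anchor = [1.0, 0.0]

loss_aligned = info_nce(anchor, same_subject_new_view, [other_subject])
loss_confused = info_nce(anchor, other_subject, [same_subject_new_view])
```

In this toy setup, the loss is small when the anchor sits close to the same subject seen from another view and large when it sits closer to a different subject. This is the pressure that pushes a representation toward intrinsic attributes and away from view, pose, or background.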
Taxonomy
Topics: Semantic Web and Ontologies · Multimedia Communication and Technology · Business Process Modeling and Analysis
Methods: Focus · Contrastive Learning
