DCoAR: Deep Concept Injection into Unified Autoregressive Models for Personalized Text-to-Image Generation
Fangtai Wu, Mushui Liu, Weijie He, Zhao Wang, Yunlong Yu

TL;DR
DCoAR is a deep concept injection framework for personalized text-to-image generation with unified autoregressive models. It keeps the pre-trained model completely frozen and integrates new concepts layer by layer, improving visual fidelity and re-contextualization while requiring fewer trainable parameters.
Contribution
The paper proposes a Layer-wise Multimodal Context Learning (LMCL) strategy, stabilized by regularization schemes, that enables deep concept injection without fine-tuning the model.
Findings
Outperforms previous injection-based methods
Achieves performance comparable to adaptation-based approaches
Requires fewer trainable parameters
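The parameter-efficiency finding can be made concrete with a rough back-of-the-envelope count. The sketch below is not DCoAR's method and does not reproduce its numbers. It assumes learnable context tokens of width `hidden_dim`, placed either only at the input (shallow) or at every transformer layer (deep), and compares them with a crude estimate for updating all transformer-block weights. `ModelShape`, the function names, and the 24-layer, 2048-wide, 8-token configuration are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelShape:
    """Hypothetical transformer shape; not taken from the paper."""
    num_layers: int
    hidden_dim: int


def shallow_injection_params(shape: ModelShape, num_tokens: int) -> int:
    # Assumption: learnable concept tokens exist only at the input embedding level.
    return num_tokens * shape.hidden_dim


def deep_injection_params(shape: ModelShape, num_tokens: int) -> int:
    # Assumption: every layer gets its own set of learnable context tokens,
    # while all pre-trained weights stay frozen.
    return shape.num_layers * num_tokens * shape.hidden_dim


def full_finetune_params(shape: ModelShape) -> int:
    # Crude per-layer estimate: 4*d^2 for attention projections plus 8*d^2 for
    # an MLP with 4x expansion. Biases, norms, and embeddings are ignored.
    return shape.num_layers * 12 * shape.hidden_dim ** 2


shape = ModelShape(num_layers=24, hidden_dim=2048)  # hypothetical configuration
tokens = 8                                          # hypothetical token count

shallow = shallow_injection_params(shape, tokens)
deep = deep_injection_params(shape, tokens)
full = full_finetune_params(shape)

print(shallow)      # 16384
print(deep)         # 393216
print(full)         # 1207959552
print(full // deep) # 3072
```

Under these assumptions, deep injection trains `num_layers` times as many parameters as shallow injection, yet still about three orders of magnitude fewer than updating every transformer-block weight.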
Abstract
The unified autoregressive (AR) model excels at multimodal understanding and generation. However, its full potential in the domain of customized image generation has yet to be fully realized. Existing customization approaches for unified AR models face a fundamental dilemma: adaptation-based methods suffer from overfitting and scalability bottlenecks, while concept-injection paradigms are constrained by a shallow injection strategy that leads to poor visual fidelity and impaired re-contextualization. To address this, we propose DCoAR, a novel deep concept injection framework that maintains a completely frozen pre-trained model. DCoAR deeply integrates new concepts through a Layer-wise Multimodal Context Learning (LMCL) strategy, which is stabilized by a multi-faceted regularization scheme: a Dual Prior Preservation (DPP) loss to mitigate semantic drift and a Context-Aware…
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications
