DCoAR: Deep Concept Injection into Unified Autoregressive Models for Personalized Text-to-Image Generation

Fangtai Wu; Mushui Liu; Weijie He; Zhao Wang; Yunlong Yu

arXiv:2508.07341 · cs.CV · December 9, 2025

TL;DR

DCoAR introduces a deep concept injection framework for personalized text-to-image generation that maintains a frozen pre-trained model and deeply integrates new concepts, improving visual fidelity and re-contextualization with fewer trainable parameters.

Contribution

The paper proposes a novel Layer-wise Multimodal Context Learning (LMCL) strategy, stabilized by a regularization scheme, that enables deep concept injection without fine-tuning the pre-trained model.

Findings

01. Outperforms previous injection-based methods

02. Achieves performance comparable to adaptation-based approaches

03. Requires fewer trainable parameters

Abstract

The unified autoregressive (AR) model excels at multimodal understanding and generation. However, its full potential in the domain of customized image generation has yet to be fully realized. Existing customization approaches for unified AR models face a fundamental dilemma: adaptation-based methods suffer from overfitting and scalability bottlenecks, while concept-injection paradigms are constrained by a shallow injection strategy that leads to poor visual fidelity and impaired re-contextualization. To address this, we propose DCoAR, a novel deep concept injection framework that maintains a completely frozen pre-trained model. DCoAR deeply integrates new concepts through a Layer-wise Multimodal Context Learning (LMCL) strategy, which is stabilized by a multi-faceted regularization scheme: a Dual Prior Preservation (DPP) loss to mitigate semantic drift and a Context-Aware…
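
The abstract contrasts shallow injection, where concept information enters only at the model's input, with deep, layer-wise injection into a completely frozen model. The following is a minimal sketch of that distinction under stated assumptions, not DCoAR's implementation. It assumes LMCL behaves like deep prompt tuning: every layer receives its own learnable context tokens while all backbone weights stay frozen. The toy model `FrozenToyTransformer`, the helper `trainable_params`, and all sizes are hypothetical; real unified AR models use multi-head causal attention, MLP blocks, and multimodal token embeddings, all omitted here.

```python
import numpy as np


def softmax(z, axis=-1):
    # Numerically stable softmax.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


class FrozenToyTransformer:
    """Hypothetical stack of single-head self-attention layers.

    The weights are created once and never updated, standing in for a
    frozen pre-trained backbone.
    """

    def __init__(self, num_layers, dim, seed=0):
        rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(dim)
        self.layers = [
            {name: rng.normal(0.0, scale, (dim, dim)) for name in ("q", "k", "v")}
            for _ in range(num_layers)
        ]
        self.dim = dim

    def layer(self, i, h):
        w = self.layers[i]
        q, k, v = h @ w["q"], h @ w["k"], h @ w["v"]
        attn = softmax(q @ k.T / np.sqrt(self.dim))
        return h + attn @ v  # residual connection

    def forward(self, x, contexts, deep=True):
        """x: (seq, dim) token embeddings; contexts: list of (n_ctx, dim) arrays.

        deep=True: one context array per layer (assumed layer-wise injection).
        deep=False: only contexts[0] is used, at the input (shallow injection).
        """
        if deep:
            assert len(contexts) == len(self.layers), "need one context per layer"
        n_ctx = contexts[0].shape[0]
        h = np.concatenate([contexts[0], x], axis=0)
        for i in range(len(self.layers)):
            if deep and i > 0:
                # Deep injection: overwrite the context slots with layer i's own tokens.
                h = np.concatenate([contexts[i], h[n_ctx:]], axis=0)
            h = self.layer(i, h)
        return h[n_ctx:]  # drop context positions from the output


def trainable_params(num_layers, n_ctx, dim, deep=True):
    # Only the context tokens are trainable; the backbone contributes zero.
    return (num_layers if deep else 1) * n_ctx * dim


model = FrozenToyTransformer(num_layers=4, dim=8)
rng = np.random.default_rng(1)
x = rng.normal(size=(5, 8))
deep_ctx = [rng.normal(size=(2, 8)) for _ in range(4)]
out_deep = model.forward(x, deep_ctx, deep=True)
out_shallow = model.forward(x, deep_ctx[:1], deep=False)
print(out_deep.shape)  # (5, 8)
print(trainable_params(4, 2, 8, deep=True), trainable_params(4, 2, 8, deep=False))  # 64 16
```

In both modes the backbone weights are untouched; the deep variant simply gives every layer its own trainable entry point, at the cost of more context parameters (64 versus 16 in this toy setting). Optimizing the context tokens, for example with a generation loss plus the regularizers the paper describes, is outside the scope of this sketch.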

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications