MoKus: Leveraging Cross-Modal Knowledge Transfer for Knowledge-Aware Concept Customization
Chenyang Zhu, Hongxiang Li, Xiu Li, Long Chen

TL;DR
MoKus introduces a cross-modal knowledge transfer framework for high-fidelity, knowledge-aware visual concept customization. It addresses the limitations of rare-token-based methods and extends to applications such as concept creation and erasure.
Contribution
The paper proposes MoKus, a novel two-stage framework that leverages cross-modal knowledge transfer for improved concept customization, and introduces the first benchmark for this task.
Findings
MoKus outperforms state-of-the-art methods in concept customization.
Cross-modal knowledge transfer enhances generation fidelity.
The framework extends to applications like concept creation and erasure.
Abstract
Concept customization typically binds rare tokens to a target concept. Unfortunately, these approaches often suffer from unstable performance as the pretraining data seldom contains these rare tokens. Meanwhile, these rare tokens fail to convey the inherent knowledge of the target concept. Consequently, we introduce Knowledge-aware Concept Customization, a novel task aiming at binding diverse textual knowledge to target visual concepts. This task requires the model to identify the knowledge within the text prompt to perform high-fidelity customized generation. Meanwhile, the model should efficiently bind all the textual knowledge to the target concept. Therefore, we propose MoKus, a novel framework for knowledge-aware concept customization. Our framework relies on a key observation: cross-modal knowledge transfer, where modifying knowledge within the text modality naturally transfers to…
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques
