MoKus: Leveraging Cross-Modal Knowledge Transfer for Knowledge-Aware Concept Customization

Chenyang Zhu; Hongxiang Li; Xiu Li; Long Chen

arXiv:2603.12743 · cs.CV · March 16, 2026

Open Access · 1 dataset

TL;DR

MoKus introduces a cross-modal knowledge transfer framework for high-fidelity, knowledge-aware visual concept customization. It addresses the limitations of rare-token-based methods and extends to applications such as concept creation and erasure.

Contribution

The paper proposes MoKus, a novel two-stage framework that leverages cross-modal knowledge transfer for improved concept customization, and introduces the first benchmark for this task.

Findings

01. MoKus outperforms state-of-the-art methods in concept customization.

02. Cross-modal knowledge transfer enhances generation fidelity.

03. The framework extends to applications like concept creation and erasure.

Abstract

Concept customization typically binds rare tokens to a target concept. Unfortunately, these approaches often suffer from unstable performance as the pretraining data seldom contains these rare tokens. Meanwhile, these rare tokens fail to convey the inherent knowledge of the target concept. Consequently, we introduce Knowledge-aware Concept Customization, a novel task aiming at binding diverse textual knowledge to target visual concepts. This task requires the model to identify the knowledge within the text prompt to perform high-fidelity customized generation. Meanwhile, the model should efficiently bind all the textual knowledge to the target concept. Therefore, we propose MoKus, a novel framework for knowledge-aware concept customization. Our framework relies on a key observation: cross-modal knowledge transfer, where modifying knowledge within the text modality naturally transfers to…

Code & Models

Datasets

zcaoyao/KnowCusBench
dataset · 245 downloads

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques