Towards Customized Multimodal Role-Play

Chao Tang; Jianzong Wu; Qingyu Shi; Ye Tian; Aixi Zhang; Hao Jiang; Jiangning Zhang; Yunhai Tong

arXiv:2605.08129 · cs.LG · May 12, 2026

TL;DR

This paper introduces a new task, Customized Multimodal Role-Play, and a unified model that enables personalized, consistent human-AI interaction across text and images from minimal data (about 10 images per character).

Contribution

It proposes the CMRP task, constructs the RoleScape-20 dataset, and develops UniCharacter, a two-stage training framework (Unified-SFT followed by Character-GRPO) for few-shot multimodal character customization.

Findings

01

UniCharacter outperforms prior approaches on RoleScape-20.

02

With only 10 images plus corresponding interaction examples, the model maintains a coherent persona, dialogue style, and visual identity.

03

Ablation studies validate the contributions of the cross-modal consistency design and the few-shot training strategies.

Abstract

Unified multimodal understanding and generation models enable richer human-AI interaction. Yet jointly customizing a character's persona, dialogue style, and visual identity while keeping outputs consistent across modalities remains largely unexplored. To address this gap, we introduce a new task, Customized Multimodal Role-Play (CMRP). We construct the RoleScape-20 dataset, which comprises 20 characters with training and evaluation data covering persona, stylistic descriptions, visual/expressive cues, and text-image interactions. Building on a unified model, we devise UniCharacter, a two-stage training framework consisting of Unified Supervised Finetuning (Unified-SFT) and character-specific group relative policy optimization (Character-GRPO). Given only 10 images plus corresponding interaction examples, the model learns the target character and exhibits coherent persona, style,…
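Character-GRPO builds on group relative policy optimization (GRPO). The rewards UniCharacter uses for persona, style, and visual consistency are not described on this page, so the sketch below shows only the generic GRPO ingredients, under stated assumptions: each prompt gets a group of sampled responses scored with scalar rewards, each reward is normalized against its group's mean and standard deviation, and the result weights a PPO-style clipped objective. The function names, the population standard deviation, the sequence-level probability ratio, and the omission of the KL penalty are illustrative simplifications, not the authors' code.

```python
import math
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Generic GRPO advantage: normalize each reward within its sampled group.

    Assumption: `rewards` holds one scalar score per response sampled for the
    same prompt. Uses the population standard deviation; `eps` avoids division
    by zero when every response in the group gets the same reward.
    """
    n = len(rewards)
    mean = sum(rewards) / n
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / n)
    return [(r - mean) / (std + eps) for r in rewards]


def clipped_surrogate(ratio: float, advantage: float, clip_eps: float = 0.2) -> float:
    """PPO-style clipped objective for one response (sequence-level simplification).

    `ratio` is pi_new(response) / pi_old(response). Real GRPO implementations
    usually apply this per token and subtract a KL penalty, both omitted here.
    """
    clipped_ratio = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
    return min(ratio * advantage, clipped_ratio * advantage)


if __name__ == "__main__":
    # Hypothetical rewards for four responses sampled for one character prompt.
    advantages = group_relative_advantages([1.0, 0.0, 0.5, 0.5])
    print([round(a, 3) for a in advantages])
    print(clipped_surrogate(1.5, 1.0))
```

With hypothetical rewards [1.0, 0.0, 0.5, 0.5], the group mean is 0.5, so the two middle responses receive zero advantage and the best and worst responses receive roughly +1.414 and -1.414. Because advantages are relative within the group, GRPO needs no separate value network, which is one reason it is attractive for fine-tuning large unified models.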

Code & Models

Repository: tangc03/UniCharacter (GitHub)