Unified Personalized Understanding, Generating and Editing

Yu Zhong; Tianwei Lin; Ruike Zhu; Yuqian Yuan; Haoyu Zheng; Liang Liang; Wenqiao Zhang; Feifei Shao; Haoyuan Li; Wanggui He; Hao Jiang; Yueting Zhuang

arXiv:2601.06965 · cs.CV · January 13, 2026

Open Access

TL;DR

OmniPersona is an end-to-end framework for personalized multimodal understanding, generation, and editing, addressing limitations of previous methods by decoupling tasks and propagating personalized knowledge.

Contribution

Introduces structurally decoupled concept tokens and a knowledge replay mechanism, so that a single unified model can handle personalized understanding, generation, and editing.
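To make the idea of decoupled concept tokens concrete, here is a minimal toy sketch in plain Python. It is not the paper's implementation: the `ConceptTokens` class, the three task names, the token count, and the embedding width are all hypothetical choices made for illustration, and the knowledge replay mechanism is not modeled at all. The sketch only shows one way a concept such as `<maeve>` could hold a separate set of learnable soft-prompt vectors per task, with the matching set prepended to the input embeddings.

```python
import random

EMBED_DIM = 4  # toy embedding width, chosen only for illustration


def init_tokens(num_tokens, dim, rng):
    """Create small random vectors standing in for learnable soft-prompt embeddings."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(num_tokens)]


class ConceptTokens:
    """Hypothetical container: one user concept, one token set per task."""

    def __init__(self, name, tokens_per_task=2, dim=EMBED_DIM, seed=0):
        rng = random.Random(seed)
        self.name = name
        # Assumption: each task gets its own token list, so changing one
        # list in this toy does not touch the lists of the other tasks.
        self.tokens = {
            task: init_tokens(tokens_per_task, dim, rng)
            for task in ("understanding", "generation", "editing")
        }

    def prepend(self, task, input_embeddings):
        """Return the task-specific concept tokens followed by the input embeddings."""
        if task not in self.tokens:
            raise KeyError(f"unknown task: {task}")
        return self.tokens[task] + input_embeddings


maeve = ConceptTokens("<maeve>")
prompt = [[0.0] * EMBED_DIM for _ in range(3)]  # three placeholder prompt embeddings
sequence = maeve.prepend("generation", prompt)
print(len(sequence))  # 2 concept tokens + 3 prompt embeddings = 5
```

In a real model the vectors would be trainable parameters updated by gradient descent; plain lists are used here only to keep the sketch dependency-free.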

Findings

01 Achieves robust performance across diverse personalization tasks
02 Effectively propagates personalized attribute knowledge across tasks
03 Sets a new baseline for unified personalized multimodal models

Abstract

Unified large multimodal models (LMMs) have achieved remarkable progress in general-purpose multimodal understanding and generation. However, they still operate under a "one-size-fits-all" paradigm and struggle to model user-specific concepts (e.g., generate a photo of <maeve>) in a consistent and controllable manner. Existing personalization methods typically rely on external retrieval, which is inefficient and poorly integrated into unified multimodal pipelines. Recent personalized unified models introduce learnable soft prompts to encode concept information, yet they either couple understanding and generation or depend on complex multi-stage training, leading to cross-task interference and ultimately to fuzzy or misaligned personalized knowledge. We present OmniPersona, an end-to-end personalization framework for unified LMMs that, for the first time, integrates…

Taxonomy

Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Image Retrieval and Classification Techniques