PersonaVLM: Long-Term Personalized Multimodal LLMs

Chang Nie, Chaoyou Fu, Yifan Zhang, Haihua Yang, Caifeng Shan

arXiv:2604.13074 · cs.CL · April 16, 2026


TL;DR

PersonaVLM introduces a long-term personalized multimodal language model framework that remembers, reasons with, and adapts to individual user preferences over time, validated by a new comprehensive benchmark.

Contribution

The paper presents a novel framework for long-term personalization of multimodal LLMs that integrates memory, reasoning, and response-alignment capabilities.

Findings

1. Improves personalization effectiveness by 22.4% on the Persona-MME benchmark.

2. Outperforms GPT-4o by 5.2% in long-term personalized responses.

3. Establishes a new benchmark with over 2,000 interaction cases.

Abstract

Multimodal Large Language Models (MLLMs) serve as daily assistants for millions. However, their ability to generate responses aligned with individual preferences remains limited. Prior approaches enable only static, single-turn personalization through input augmentation or output alignment, and thus fail to capture users' evolving preferences and personality over time (see Fig. 1). In this paper, we introduce PersonaVLM, an innovative personalized multimodal agent framework designed for long-term personalization. It transforms a general-purpose MLLM into a personalized assistant by integrating three key capabilities: (a) Remembering: It proactively extracts and summarizes chronological multimodal memories from interactions, consolidating them into a personalized database. (b) Reasoning: It conducts multi-turn reasoning by retrieving and integrating relevant memories from the database.…
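The abstract describes the Remembering and Reasoning capabilities only at a high level. The toy sketch below illustrates the general idea of a chronological personal memory database queried over several retrieval turns. It is not PersonaVLM's implementation: it assumes memories have already been summarized into text (an image is represented by a caption), it swaps in plain word-overlap retrieval with recency tie-breaking in place of whatever retriever the paper uses, and every name in it (MemoryEntry, PersonalMemoryDB, multi_turn_context) is hypothetical.

```python
import re
from dataclasses import dataclass, field
from datetime import datetime


def tokens(text: str) -> set[str]:
    # Lowercase word tokens; punctuation is ignored.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


@dataclass
class MemoryEntry:
    timestamp: datetime
    modality: str  # hypothetical field: "text" or "image" (image stored as a caption)
    summary: str


@dataclass
class PersonalMemoryDB:
    # Hypothetical personalized database; entries are kept in chronological order.
    entries: list[MemoryEntry] = field(default_factory=list)

    def remember(self, timestamp: datetime, modality: str, summary: str) -> None:
        self.entries.append(MemoryEntry(timestamp, modality, summary))
        self.entries.sort(key=lambda e: e.timestamp)  # stable chronological sort

    def retrieve(self, query: str, k: int = 2) -> list[MemoryEntry]:
        # Assumed scoring: word overlap with the query; ties go to the more
        # recent memory (a larger index means later, since entries are sorted).
        q = tokens(query)
        scored = [
            (len(q & tokens(e.summary)), idx, e)
            for idx, e in enumerate(self.entries)
        ]
        scored = [s for s in scored if s[0] > 0]
        scored.sort(key=lambda s: (s[0], s[1]), reverse=True)
        return [e for _, _, e in scored[:k]]


def multi_turn_context(db: PersonalMemoryDB, queries: list[str], k: int = 2) -> list[str]:
    # Hypothetical multi-turn loop: each turn issues one retrieval query and
    # accumulates deduplicated memory summaries into a shared context.
    seen: set[int] = set()
    context: list[str] = []
    for q in queries:
        for e in db.retrieve(q, k):
            if id(e) not in seen:
                seen.add(id(e))
                context.append(e.summary)
    return context


# Usage with made-up memories.
db = PersonalMemoryDB()
db.remember(datetime(2026, 1, 5), "text", "User prefers oat milk in coffee")
db.remember(datetime(2026, 2, 10), "image", "User shared a photo of their dog Max")
db.remember(datetime(2026, 3, 1), "text", "User switched to black coffee")
ctx = multi_turn_context(db, ["coffee preference", "dog"])
print(ctx)
```

Here the later "black coffee" memory ranks ahead of the earlier "oat milk" one, which is one simple way to let evolving preferences take precedence; a real system would likely use learned embeddings and explicit memory consolidation instead.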


Code & Models

Models

ClareNie/PersonaVLM (Hugging Face model)
