TAMEing Long Contexts in Personalization: Towards Training-Free and State-Aware MLLM Personalized Assistant

Rongpei Hong; Jian Lang; Ting Zhong; Yong Wang; Fan Zhou

arXiv:2512.21616·cs.CV·December 29, 2025

Open Access

TL;DR

This paper introduces LCMP, a new benchmark for long-context personalization in multimodal large language models, and proposes TAME, a training-free, state-aware framework that enhances MLLMs' ability to handle long dialogues and personalized concepts.

Contribution

The paper presents LCMP, the first long-context personalization benchmark, and TAME, a novel training-free framework that improves MLLMs' long-term personalized dialogue capabilities.

Findings

01. TAME outperforms existing methods on the LCMP benchmark.

02. TAME enables better handling of long-context personalized conversations.

03. Experiments show TAME provides more consistent and evolving interaction experiences.

Abstract

Multimodal Large Language Model (MLLM) Personalization is a critical research problem that facilitates personalized dialogues with MLLMs targeting specific entities (known as personalized concepts). However, existing methods and benchmarks focus on the simple, context-agnostic visual identification and textual replacement of the personalized concept (e.g., "A yellow puppy" -> "Your puppy Mochi"), overlooking the ability to support long-context conversations. An ideal personalized MLLM assistant is capable of engaging in long-context dialogues with humans and continually improving its experience quality by learning from past dialogue histories. To bridge this gap, we propose LCMP, the first Long-Context MLLM Personalization evaluation benchmark. LCMP assesses the capability of MLLMs in perceiving variations of personalized concepts and generating contextually appropriate personalized…

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Speech and dialogue systems