TAMEing Long Contexts in Personalization: Towards Training-Free and State-Aware MLLM Personalized Assistant
Rongpei Hong, Jian Lang, Ting Zhong, Yong Wang, Fan Zhou

TL;DR
This paper introduces LCMP, a new benchmark for long-context personalization in multimodal large language models, and proposes TAME, a training-free, state-aware framework that enhances MLLMs' ability to handle long dialogues and personalized concepts.
Contribution
The paper presents LCMP, the first long-context personalization benchmark, and TAME, a novel training-free framework that improves MLLMs' long-term personalized dialogue capabilities.
Findings
TAME outperforms existing methods on the LCMP benchmark.
TAME enables better handling of long-context personalized conversations.
Experiments show TAME provides more consistent and evolving interaction experiences.
Abstract
Multimodal Large Language Model (MLLM) Personalization is a critical research problem that facilitates personalized dialogues with MLLMs targeting specific entities (known as personalized concepts). However, existing methods and benchmarks focus on the simple, context-agnostic visual identification and textual replacement of the personalized concept (e.g., "A yellow puppy" -> "Your puppy Mochi"), overlooking the ability to support long-context conversations. An ideal personalized MLLM assistant is capable of engaging in long-context dialogues with humans and continually improving its experience quality by learning from past dialogue histories. To bridge this gap, we propose LCMP, the first Long-Context MLLM Personalization evaluation benchmark. LCMP assesses the capability of MLLMs in perceiving variations of personalized concepts and generating contextually appropriate personalized…
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Speech and dialogue systems
