PEARL: Personalized Streaming Video Understanding Model
Yuanhong Zheng, Ruichuan An, Xiaopeng Lin, Yuxing Liu, Sihan Yang, Huanyu Zhang, Haodong Li, Qintong Zhang, Renrui Zhang, Guopeng Li, Yifan Zhang, Yuheng Li, Wentao Zhang

TL;DR
This paper introduces the task of Personalized Streaming Video Understanding (PSVU), the PEARL-Bench benchmark for evaluating it, and PEARL, a training-free baseline model, with the goal of enabling real-time, personalized video comprehension in AI assistants.
Contribution
The paper defines the novel PSVU task, creates PEARL-Bench for evaluation, and proposes PEARL as a robust, training-free baseline to advance personalized streaming video understanding.
Findings
PEARL achieves state-of-the-art performance across multiple models.
PEARL improves personalization across three different architectures.
PEARL demonstrates robustness in real-time video understanding.
Abstract
Human cognition of new concepts is inherently a streaming process: we continuously recognize new objects or identities and update our memories over time. However, current multimodal personalization methods are largely limited to static images or offline videos. This disconnects continuous visual input from instant real-world feedback, limiting their ability to provide the real-time, interactive personalized responses essential for future AI assistants. To bridge this gap, we first propose and formally define the novel task of Personalized Streaming Video Understanding (PSVU). To facilitate research in this new direction, we introduce PEARL-Bench, the first comprehensive benchmark designed specifically to evaluate this challenging setting. It evaluates a model's ability to respond to personalized concepts at exact timestamps under two modes: (1) Frame-level, focusing on a specific person…
Taxonomy
Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Social Robot Interaction and HRI
