EgoPro-Bench: Benchmarking Personalized Proactive Interaction in Egocentric Video Streams
Dongchuan Ran, Linyu Ou, Xueheng Li, Wenwen Tong, Chenxu Guo, Hewei Guo, Kaibing Wang, Lewei Lu

TL;DR
EgoPro-Bench is a new benchmark for training and evaluating proactive, personalized interaction capabilities in egocentric video streams, emphasizing timing and user context.
Contribution
It introduces a comprehensive dataset, evaluation protocol, and a novel interaction principle to advance proactive multimodal large language models.
Findings
EgoPro-Bench improves intention understanding in MLLMs.
Models trained on EgoPro-Bench accurately identify the timing of human-machine interactions (HMI).
The benchmark enables development of more user-centric proactive agents.
Abstract
Existing Multimodal Large Language Models (MLLMs) remain primarily reactive, failing to continuously perceive environments or proactively assist users. While emerging benchmarks address proactivity, they are largely confined to alert scenarios, neglect personalized context, and fail to evaluate the precise timing of human-machine interactions (HMI). In this paper, we introduce EgoPro-Bench, a novel benchmark for training and evaluating proactive interaction capabilities based on streaming egocentric videos; it comprises 2,400 videos in the evaluation set and over 12,000 videos in the training set. Unlike previous works, EgoPro-Bench leverages simulated user profiles to generate diverse user intentions and to construct high-fidelity HMI data across 12 distinct domains. Subsequently, we propose a specialized evaluation protocol and metrics, train proactive interaction models designed for…
