EgoPro-Bench: Benchmarking Personalized Proactive Interaction in Egocentric Video Streams

Dongchuan Ran; Linyu Ou; Xueheng Li; Wenwen Tong; Chenxu Guo; Hewei Guo; Kaibing Wang; Lewei Lu

arXiv:2605.07299 · cs.CV · May 11, 2026

TL;DR

EgoPro-Bench is a new benchmark for training and evaluating proactive, personalized interaction capabilities in egocentric video streams, emphasizing timing and user context.

Contribution

The paper introduces a comprehensive dataset, an evaluation protocol, and a novel interaction principle to advance proactive multimodal large language models (MLLMs).

Findings

01. EgoPro-Bench improves intention understanding in MLLMs.

02. Models trained on EgoPro-Bench accurately identify the timing of human-machine interactions (HMI).

03. The benchmark enables development of more user-centric proactive agents.

Abstract

Existing Multimodal Large Language Models (MLLMs) remain primarily reactive, failing to continuously perceive environments or proactively assist users. While emerging benchmarks address proactivity, they are largely confined to alert scenarios, neglect personalized context, and fail to evaluate the precise timing of human-machine interactions (HMI). In this paper, we introduce EgoPro-Bench, a novel benchmark for training and evaluating proactive interaction capabilities based on streaming egocentric videos; it comprises 2,400 videos in the evaluation set and over 12,000 videos in the training set. Unlike previous works, EgoPro-Bench leverages simulated user profiles to generate diverse user intentions and to construct high-fidelity HMI data across 12 distinct domains. Subsequently, we propose a specialized evaluation protocol and metrics, train proactive interaction models designed for…
