Ego-Grounding for Personalized Question-Answering in Egocentric Videos

Junbin Xiao; Shenglang Zhang; Pengxiang Zhu; Angela Yao

arXiv:2604.01966·cs.CV·April 3, 2026

TL;DR

This paper introduces MyEgo, a new dataset and benchmark for evaluating multimodal large language models' ability to perform personalized question-answering in egocentric videos, highlighting current limitations and the importance of ego-grounding.

Contribution

The paper provides the first egocentric VideoQA dataset designed to evaluate ego-grounding, together with a systematic analysis of how current MLLMs struggle to understand, remember, and reason about the camera wearer.

Findings

01

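Models perform far below human accuracy: the top closed- and open-source models (e.g., GPT-5 and Qwen3-VL) reach only about 46% and 36%, trailing humans by nearly 40% and 50%, respectively.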

02

Explicit evidence improves model performance temporarily.

03

Neither model scaling nor explicit reasoning consistently improves results.

Abstract

We present the first systematic analysis of multimodal large language models (MLLMs) in personalized question-answering requiring ego-grounding, the ability to understand the camera wearer in egocentric videos. To this end, we introduce MyEgo, the first egocentric VideoQA dataset designed to evaluate MLLMs' ability to understand, remember, and reason about the camera wearer. MyEgo comprises 541 long videos and 5K personalized questions asking about "my things", "my activities", and "my past". Benchmarking reveals that competitive MLLMs across variants, including open-source vs. proprietary, thinking vs. non-thinking, and small vs. large scales, all struggle on MyEgo. Top closed- and open-source models (e.g., GPT-5 and Qwen3-VL) achieve only about 46% and 36% accuracy, trailing human performance by nearly 40% and 50%, respectively. Surprisingly, neither explicit reasoning nor model scaling yield…
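
The accuracy figures quoted in the abstract can be cross-checked with a little arithmetic. The sketch below assumes the reported gaps ("near 40%" and "50%") are absolute percentage points rather than relative differences. That reading is an assumption, and all inputs are the approximate values from the abstract, not exact results. The helper name `implied_human_accuracy` is illustrative only.

```python
# Rough consistency check on the abstract's numbers.
# ASSUMPTION: the human-model gaps are absolute percentage points.
# All values are the approximate figures quoted in the abstract.

def implied_human_accuracy(model_acc_pct: float, gap_pct_points: float) -> float:
    """Return the human accuracy implied by a model score plus its reported gap."""
    return model_acc_pct + gap_pct_points

# (model accuracy in %, reported gap to humans in percentage points)
reported = {
    "GPT-5": (46.0, 40.0),
    "Qwen3-VL": (36.0, 50.0),
}

implied = {
    name: implied_human_accuracy(acc, gap)
    for name, (acc, gap) in reported.items()
}

for name, human_acc in implied.items():
    print(f"{name}: implied human accuracy of roughly {human_acc:.0f}%")
```

Under this reading, both pairs point to the same human accuracy of roughly 86%, so the two quoted gaps are mutually consistent. The exact human score is not stated in the excerpt above.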

Code & Models

Repositories

Ryougetsu3606/MyEgo (GitHub)
