X-LeBench: A Benchmark for Extremely Long Egocentric Video Understanding

Wenqi Zhou; Kai Cao; Hao Zheng; Yunze Liu; Xinyi Zheng; Miao Liu; Per Ola Kristensson; Walterio Mayol-Cuevas; Fan Zhang; Weizhe Lin; Junxiao Shen

arXiv:2501.06835·cs.CV·January 27, 2026

TL;DR

X-LeBench is a benchmark dataset for evaluating the understanding of extremely long egocentric videos. It addresses a gap left by existing datasets, which focus on short to moderately long videos, and it highlights where current models struggle.

Contribution

The paper presents X-LeBench, a benchmark dataset of extremely long egocentric video recordings, built with a life-logging simulation pipeline that produces realistic, coherent daily plans aligned with real-world video data.

Findings

01. Baseline systems perform poorly on long videos

02. Challenges include temporal localization and context reasoning

03. Highlights need for advanced models in long-form video understanding

Abstract

Long-form egocentric video understanding provides rich contextual information and unique insights into long-term human behaviors, holding significant potential for applications in embodied intelligence, long-term activity analysis, and personalized assistive technologies. However, existing benchmark datasets primarily focus on single, short (e.g., minutes to tens of minutes) to moderately long videos, leaving a substantial gap in evaluating extensive, ultra-long egocentric video recordings. To address this, we introduce X-LeBench, a novel benchmark dataset meticulously designed to fill this gap by focusing on tasks requiring a comprehensive understanding of extremely long egocentric video recordings. Our X-LeBench develops a life-logging simulation pipeline that produces realistic, coherent daily plans aligned with real-world video data. This approach enables the flexible integration of…
