LifeEval: A Multimodal Benchmark for Assistive AI in Egocentric Daily Life Tasks
Hengjian Gao, Kaiwei Zhang, Shibo Wang, Mingjie Chen, Qihang Cao, Xianfeng Wang, Yucheng Zhu, Xiongkuo Min, Wei Sun, Dandan Zhu, Guangtao Zhai

TL;DR
LifeEval is a new multimodal benchmark designed to evaluate assistive AI systems in real-time, egocentric daily life tasks, emphasizing human-AI collaboration, perception, and natural dialogue.
Contribution
It introduces a comprehensive benchmark with annotated data for assessing multimodal, real-time, task-oriented human-AI interaction from a first-person perspective.
Findings
Current MLLMs struggle with timely, effective assistance in real-world tasks.
LifeEval reveals significant challenges in achieving adaptive, human-centered AI interactions.
The benchmark facilitates targeted improvements in multimodal, interactive AI systems.
Abstract
The rapid progress of Multimodal Large Language Models (MLLMs) marks a significant step toward artificial general intelligence, offering great potential for augmenting human capabilities. However, their ability to provide effective assistance in dynamic, real-world environments remains largely underexplored. Existing video benchmarks predominantly assess passive understanding through retrospective analysis or isolated perception tasks, failing to capture the interactive and adaptive nature of real-time user assistance. To bridge this gap, we introduce LifeEval, a multimodal benchmark designed to evaluate real-time, task-oriented human-AI collaboration in daily life from an egocentric perspective. LifeEval emphasizes three key aspects: task-oriented holistic evaluation, egocentric real-time perception from continuous first-person streams, and human-assistant collaborative interaction…
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Social Robot Interaction and HRI
