ECBench: Can Multi-modal Foundation Models Understand the Egocentric World? A Holistic Embodied Cognition Benchmark
Ronghao Dang, Yuqian Yuan, Wenqi Zhang, Yifei Xin, Boqiang Zhang, Long Li, Liuyi Wang, Qinyang Zeng, Xin Li, Lidong Bing

TL;DR
This paper introduces ECBench, a comprehensive benchmark for evaluating the embodied cognitive abilities of large vision-language models in egocentric video understanding, addressing current evaluation gaps.
Contribution
It presents ECBench, a systematic and high-quality benchmark with diverse data and evaluation metrics, to assess embodied cognition in LVLMs.
Findings
ECBench enables detailed evaluation of LVLMs' embodied cognition.
Proprietary and open-source LVLMs show varied performance on ECBench.
The benchmark highlights key challenges like scene perception and hallucination in embodied models.
Abstract
The enhancement of generalization in robots by large vision-language models (LVLMs) is increasingly evident. Therefore, the embodied cognitive abilities of LVLMs based on egocentric videos are of great interest. However, current datasets for embodied video question answering lack comprehensive and systematic evaluation frameworks. Critical embodied cognitive issues, such as robotic self-cognition, dynamic scene perception, and hallucination, are rarely addressed. To tackle these challenges, we propose ECBench, a high-quality benchmark designed to systematically evaluate the embodied cognitive abilities of LVLMs. ECBench features a diverse range of scene video sources, open and varied question formats, and 30 dimensions of embodied cognition. To ensure quality, balance, and high visual dependence, ECBench uses class-independent meticulous human annotation and multi-round question…
Taxonomy
Topics: Semantic Web and Ontologies · Multi-Agent Systems and Negotiation
