EgoCoT-Bench: Benchmarking Grounded and Verifiable Operation-Centric Chain of Thought Reasoning for MLLMs

Yang Dai; Dian Jiao; Tianwei Lin; Wenqiao Zhang

arXiv:2605.19559·cs.CV·May 20, 2026

TL;DR

EgoCoT-Bench is a benchmark for evaluating grounded, step-by-step reasoning in multimodal large language models (MLLMs) on egocentric videos. It targets a gap in earlier egocentric benchmarks: limited fine-grained, evidence-based evaluation of model rationales.

Contribution

The paper introduces an annotated egocentric video benchmark with explicit reasoning steps and supporting evidence, enabling assessment of whether MLLM rationales are actually grounded.

Findings

01. Models struggle with fine-grained egocentric reasoning.

02. Many models produce explanations consistent with answers but not with evidence.

03. EgoCoT-Bench reveals gaps in current multimodal reasoning capabilities.
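
Finding 02 distinguishes a correct answer from a rationale that is grounded in the video. The sketch below illustrates that distinction only; it is not the benchmark's actual schema or metric. The `Step` record, the `score` function, the use of temporal IoU as a grounding check, and the 0.5 threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Step:
    """Hypothetical rationale step: a claim plus the time window it cites."""
    claim: str
    start_s: float  # start of the cited evidence window, in seconds
    end_s: float    # end of the cited evidence window, in seconds


def temporal_iou(a: Step, b: Step) -> float:
    """Intersection-over-union of two time windows (0.0 if they do not overlap)."""
    inter = max(0.0, min(a.end_s, b.end_s) - max(a.start_s, b.start_s))
    union = (a.end_s - a.start_s) + (b.end_s - b.start_s) - inter
    return inter / union if union > 0 else 0.0


def score(pred_answer: str, gold_answer: str,
          pred_steps: List[Step], gold_steps: List[Step],
          iou_threshold: float = 0.5) -> Tuple[bool, float]:
    """Return (answer_correct, grounded_fraction).

    grounded_fraction is the share of annotated steps whose evidence window
    is matched by at least one predicted step. The threshold is an assumed
    value, not one taken from the paper.
    """
    answer_correct = pred_answer == gold_answer
    if not gold_steps:
        return answer_correct, 0.0
    matched = sum(
        1 for g in gold_steps
        if any(temporal_iou(p, g) >= iou_threshold for p in pred_steps)
    )
    return answer_correct, matched / len(gold_steps)


# Made-up example: the answer is right, but only one of the two annotated
# evidence windows is actually cited by the model's rationale.
gold = [Step("picks up knife", 2.0, 4.0), Step("slices onion", 5.0, 8.0)]
pred = [Step("picks up knife", 2.0, 4.0), Step("slices onion", 20.0, 22.0)]
result = score("B", "B", pred, gold)
```

Here `result` pairs a correct answer with a grounded fraction of one half, which is the kind of mismatch Finding 02 describes: the explanation agrees with the answer while part of it cites no matching evidence.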

Abstract

The rapid development of Multimodal Large Language Models (MLLMs) has led to growing interest in egocentric video understanding, specifically the ability of MLLMs to recognize fine-grained hand-object interactions, track object state changes over time, and reason about manipulative processes in dynamic environments from a first-person perspective. However, existing egocentric video benchmarks suffer from limited grounded rationale evaluation: they offer limited support for fine-grained operation-centric reasoning and rarely examine whether model rationales are grounded in explicit spatio-temporal evidence. To address this gap, we introduce EgoCoT-Bench, a fine-grained egocentric benchmark for grounded and verifiable operation-centric reasoning with explicit step-by-step rationale annotations. Overall, EgoCoT-Bench comprises 3,172 verifiable QA pairs over 351…

Code & Models

Repositories

https://dstardust.github.io/EgoCoT
github
