EgoIntrospect: An Egocentric Dataset and Benchmark for User-Centric Internal State Reasoning

Zeyu Wang; Chang Liu; Eduardus Tjitrahardja; Yuntao Wang; Borislav Pavlov; Fangfei Gou; Jose Manuel Davila; Dai Shi; Ran Xu; Yue Pan; Jiayi Tan; Shuting Chang; Qi Wang; Jinzhao Li; Jiacheng Hua; Yifei Huang; Jingwei Sun; Yu Zhang; Liuxin Zhang; Guocai Yao; Jia Jia; Yin Li; Qianying Wang; Yuanchun Shi; Miao Liu

arXiv:2605.17262 · cs.CV · May 19, 2026

TL;DR

EgoIntrospect is an egocentric dataset of synchronized multimodal signals with self-annotations of users' internal states, and it supports new benchmarks for AI assistant research.

Contribution

The paper presents the first egocentric dataset captured in user-driven scenarios with self-annotations of users' internal states, together with benchmarks for multimodal reasoning about affective experience, interactive intent, and cognitive memory.

Findings

01. Existing multimodal models struggle to infer internal states from the dataset.

02. EgoIntrospect provides 180 hours of synchronized multimodal recordings from 60 subjects.

03. Benchmarks reveal gaps in current models' ability to understand user internal states.

Abstract

Despite extensive efforts on egocentric video datasets and benchmarks, understanding users' internal states, which is crucial for enabling seamless AI assistant experiences, remains largely overlooked. In this work, we introduce EgoIntrospect, the first egocentric dataset captured in user-driven scenarios with self-annotations that explicitly reveal users' interactive intentions with AI assistants. EgoIntrospect was collected using a cross-device setup, providing synchronized video, audio, gaze, motion, and physiological signals. It consists of 180 hours of recordings from 60 subjects, with an average recording duration of 3 hours per subject. Leveraging EgoIntrospect, we formalize a suite of tasks centered on user internal states, including affective experience, interactive intent, and cognitive memory. We further process the annotations to construct benchmarks that evaluate the…

Code & Models

Repositories

https://ego-introspect.github.io
