Building Egocentric Procedural AI Assistant: Methods, Benchmarks, and Challenges
Junlong Li, Huaiyuan Xu, Sijie Cheng, Kejun Wu, Kim-Hui Yap, Lap-Pui Chau, Yi Wang

TL;DR
This paper introduces the concept of an egocentric procedural AI assistant, defines core tasks and enabling dimensions, reviews current techniques and datasets, and evaluates VLM-based methods to identify challenges and future directions.
Contribution
It establishes a new taxonomy for egocentric procedural tasks, provides a comprehensive review and evaluation of existing methods, and discusses future research challenges in the field.
Findings
VLM-based methods show varying effectiveness in egocentric tasks
Identified key challenges in real-time streaming video understanding
Proposed future directions for improving egocentric procedural AI assistants
Abstract
Driven by recent advances in vision-language models (VLMs) and egocentric perception research, the emerging topic of an egocentric procedural AI assistant (EgoProceAssist) is introduced to provide step-by-step support for daily procedural tasks from a first-person view. In this paper, we start by identifying three core tasks in EgoProceAssist: egocentric procedural error detection, egocentric procedural learning, and egocentric procedural question answering. We then introduce two enabling dimensions: real-time and streaming video understanding, and proactive interaction in procedural contexts. We define these tasks within a new taxonomy as EgoProceAssist's essential functions and illustrate how they can be deployed in real-world scenarios for daily activity assistance. Specifically, our work encompasses a comprehensive review of current techniques, relevant datasets, and evaluation metrics across…
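The taxonomy described in the abstract can be summarized as a small data structure. This is an illustrative sketch only: the class names `CoreTask`, `EnablingDimension`, and `EgoProceAssistTaxonomy` and the `describe` method are hypothetical and do not come from the paper. Only the names of the three core tasks and two enabling dimensions are taken from the abstract.

```python
from dataclasses import dataclass, field
from enum import Enum


class CoreTask(Enum):
    """The three core tasks named in the abstract."""
    ERROR_DETECTION = "egocentric procedural error detection"
    PROCEDURAL_LEARNING = "egocentric procedural learning"
    QUESTION_ANSWERING = "egocentric procedural question answering"


class EnablingDimension(Enum):
    """The two enabling dimensions named in the abstract."""
    STREAMING = "real-time and streaming video understanding"
    PROACTIVE = "proactive interaction in procedural contexts"


@dataclass
class EgoProceAssistTaxonomy:
    # Hypothetical container grouping the core tasks and enabling dimensions;
    # by default it holds every member of each enum.
    core_tasks: list = field(default_factory=lambda: list(CoreTask))
    enabling_dimensions: list = field(default_factory=lambda: list(EnablingDimension))

    def describe(self) -> str:
        # Build a readable, indented summary of the taxonomy.
        lines = ["Core tasks:"]
        lines += [f"  - {task.value}" for task in self.core_tasks]
        lines.append("Enabling dimensions:")
        lines += [f"  - {dim.value}" for dim in self.enabling_dimensions]
        return "\n".join(lines)


if __name__ == "__main__":
    print(EgoProceAssistTaxonomy().describe())
```

Keeping tasks and dimensions as separate enums mirrors the abstract's distinction: the core tasks are the assistant's functions, while the enabling dimensions are cross-cutting capabilities that those functions rely on.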
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Domain Adaptation and Few-Shot Learning
