AoE: Always-on Egocentric Human Video Collection for Embodied AI
Bowen Yang, Zishuo Li, Yang Sun, Changtao Miao, Yifan Yang, Man Luo, Xiaotong Yan, Feng Jiang, Jinchuan Shi, Yankai Fu, Ning Chen, Junkai Zhao, Pengwei Wang, Guocai Yao, Shanghang Zhang, Hao Chen, Zhe Li, and Kai Zhu

TL;DR
This paper introduces AoE, a low-cost, scalable system for collecting egocentric human video data using smartphones, which enhances embodied AI models by providing high-quality real-world interaction data.
Contribution
The paper presents a novel, scalable data collection system leveraging smartphones and cloud processing to gather egocentric videos from humans worldwide, reducing costs and hardware dependencies.
Findings
High-quality egocentric data improves real-world generalization.
AoE system enables large-scale, distributed data collection.
Automated labeling enhances data processing efficiency.
Abstract
Embodied foundation models require large-scale, high-quality real-world interaction data for pre-training and scaling. However, existing data collection methods suffer from high infrastructure costs, complex hardware dependencies, and limited interaction scope, making scalable expansion challenging. In fact, humans themselves are ideal physically embodied agents. Therefore, obtaining egocentric real-world interaction data from globally distributed "human agents" offers advantages of low cost and sustainability. To this end, we propose the Always-on Egocentric (AoE) data collection system, which aims to simplify hardware dependencies by leveraging humans themselves and their smartphones, enabling low-cost, highly efficient, and scene-agnostic real-world interaction data collection to address the challenge of data scarcity. Specifically, we first employ an ergonomic neck-mounted…
Taxonomy
Topics: Human Motion and Animation · Social Robot Interaction and HRI · Human Pose and Action Recognition
