TeleEgo: Benchmarking Egocentric AI Assistants in the Wild
Jiaqi Yan, Ruilong Ren, Jingren Liu, Shuning Xu, Ling Wang, Yiheng Wang, Xinlin Zhong, Yun Wang, Long Zhang, Xiangyu Chen, Changzhi Sun, Jixiang Luo, Dell Zhang, Hao Sun, Chi Zhang, Xuelong Li

TL;DR
TeleEgo is a comprehensive benchmark for evaluating egocentric AI assistants in real-world, streaming scenarios, focusing on multi-modal understanding, memory, and responsiveness over long durations.
Contribution
The paper introduces TeleEgo, a long-duration, multi-modal benchmark with new metrics for assessing real-time accuracy and long-term memory in egocentric AI assistants.
Findings
Current models show limited real-time accuracy in streaming settings.
The benchmark reveals challenges in long-term memory retention.
TeleEgo provides a platform for systematic evaluation of egocentric AI capabilities.
Abstract
Egocentric AI assistants in real-world settings must process multi-modal inputs (video, audio, text), respond in real time, and retain evolving long-term memory. However, existing benchmarks typically evaluate these abilities in isolation, lack realistic streaming scenarios, or support only short-term tasks. We introduce TeleEgo, a long-duration, streaming, omni-modal benchmark for evaluating egocentric AI assistants in realistic daily contexts. The dataset features over 14 hours per participant of synchronized egocentric video, audio, and text across four domains: work & study, lifestyle & routines, social activities, and outings & culture. All data is aligned on a unified global timeline and includes high-quality visual narrations and speech transcripts, curated through human refinement. TeleEgo defines 12 diagnostic subtasks across three core capabilities: Memory…
