EgoAgent: A Joint Predictive Agent Model in Egocentric Worlds

Lu Chen; Yizhou Wang; Shixiang Tang; Qianhong Ma; Tong He; Wanli Ouyang; Xiaowei Zhou; Hujun Bao; Sida Peng

arXiv:2502.05857·cs.CV·September 12, 2025

TL;DR

EgoAgent is a unified transformer-based model that jointly learns perception, prediction, and action in egocentric environments, capturing their interdependencies for improved performance across various tasks.

Contribution

The paper introduces EgoAgent, a novel unified model that integrates perception, prediction, and action learning in a single transformer architecture, inspired by the perception-action loop.

Findings

01. Outperforms existing methods on egocentric tasks

02. Demonstrates superior future state and motion prediction accuracy

03. Shows effective joint learning of perception, prediction, and action

Abstract

Learning an agent model that behaves like a human, one that can jointly perceive the environment, predict the future, and take actions from a first-person perspective, is a fundamental challenge in computer vision. Existing methods typically train separate models for these abilities, an approach that fails to capture their intrinsic relationships and prevents the abilities from learning from each other. Inspired by how humans learn through the perception-action loop, we propose EgoAgent, a unified agent model that simultaneously learns to represent, predict, and act within a single transformer. EgoAgent explicitly models the causal and temporal dependencies among these abilities by formulating the task as an interleaved sequence of states and actions. It further introduces a joint embedding-action-prediction architecture with temporally asymmetric predictor and observer branches, enabling synergistic…
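
The "interleaved sequence of states and actions" mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the token labels, the helper names `interleave` and `causal_mask`, and the one-action-per-state pairing are all assumptions made for illustration, since the abstract does not specify how states and actions are tokenized.

```python
from typing import List, Sequence, Tuple


def interleave(states: Sequence[str], actions: Sequence[str]) -> List[Tuple[str, str]]:
    """Order tokens as s_0, a_0, s_1, a_1, ... (hypothetical layout)."""
    if len(states) != len(actions):
        raise ValueError("this sketch assumes exactly one action per state")
    seq: List[Tuple[str, str]] = []
    for s, a in zip(states, actions):
        seq.append(("state", s))
        seq.append(("action", a))
    return seq


def causal_mask(n: int) -> List[List[bool]]:
    """mask[i][j] is True when position i may attend to position j (j <= i)."""
    return [[j <= i for j in range(n)] for i in range(n)]


# Placeholder tokens; a real model would use learned embeddings instead.
seq = interleave(["s0", "s1", "s2"], ["a0", "a1", "a2"])
mask = causal_mask(len(seq))
```

Under this layout, a causal mask lets each action token attend to the state that precedes it, and each later state attend to all earlier states and actions, which is one simple way to encode the temporal dependencies the abstract describes.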

Taxonomy

Topics: Evolutionary Game Theory and Cooperation