EgoReAct: Egocentric Video-Driven 3D Human Reaction Generation

Libo Zhang; Zekun Li; Tianyu Li; Zeyu Cao; Rui Xu; Xiaoxiao Long; Wenjia Wang; Jingbo Wang; Yuan Liu; Wenping Wang; Daquan Zhou; Taku Komura; Zhiyang Dou

arXiv:2512.22808 · cs.CV · January 6, 2026

TL;DR

EgoReAct is a real-time autoregressive framework that generates 3D-aligned human reaction motions from egocentric video. It addresses data scarcity and spatial misalignment with a new spatially aligned dataset, the Human Reaction Dataset (HRD), and a compact latent representation of reaction motion.

Contribution

The paper presents the first autoregressive framework for generating 3D reaction motion from egocentric video. It builds on a new dataset (HRD) and incorporates 3D dynamic features to improve realism and spatial consistency.

Findings

01 EgoReAct outperforms prior methods in realism and spatial consistency.

02 The model achieves real-time reaction generation with strict causality.

03 The Human Reaction Dataset (HRD) effectively addresses data scarcity and misalignment.

Abstract

Humans exhibit adaptive, context-sensitive responses to egocentric visual input. However, faithfully modeling such reactions from egocentric video remains challenging because it requires both strictly causal generation and precise 3D spatial alignment. To tackle this problem, we first construct the Human Reaction Dataset (HRD), a spatially aligned egocentric video-reaction dataset that addresses data scarcity and misalignment. Existing datasets (e.g., ViMo) suffer from significant spatial inconsistency between the egocentric video and the reaction motion; for instance, highly dynamic motions are always paired with fixed-camera videos. Leveraging HRD, we present EgoReAct, the first autoregressive framework that generates 3D-aligned human reaction motions from egocentric video streams in real time. We first compress the reaction motion into a compact yet expressive latent space…
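The "strictly causal" requirement in the abstract can be illustrated with a toy sketch. This is not the EgoReAct architecture: `causal_rollout` and `toy_step` are hypothetical names, and the blending rule is an assumed stand-in for a learned predictor. The sketch only shows the general property that each output may depend on the current and past frames, never on future ones.

```python
from typing import Callable, Iterable, Iterator, List, Sequence

Vector = List[float]


def causal_rollout(
    frame_features: Iterable[Sequence[float]],
    step: Callable[[Sequence[float], Sequence[float]], Vector],
    initial_motion: Sequence[float],
) -> Iterator[Vector]:
    """Yield one motion vector per incoming frame.

    Each output is computed from the current frame feature and the
    previous output only, so no future frame can influence it.
    """
    prev = list(initial_motion)
    for feat in frame_features:  # frames arrive one at a time, as in a stream
        prev = step(feat, prev)  # autoregressive: feeds back the last output
        yield prev


def toy_step(feat: Sequence[float], prev: Sequence[float]) -> Vector:
    # Hypothetical stand-in for a learned model (an assumption for this
    # sketch): average the previous motion with the current frame feature.
    return [0.5 * p + 0.5 * f for p, f in zip(prev, feat)]


# Two streams that share their first two frames and differ in the third.
frames_a = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
frames_b = [[1.0, 0.0], [0.0, 1.0], [-5.0, 9.0]]

out_a = list(causal_rollout(frames_a, toy_step, [0.0, 0.0]))
out_b = list(causal_rollout(frames_b, toy_step, [0.0, 0.0]))
```

Because the rollout is causal, the first two outputs of `out_a` and `out_b` are identical even though the third frame differs; only the third output changes.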

Taxonomy

Topics: Human Pose and Action Recognition · Human Motion and Animation · Generative Adversarial Networks and Image Synthesis