EgoChoir: Capturing 3D Human-Object Interaction Regions from Egocentric Views
Yuhang Yang, Wei Zhai, Chengfeng Wang, Chengjun Yu, Yang Cao, Zheng-Jun Zha

TL;DR
EgoChoir is a novel method that infers 3D human-object interaction regions from egocentric videos by integrating appearance, head motion, and object structure, improving understanding of interactions for AR/VR and AI applications.
Contribution
It introduces EgoChoir, a framework that models object affordance and human contact from egocentric views by combining visual cues and head motion, addressing limitations of previous exocentric methods.
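The idea of combining visual cues, head motion, and object structure can be illustrated with a minimal sketch. This is not the authors' architecture: the function name `fuse_and_score`, the feature sizes, and the random weights are all assumptions chosen for illustration. It only shows one generic way to broadcast clip-level appearance and head-motion features onto per-point object features and produce a per-point interaction score.

```python
import numpy as np

def fuse_and_score(point_feats, appearance_feat, head_motion_feat, weights, bias=0.0):
    """Toy late fusion (hypothetical, not EgoChoir itself).

    point_feats:      (N, Dp) per-point object geometry features
    appearance_feat:  (Da,)   clip-level appearance feature
    head_motion_feat: (Dm,)   clip-level head-motion feature
    weights:          (Dp + Da + Dm,) linear scoring weights
    Returns an (N,) array of scores squashed by a sigmoid.
    """
    n_points = point_feats.shape[0]
    # Build one context vector per clip and repeat it for every object point.
    context = np.concatenate([appearance_feat, head_motion_feat])
    context = np.broadcast_to(context, (n_points, context.shape[0]))
    # Concatenate geometry and context, then apply a linear layer and a sigmoid.
    fused = np.concatenate([point_feats, context], axis=1)
    logits = fused @ weights + bias
    return 1.0 / (1.0 + np.exp(-logits))

# Random stand-ins for real features (assumed sizes, for illustration only).
rng = np.random.default_rng(0)
point_feats = rng.normal(size=(2048, 16))
appearance_feat = rng.normal(size=8)
head_motion_feat = rng.normal(size=6)
weights = rng.normal(size=16 + 8 + 6) * 0.1

scores = fuse_and_score(point_feats, appearance_feat, head_motion_feat, weights)
print(scores.shape)
```

In a real system the weights would be learned and the fusion would likely be far richer than a single linear layer; the sketch only makes the data flow concrete.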
Findings
EgoChoir outperforms existing methods in capturing 3D interaction regions.
The approach effectively models object affordance and human contact.
Experiments validate the method's superiority on Ego-Exo4D and GIMO datasets.
Abstract
Understanding egocentric human-object interaction (HOI) is a fundamental aspect of human-centric perception, facilitating applications like AR/VR and embodied AI. For egocentric HOI, in addition to perceiving semantics, e.g., "what" interaction is occurring, capturing "where" the interaction specifically manifests in 3D space is also crucial, as it links perception and operation. Existing methods primarily leverage observations of HOI to capture interaction regions from an exocentric view. However, incomplete observations of the interacting parties in the egocentric view introduce ambiguity between visual observations and interaction contents, impairing the efficacy of these methods. From the egocentric view, humans integrate the visual cortex, cerebellum, and brain to internalize their intentions and interaction concepts of objects, allowing for the pre-formulation of interactions and making…
Taxonomy
Topics: Human Pose and Action Recognition · Human Motion and Animation · Multimodal Machine Learning Applications
