EgoChoir: Capturing 3D Human-Object Interaction Regions from Egocentric Views

Yuhang Yang; Wei Zhai; Chengfeng Wang; Chengjun Yu; Yang Cao; Zheng-Jun Zha

arXiv:2405.13659·cs.CV·October 15, 2024

Open Access

TL;DR

EgoChoir is a novel method that infers 3D human-object interaction regions from egocentric videos by integrating appearance, head motion, and object structure, improving understanding of interactions for AR/VR and AI applications.

Contribution

It introduces EgoChoir, a framework that models object affordance and human contact from egocentric views by combining visual cues and head motion, addressing limitations of previous exocentric methods.

Findings

1. EgoChoir outperforms existing methods in capturing 3D interaction regions.

2. The approach effectively models object affordance and human contact.

3. Experiments validate the method's superiority on Ego-Exo4D and GIMO datasets.

Abstract

Understanding egocentric human-object interaction (HOI) is a fundamental aspect of human-centric perception, facilitating applications like AR/VR and embodied AI. For egocentric HOI, in addition to perceiving semantics (e.g., "what" interaction is occurring), capturing "where" the interaction specifically manifests in 3D space is also crucial, as it links perception and operation. Existing methods primarily leverage observations of HOI to capture interaction regions from an exocentric view. However, incomplete observations of the interacting parties in the egocentric view introduce ambiguity between visual observations and interaction contents, impairing the efficacy of these methods. From the egocentric view, humans integrate the visual cortex, cerebellum, and brain to internalize their intentions and interaction concepts of objects, allowing for the pre-formulation of interactions and making…

Taxonomy

Topics: Human Pose and Action Recognition · Human Motion and Animation · Multimodal Machine Learning Applications