Object pop-up: Can we infer 3D objects and their poses from human interactions alone?
Ilya A. Petrov, Riccardo Marin, Julian Chibane, Gerard Pons-Moll

TL;DR
This paper explores inferring 3D objects and their poses from human interactions alone, demonstrating that a 3D human point cloud can reveal unobserved objects in XR/VR contexts.
Contribution
It introduces a novel approach to infer 3D objects from human interactions alone, addressing a less-explored inverse problem in object-centric perception.
Findings
Successfully infers unseen objects from human poses.
Validates approach with synthetic and real XR/VR data.
Demonstrates applicability for immersive environments.
Abstract
The intimate entanglement between object affordances and human poses is of great interest to, among others, the behavioural sciences, cognitive psychology, and Computer Vision communities. In recent years, the latter has developed several object-centric approaches: starting from items, learning pipelines synthesize human poses and dynamics in a realistic way, satisfying both geometrical and functional expectations. However, the inverse perspective is significantly less explored: Can we infer 3D objects and their poses from human interactions alone? Our investigation follows this direction, showing that a generic 3D human point cloud is enough to pop up an unobserved object, even when the user is just imitating a functionality (e.g., looking through binoculars) without involving a tangible counterpart. We validate our method qualitatively and quantitatively, with synthetic data and…
Taxonomy
Topics: 3D Shape Modeling and Analysis · Human Pose and Action Recognition · Face recognition and analysis
