TL;DR
This paper introduces LEXIS, an approach for reconstructing 3D human-object interaction from a single RGB image. It combines a dense, continuous proximity encoding (InterFields) with a learned discrete manifold of interaction signatures to improve realism and accuracy.
Contribution
The work presents InterFields, a representation that encodes dense, continuous proximity across the entire body and object surfaces, and LEXIS, a discrete manifold of interaction signatures learned via a VQ-VAE. Together they enable more realistic 3D reconstructions from single images.
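To illustrate the difference between a dense proximity representation and sparse binary contact, here is a minimal sketch. It assumes, purely for illustration, that a proximity field stores for each surface point the distance to the nearest point on the other surface. The summary does not give the exact definition of InterFields, so the function names, the 2 cm threshold, and the toy coordinates below are all hypothetical.

```python
import math

def nearest_distance(point, cloud):
    """Distance from one 3D point to the closest point in a point cloud."""
    return min(math.dist(point, q) for q in cloud)

def proximity_field(source, target):
    """Continuous per-point distance from every source point to the target."""
    return [nearest_distance(p, target) for p in source]

def binary_contact(field, threshold=0.02):
    """Sparse cue for comparison: 1 if within threshold (assumed metres), else 0."""
    return [1 if d <= threshold else 0 for d in field]

# Toy data (hypothetical): three body points and two object points.
body = [(0.0, 0.0, 0.0), (0.0, 0.1, 0.0), (0.0, 0.5, 0.0)]
obj = [(0.01, 0.0, 0.0), (0.0, 0.1, 0.3)]

body_field = proximity_field(body, obj)  # one distance per body point
obj_field = proximity_field(obj, body)   # one distance per object point
contacts = binary_contact(body_field)    # thresholded version of body_field
```

In this toy case the binary cue marks only the first body point as touching, while the dense field also records that the second point is about 10 cm from the object and the third about 50 cm. That graded information is what a binary contact label discards.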
Findings
The method outperforms state-of-the-art approaches in reconstruction, contact, and proximity quality.
It improves generalization and realism in 3D human-object interaction reconstructions.
It provides a guided refinement process intended to keep results physically plausible.
Abstract
Reconstructing 3D Human-Object Interaction from an RGB image is essential for perceptive systems. Yet, this remains challenging as it requires capturing the subtle physical coupling between the body and objects. While current methods rely on sparse, binary contact cues, these fail to model the continuous proximity and dense spatial relationships that characterize natural interactions. We address this limitation via InterFields, a representation that encodes dense, continuous proximity across the entire body and object surfaces. However, inferring these fields from single images is inherently ill-posed. To tackle this, our intuition is that interaction patterns are characteristically structured by the action and object geometry. We capture this structure in LEXIS, a novel discrete manifold of interaction signatures learned via a VQ-VAE. We then develop LEXIS-Flow, a diffusion framework…
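The discrete manifold mentioned above relies on vector quantization. As a rough sketch of the generic VQ-VAE quantization step only (not the paper's architecture), the snippet below maps each continuous latent vector to its nearest entry in a codebook. The codebook values, the latent vectors, and the function names are made up for illustration.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def quantize(latent, codebook):
    """Return (index, code vector) of the codebook entry nearest to latent."""
    index = min(range(len(codebook)),
                key=lambda i: squared_distance(latent, codebook[i]))
    return index, codebook[index]

# Hypothetical 2D codebook with four entries; real codebooks are learned.
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]

# Hypothetical encoder outputs to be snapped onto the codebook.
latents = [(0.1, 0.2), (0.9, 0.8), (0.2, 0.9)]

indices = [quantize(z, codebook)[0] for z in latents]
quantized = [codebook[i] for i in indices]
```

Each latent is replaced by a discrete index, so an interaction is described by a short sequence of codebook entries rather than by an unconstrained continuous vector. In a trained VQ-VAE the codebook is learned jointly with the encoder and decoder, which this sketch omits.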
