CHOIR: Contact-aware 4D Hand-Object Interaction Reconstruction
Hao Xu, Yilin Liu, Yinqiao Wang, Chi-Wing Fu, Niloy J. Mitra

TL;DR
This paper introduces CHOIR, a framework that reconstructs 4D hand-object interactions from monocular video, using contact as an explicit coupling signal between hands and objects to improve accuracy and temporal consistency.
Contribution
CHOIR is the first method to explicitly incorporate contact as a coupling signal in monocular 4D hand-object interaction reconstruction, enhancing realism and temporal coherence.
Findings
Improves object reconstruction accuracy over state-of-the-art methods.
Enhances physical plausibility and temporal consistency in reconstructions.
Effective in both controlled and challenging real-world videos.
Abstract
We ask whether everyday open-world monocular videos can be turned into reusable 4D interaction primitives: articulated hand motion, object shape with 6D pose over time, and the when/where of contact. Such a capability would enable scalable mining of real interactions and, beyond reconstruction, support scene-aware synthesis and planning. However, reconstructing hand-object interaction (HOI) from challenging monocular videos remains difficult: methods often assume known objects or curated scenes, and separately estimated hands and objects easily become misaligned under clutter, occlusion, and unseen object geometries. Targeting this setting, we present CHOIR, a Contact-aware HOI Reconstruction framework for a monocular camera, using contact as an explicit coupling signal between hands and objects. CHOIR first initializes a coarse, contact-agnostic 4D HOI sequence from open-world visual…
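The "when/where of contact" mentioned in the abstract can be illustrated with a generic proximity test. The sketch below is not CHOIR's method, which the truncated abstract does not specify. It assumes hand and object surfaces are available as per-frame 3D point sets in meters, and it labels a hand point as in contact when it lies within a distance threshold of any object point. The function name `contact_mask`, the array layouts, and the 5 mm threshold are all hypothetical choices made for illustration.

```python
import numpy as np

def contact_mask(hand_pts, obj_pts, threshold=0.005):
    """Label hand points that are close to the object in each frame.

    hand_pts: (T, H, 3) array of hand surface points over T frames (assumed layout).
    obj_pts:  (T, O, 3) array of object surface points in the same frame (assumed layout).
    threshold: contact distance in meters (5 mm is an illustrative value).
    Returns a boolean (T, H) array: True where a hand point is in contact.
    """
    # Pairwise differences between every hand point and every object point, per frame.
    diff = hand_pts[:, :, None, :] - obj_pts[:, None, :, :]   # (T, H, O, 3)
    dist = np.linalg.norm(diff, axis=-1)                      # (T, H, O)
    # A hand point is in contact if its nearest object point is within the threshold.
    return dist.min(axis=2) < threshold

# Toy sequence: 2 frames, 2 hand points, 1 object point.
hand = np.array([[[0.0, 0.0, 0.0],   [0.1, 0.0, 0.0]],
                 [[0.0, 0.0, 0.002], [0.1, 0.0, 0.0]]])
obj = np.array([[[0.0, 0.0, 0.05]],
                [[0.0, 0.0, 0.0]]])

mask = contact_mask(hand, obj)          # "where": which hand points touch the object
contact_frames = mask.any(axis=1)       # "when": which frames contain any contact
```

In this toy sequence only the first hand point in the second frame falls within the threshold, so `mask` marks the location of contact and `contact_frames` marks its timing. The dense pairwise distance tensor is fine for small point sets; a spatial index such as a k-d tree would be the usual substitute for full-resolution meshes.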
