Efficient and Scalable Monocular Human-Object Interaction Motion Reconstruction
Boran Wen, Ye Lu, Sirui Wang, Keyan Wan, Jiahong Zhou, Junxuan Liang, Xinpeng Liu, Bang Xiao, Ruiyang Liu, and Yong-Lu Li

TL;DR
This paper presents a scalable approach for reconstructing 4D human-object interactions from monocular videos, introducing new annotation methods, a large dataset, and applications in imitation learning.
Contribution
The paper introduces an efficient sparse contact annotation paradigm, a multi-modal predictor (InterPoint), and a novel optimization framework (4DHOISolver) for large-scale 4D HOI reconstruction from monocular videos.
Findings
Reconstructed diverse 4D HOI data from in-the-wild monocular videos
Created the large-scale Open4DHOI dataset
Enabled RL agents to imitate reconstructed motions
Abstract
Generalized robots must learn from diverse, large-scale human-object interactions (HOI) to operate robustly in the real world. Monocular internet videos offer a nearly limitless and readily available source of data, capturing an unparalleled diversity of human activities, objects, and environments. However, accurately and scalably extracting 4D interaction data from these in-the-wild videos remains a significant and unsolved challenge. To overcome the annotation bottleneck, we introduce an efficient sparse contact annotation paradigm. To scale this process, we develop InterPoint, a multi-modal predictor that drives a human-in-the-loop data engine. Building upon these efficiently acquired annotations, we introduce 4DHOISolver, a novel optimization framework that constrains the ill-posed 4D HOI reconstruction problem, maintaining high spatio-temporal coherence and physical plausibility.…
Taxonomy
Topics: Human Pose and Action Recognition · Human Motion and Animation · Multimodal Machine Learning Applications
