TL;DR
UniCon3R is a real-time, contact-aware framework for 4D human-scene reconstruction from monocular video. It improves physical plausibility by modeling human-environment interactions.
Contribution
UniCon3R introduces a contact inference mechanism that improves the accuracy and realism of joint human-scene reconstruction while remaining fast and feed-forward.
Findings
Outperforms state-of-the-art methods on human motion estimation.
Improves physical plausibility by modeling contact interactions.
Maintains fast, feed-forward inference speeds.
Abstract
We introduce UniCon3R, a unified feed-forward framework for online human-scene 4D reconstruction from monocular video. Current feed-forward human-scene reconstruction methods suffer from artifacts, where bodies float above the ground or penetrate parts of the scene. A key reason is the lack of effective interaction modelling between the human and the environment. Our goal is to exploit contact between the human and the scene during inference to actively improve the human mesh reconstruction. To that end, we explicitly model interaction by inferring 4D contact from the human pose and scene geometry and use the contact as a corrective cue for generating the pose. This enables UniCon3R to jointly recover scene geometry and spatially aligned 4D humans within the scene. Experiments on standard human-centric video benchmarks show that UniCon3R outperforms state-of-the-art baselines on…
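To make the floating and penetration artifacts concrete, here is a minimal geometric sketch of contact used as a corrective cue. It is not UniCon3R's method: the paper infers contact with a learned, feed-forward model, whereas this toy uses a hand-written distance threshold and a flat ground plane. All names (`infer_contact`, `ground_offset`, `snap_to_ground`), the 5 cm threshold, and the y-up convention are assumptions made for illustration only.

```python
from math import sqrt

def nearest_distance(point, scene_points):
    """Euclidean distance from one 3D point to the closest scene point."""
    return min(
        sqrt(sum((a - b) ** 2 for a, b in zip(point, q)))
        for q in scene_points
    )

def infer_contact(body_vertices, scene_points, threshold=0.05):
    """Toy contact labels: True if a vertex lies within `threshold`
    (assumed meters) of any scene point. A stand-in for learned contact."""
    return [nearest_distance(v, scene_points) <= threshold for v in body_vertices]

def ground_offset(body_vertices, ground_height=0.0):
    """Signed vertical gap between the lowest vertex and a flat ground plane
    (y-up assumed). Positive means floating, negative means penetrating."""
    return min(v[1] for v in body_vertices) - ground_height

def snap_to_ground(body_vertices, ground_height=0.0):
    """Translate the whole body vertically so its lowest vertex rests on the
    ground. This is the 'corrective cue' in its simplest possible form."""
    gap = ground_offset(body_vertices, ground_height)
    return [(x, y - gap, z) for (x, y, z) in body_vertices]

# Hypothetical data: a two-vertex "body" hovering 12 cm above the floor.
scene = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
floating_body = [(0.0, 0.12, 0.0), (0.0, 1.0, 0.0)]

before = infer_contact(floating_body, scene)   # no vertex touches the scene
corrected = snap_to_ground(floating_body)
after = infer_contact(corrected, scene)        # the lowest vertex now touches
```

A rigid vertical shift like this can only fix a global offset. It cannot resolve per-limb penetration or contact with non-planar geometry, which is part of why a method would feed contact back into pose generation rather than post-process the mesh.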
