Total-Recon: Deformable Scene Reconstruction for Embodied View Synthesis
Chonghyuk Song, Gengshan Yang, Kangle Deng, Jun-Yan Zhu, Deva Ramanan

TL;DR
Total-Recon photorealistically reconstructs deformable scenes from long monocular RGBD videos and renders them from novel viewpoints. It handles complex articulated motion by decomposing the scene into a background and individual objects.
Contribution
Total-Recon is presented as the first method to reconstruct deformable scenes from long monocular RGBD videos. It hierarchically decomposes the scene into a background and objects, and models the articulated motion of each object.
Findings
Outperforms prior methods on challenging videos
Successfully reconstructs complex articulated motions
Enables novel view synthesis from monocular RGBD data
Abstract
We explore the task of embodied view synthesis from monocular videos of deformable scenes. Given a minute-long RGBD video of people interacting with their pets, we render the scene from novel camera trajectories derived from the in-scene motion of actors: (1) egocentric cameras that simulate the point of view of a target actor and (2) 3rd-person cameras that follow the actor. Building such a system requires reconstructing the root-body and articulated motion of every actor, as well as a scene representation that supports free-viewpoint synthesis. Longer videos are more likely to capture the scene from diverse viewpoints (which helps reconstruction) but are also more likely to contain larger motions (which complicates reconstruction). To address these challenges, we present Total-Recon, the first method to photorealistically reconstruct deformable scenes from long monocular RGBD videos.…
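The two embodied camera types in the abstract can be illustrated with rigid-transform composition: if an actor's root-body pose is known at every frame, a camera rigidly attached to that pose traces out a new trajectory. The sketch below is only an illustration under that assumption, not the authors' implementation. The function names, the 4x4 pose convention (local-to-world), and the offset values are all hypothetical.

```python
# Illustrative sketch only; not the Total-Recon implementation.
# Poses are 4x4 rigid transforms (nested lists) mapping local coordinates to world.

def matmul(a, b):
    """Multiply two 4x4 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def translation(x, y, z):
    """Return a 4x4 transform that translates by (x, y, z) with no rotation."""
    return [[1.0, 0.0, 0.0, x],
            [0.0, 1.0, 0.0, y],
            [0.0, 0.0, 1.0, z],
            [0.0, 0.0, 0.0, 1.0]]

def derived_camera(actor_to_world, camera_in_actor):
    """Camera-to-world pose of a camera rigidly attached to the actor's root frame."""
    return matmul(actor_to_world, camera_in_actor)

def camera_trajectory(actor_poses, camera_in_actor):
    """Apply the same actor-relative offset at every frame of the actor's motion."""
    return [derived_camera(pose, camera_in_actor) for pose in actor_poses]

def position(pose):
    """Extract the translation component of a 4x4 pose."""
    return (pose[0][3], pose[1][3], pose[2][3])

# Hypothetical offsets: an egocentric camera sits at the actor's root frame,
# while a 3rd-person camera sits above and behind it.
EGOCENTRIC_OFFSET = translation(0.0, 0.0, 0.0)
THIRD_PERSON_OFFSET = translation(0.0, 0.5, -2.0)

# Hypothetical root-body trajectory: the actor moves along the x-axis.
actor_poses = [translation(float(t), 0.0, 0.0) for t in range(3)]

ego_cam = camera_trajectory(actor_poses, EGOCENTRIC_OFFSET)
follow_cam = camera_trajectory(actor_poses, THIRD_PERSON_OFFSET)
```

Here the 3rd-person camera keeps a constant offset in the actor's frame, so it follows the actor as the root-body pose changes. A real system would also need the reconstructed scene representation to render images from these poses.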
Videos
Total-Recon: Deformable Scene Reconstruction for Embodied View Synthesis (YouTube)
Taxonomy
Topics: Advanced Vision and Imaging · 3D Shape Modeling and Analysis · Human Pose and Action Recognition
