BANMo: Building Animatable 3D Neural Models from Many Casual Videos
Gengshan Yang, Minh Vo, Natalia Neverova, Deva Ramanan, Andrea Vedaldi, Hanbyul Joo

TL;DR
BANMo is a novel method that reconstructs high-fidelity, animatable 3D models of objects from many casual monocular videos without requiring specialized sensors or pre-defined templates, enabling scalable 3D modeling in the wild.
Contribution
It introduces a differentiable framework combining deformable shape models, neural radiance fields, and canonical embeddings for dense correspondence and self-supervised learning from videos.
Findings
Outperforms prior methods in 3D reconstruction quality for humans and animals.
Capable of rendering realistic images from new viewpoints and poses.
Works effectively on both real and synthetic datasets.
Abstract
Prior work for articulated 3D shape reconstruction often relies on specialized sensors (e.g., synchronized multi-camera systems) or pre-built 3D deformable models (e.g., SMAL or SMPL). Such methods do not scale to diverse sets of objects in the wild. We present BANMo, a method that requires neither a specialized sensor nor a pre-defined template shape. BANMo builds high-fidelity, articulated 3D models (including shape and animatable skinning weights) from many monocular casual videos in a differentiable rendering framework. While the use of many videos provides more coverage of camera views and object articulations, it also introduces significant challenges in establishing correspondence across scenes with different backgrounds, illumination conditions, etc. Our key insight is to merge three schools of thought: (1) classic deformable shape models that make use of articulated bones…
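The abstract refers to articulated bones and animatable skinning weights. As a rough illustration of what such weights do, the sketch below implements generic linear blend skinning in NumPy: each canonical point is moved by every bone's rigid transform, and the results are averaged using that point's weights. This is a textbook formulation offered under assumptions, not BANMo's actual implementation; the function name, argument layout, and the two-bone toy example are hypothetical.

```python
import numpy as np

def linear_blend_skinning(points, weights, rotations, translations):
    """Deform canonical points with a weighted blend of rigid bone transforms.

    Hypothetical helper for illustration only (not taken from BANMo).
    points:       (N, 3) canonical-space points
    weights:      (N, B) per-point skinning weights, each row summing to 1
    rotations:    (B, 3, 3) one rotation matrix per bone
    translations: (B, 3) one translation vector per bone
    """
    # Apply every bone's rigid transform to every point: shape (B, N, 3).
    per_bone = np.einsum("bij,nj->bni", rotations, points) + translations[:, None, :]
    # Blend the B candidate positions of each point by its skinning weights.
    return np.einsum("nb,bni->ni", weights, per_bone)

# Toy setup: bone 0 is the identity, bone 1 rotates 90 degrees about the z-axis.
rotations = np.array([
    np.eye(3),
    [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]],
])
translations = np.zeros((2, 3))

# The same canonical point, bound fully to bone 0, fully to bone 1, or half to each.
points = np.array([[1.0, 0.0, 0.0]] * 3)
weights = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])

deformed = linear_blend_skinning(points, weights, rotations, translations)
```

In this toy case the point bound to bone 0 stays in place, the point bound to bone 1 follows the rotation, and the half-weighted point lands midway between the two. In a learned model, the weights and bone transforms would be optimized rather than fixed by hand.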
Taxonomy
Topics: Human Pose and Action Recognition · 3D Shape Modeling and Analysis · Advanced Vision and Imaging
