Shelf-Supervised Mesh Prediction in the Wild
Yufei Ye, Shubham Tulsiani, Abhinav Gupta

TL;DR
This paper introduces a shelf-supervised learning approach for 3D shape and pose estimation from single images, using only segmentation supervision from existing recognition systems, enabling scalable multi-category 3D reconstruction.
Contribution
It presents a method that infers 3D shape and pose from unstructured image collections, supervised only by segmentation outputs from off-the-shelf recognition systems, and demonstrates its scalability on 50 categories in the wild.
Findings
Effective 3D shape and pose inference from single images.
Scalable to 50 categories in real-world datasets.
Outperforms existing methods in multi-category 3D reconstruction.
Abstract
We aim to infer the 3D shape and pose of an object from a single image and propose a learning-based approach that can train from unstructured image collections, supervised only by segmentation outputs from off-the-shelf recognition systems (i.e. 'shelf-supervised'). We first infer a volumetric representation in a canonical frame, along with the camera pose. We enforce that the representation is geometrically consistent with both appearance and masks, and that the synthesized novel views are indistinguishable from image collections. The coarse volumetric prediction is then converted to a mesh-based representation, which is further refined in the predicted camera frame. These two steps allow both shape-pose factorization from image collections and per-instance reconstruction in finer detail. We examine the method on both synthetic and real-world datasets and demonstrate its scalability on 50…
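To make the mask-consistency idea in the abstract concrete, here is a minimal sketch of one way a volumetric prediction could be compared against a segmentation mask. It is an illustration under simplifying assumptions, not the authors' implementation: it uses an orthographic projection along a fixed axis instead of the predicted camera pose, treats voxel values as independent occupancy probabilities, and uses a plain squared-error loss. The function names `project_silhouette` and `mask_consistency_loss` are hypothetical.

```python
import numpy as np


def project_silhouette(occupancy, axis=2):
    """Project a voxel occupancy grid to a 2D silhouette.

    Assumption: orthographic rays along `axis`. Each pixel holds the
    probability that its ray hits at least one occupied voxel, i.e.
    1 - prod(1 - occupancy) along the ray.
    """
    return 1.0 - np.prod(1.0 - occupancy, axis=axis)


def mask_consistency_loss(occupancy, mask, axis=2):
    """Mean squared error between the projected silhouette and a mask.

    Assumption: `mask` is a 2D array in [0, 1] with the same shape as
    the projection, e.g. the output of an off-the-shelf segmenter.
    """
    silhouette = project_silhouette(occupancy, axis=axis)
    return float(np.mean((silhouette - mask) ** 2))


# Toy example: a 2x2x2 solid cube inside a 4x4x4 grid.
occupancy = np.zeros((4, 4, 4))
occupancy[1:3, 1:3, 1:3] = 1.0

# A mask that matches the cube's footprint exactly.
mask = np.zeros((4, 4))
mask[1:3, 1:3] = 1.0

loss = mask_consistency_loss(occupancy, mask)
```

In this toy case the projected silhouette coincides with the mask, so the loss is zero. A soft occupancy grid would produce a fractional silhouette and a non-zero loss wherever it disagrees with the mask.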
Taxonomy
Topics: 3D Shape Modeling and Analysis · Advanced Vision and Imaging · 3D Surveying and Cultural Heritage
