Shape and Viewpoint without Keypoints
Shubham Goel, Angjoo Kanazawa, Jitendra Malik

TL;DR
This paper introduces a learning framework that reconstructs 3D shape, pose, and texture from a single image without 3D shape, multi-view, camera viewpoint, or keypoint supervision, using a new representation of the distribution over cameras called "camera-multiplex."
Contribution
It proposes U-CMR (Unsupervised Category-Specific Mesh Reconstruction), which maintains a set of camera hypotheses, the camera-multiplex, and optimizes them during training to best explain the image instead of committing to a single point estimate of the viewpoint.
Findings
Achieves state-of-the-art camera prediction results.
Learns diverse shapes and textures without keypoint or 3D ground truth.
Demonstrates effectiveness on multiple datasets.
Abstract
We present a learning framework that learns to recover the 3D shape, pose and texture from a single image, trained on an image collection without any ground truth 3D shape, multi-view, camera viewpoints or keypoint supervision. We approach this highly under-constrained problem in an "analysis by synthesis" framework where the goal is to predict the likely shape, texture and camera viewpoint that could produce the image with various learned category-specific priors. Our particular contribution in this paper is a representation of the distribution over cameras, which we call "camera-multiplex". Instead of picking a point estimate, we maintain a set of camera hypotheses that are optimized during training to best explain the image given the current shape and texture. We call our approach Unsupervised Category-Specific Mesh Reconstruction (U-CMR), and present qualitative and quantitative…
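The idea of keeping several camera hypotheses and optimizing each one against the image can be illustrated with a toy sketch. This is not the U-CMR implementation: the 2D point "shape", the single rotation angle standing in for a camera, the squared-error loss, and every name below (`project`, `optimize_multiplex`, `num_cameras`) are assumptions made only for illustration. The shape is held fixed, each hypothesis is refined by gradient descent, and the hypothesis with the lowest loss is read off as the best explanation of the observation.

```python
import numpy as np

def project(points, theta):
    """Rotate 2D points by theta and keep the x-coordinate (a toy 1D 'image')."""
    c, s = np.cos(theta), np.sin(theta)
    return c * points[:, 0] - s * points[:, 1]

def loss(points, theta, target):
    """Mean squared error between the projection under theta and the observation."""
    return np.mean((project(points, theta) - target) ** 2)

def loss_grad(points, theta, target):
    """Analytic derivative of the loss with respect to theta."""
    c, s = np.cos(theta), np.sin(theta)
    residual = project(points, theta) - target
    dproj = -s * points[:, 0] - c * points[:, 1]  # d(project)/d(theta)
    return np.mean(2.0 * residual * dproj)

def optimize_multiplex(points, target, num_cameras=8, steps=200, lr=0.05):
    """Refine a set of camera hypotheses with the shape held fixed.

    Hypothetical helper: hypotheses start evenly spaced over the circle and
    each one is updated independently by plain gradient descent.
    """
    thetas = np.linspace(-np.pi, np.pi, num_cameras, endpoint=False)
    for _ in range(steps):
        grads = np.array([loss_grad(points, t, target) for t in thetas])
        thetas = thetas - lr * grads
    losses = np.array([loss(points, t, target) for t in thetas])
    best = int(np.argmin(losses))  # hypothesis that best explains the observation
    return thetas, losses, best

# Toy data: a fixed random point set observed from an unknown angle.
rng = np.random.default_rng(0)
points = rng.normal(size=(50, 2))
true_theta = 1.0
target = project(points, true_theta)

thetas, losses, best = optimize_multiplex(points, target)
print("best hypothesis:", thetas[best], "loss:", losses[best])
```

In this toy setting several hypotheses may end up at the same answer; the sketch only shows the bookkeeping of carrying a set of cameras through optimization rather than a single estimate. How the hypotheses are parameterized, scored, and used alongside shape and texture learning in U-CMR is described in the paper itself.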
