Cerberus: A Multi-headed Derenderer
Boyang Deng, Simon Kornblith, Geoffrey Hinton

TL;DR
Cerberus is a multi-headed neural network model that learns to extract 3D shape parts from single images without part annotations, improving generalization to new viewpoints and poses in complex scenes.
Contribution
It introduces a multi-headed derenderer architecture that disentangles object parts and their relations, enabling unsupervised 3D part extraction from single images.
Findings
Outperforms previous methods in unsupervised 3D part extraction
Effectively extracts natural human body parts from images
Generalizes well to novel viewpoints and poses
Abstract
To generalize to novel visual scenes with new viewpoints and new object poses, a visual system needs representations of the shapes of the parts of an object that are invariant to changes in viewpoint or pose. 3D graphics representations disentangle visual factors such as viewpoints and lighting from object structure in a natural way. It is possible to learn to invert the process that converts 3D graphics representations into 2D images, provided the 3D graphics representations are available as labels. When only the unlabeled images are available, however, learning to derender is much harder. We consider a simple model which is just a set of free-floating parts. Each part has its own relation to the camera and its own triangular mesh, which can be deformed to model the shape of the part. At test time, a neural network looks at a single image and extracts the shapes of the parts and their…
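The part model described in the abstract (free-floating parts, each with its own relation to the camera and its own deformable triangular mesh) can be illustrated with a minimal sketch. This is not the authors' implementation: the names Part, offsets, rotation, translation, and project are hypothetical, the deformation is assumed to be a simple per-vertex offset, and the pinhole projection is an assumption added only to show how a part in camera coordinates maps to image coordinates.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Part:
    """One free-floating part (hypothetical structure, for illustration only)."""
    template_vertices: List[Vec3]        # vertices of the undeformed triangular mesh
    faces: List[Tuple[int, int, int]]    # triangles as indices into the vertex list
    offsets: List[Vec3]                  # assumed per-vertex deformation of the mesh
    rotation: List[List[float]]          # 3x3 rotation, part frame to camera frame
    translation: Vec3                    # translation, part frame to camera frame

    def deformed_vertices(self) -> List[Vec3]:
        # Deform the mesh by adding an offset to each template vertex.
        return [
            (v[0] + o[0], v[1] + o[1], v[2] + o[2])
            for v, o in zip(self.template_vertices, self.offsets)
        ]

    def camera_vertices(self) -> List[Vec3]:
        # Apply this part's own rigid transform to reach camera coordinates.
        out = []
        for v in self.deformed_vertices():
            out.append(tuple(
                sum(self.rotation[r][c] * v[c] for c in range(3)) + self.translation[r]
                for r in range(3)
            ))
        return out


def project(points: List[Vec3], focal: float = 1.0) -> List[Tuple[float, float]]:
    # Assumed pinhole camera: divide by depth and scale by the focal length.
    return [(focal * x / z, focal * y / z) for x, y, z in points]


# A single-triangle part, two units in front of the camera, with one vertex moved.
part = Part(
    template_vertices=[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
    faces=[(0, 1, 2)],
    offsets=[(0.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
    rotation=[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],
    translation=(0.0, 0.0, 2.0),
)
image_points = project(part.camera_vertices())
```

In this sketch, changing rotation and translation moves the part relative to the camera without touching its shape, while changing offsets alters the shape without touching the pose, which is the kind of separation between shape and viewpoint the abstract motivates.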
Taxonomy
Topics: Human Pose and Action Recognition · Image Processing and 3D Reconstruction · 3D Surveying and Cultural Heritage
