Glissando-Net: Deep sinGLe vIew category level poSe eStimation ANd 3D recOnstruction
Bo Sun, Hao Kang, Li Guan, Haoxiang Li, Philippos Mordohai, Gang Hua

TL;DR
Glissando-Net is a deep learning model that jointly estimates object pose and reconstructs 3D shape from a single RGB image, integrating 2D-3D features for improved accuracy.
Contribution
The paper introduces Glissando-Net, a joint auto-encoder framework that simultaneously predicts 3D shape and pose from a single RGB image, improving on prior methods that addressed only one of the two tasks.
Findings
Outperforms existing methods in pose and shape estimation accuracy
Effectively integrates 2D-3D features for improved predictions
Eliminates the need for latent code optimization during inference
Abstract
We present a deep learning model, dubbed Glissando-Net, to simultaneously estimate the pose and reconstruct the 3D shape of objects at the category level from a single RGB image. Previous works predominantly focused on either estimating poses (often at the instance level) or reconstructing shapes, but not both. Glissando-Net is composed of two auto-encoders that are jointly trained, one for RGB images and the other for point clouds. We embrace two key design choices in Glissando-Net to achieve a more accurate prediction of the 3D shape and pose of the object given a single RGB image as input. First, we augment the feature maps of the point cloud encoder and decoder with transformed feature maps from the image decoder, enabling effective 2D-3D interaction in both training and prediction. Second, we predict both the 3D shape and pose of the object in the decoder stage. This way, we better…
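The first design choice, augmenting point cloud features with transformed image features, can be illustrated with a minimal NumPy sketch. The abstract does not say how the image feature maps are transformed, so the approach below (pinhole projection of each 3D point, nearest-neighbour lookup in the image feature map, then concatenation) is an assumption chosen for illustration and not the authors' method. All function names and the parameters `focal` and `center` are hypothetical.

```python
import numpy as np

def project_points(points, focal, center):
    """Pinhole projection of (N, 3) camera-frame points to (N, 2) pixel coords (x, y).

    Assumes every point has z > 0 (in front of the camera).
    """
    z = points[:, 2:3]
    return focal * points[:, :2] / z + center

def sample_image_features(feature_map, pixels):
    """Nearest-neighbour lookup of an (H, W, C) feature map at (N, 2) pixel coords."""
    h, w, _ = feature_map.shape
    # Round to the nearest pixel and clamp to the feature map bounds.
    cols = np.clip(np.rint(pixels[:, 0]).astype(int), 0, w - 1)
    rows = np.clip(np.rint(pixels[:, 1]).astype(int), 0, h - 1)
    return feature_map[rows, cols]

def fuse_2d_3d(point_features, points, feature_map, focal, center):
    """Concatenate per-point features (N, D) with image features sampled per point (N, C).

    Illustrative assumption: fusion is a plain channel-wise concatenation.
    """
    pixels = project_points(points, focal, center)
    image_features = sample_image_features(feature_map, pixels)
    return np.concatenate([point_features, image_features], axis=1)
```

With `N` points, `D` point feature channels, and `C` image feature channels, `fuse_2d_3d` returns an `(N, D + C)` array, so each point carries both geometric and appearance features into the next layer. A learned network would typically use bilinear sampling and trainable layers instead of nearest-neighbour lookup and raw concatenation.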
