Glissando-Net: Deep sinGLe vIew category level poSe eStimation ANd 3D recOnstruction

Bo Sun; Hao Kang; Li Guan; Haoxiang Li; Philippos Mordohai; Gang Hua

arXiv:2501.14896 · cs.CV · January 28, 2025

TL;DR

Glissando-Net is a deep learning model that jointly estimates object pose and reconstructs 3D shape from a single RGB image, integrating 2D-3D features for improved accuracy.

Contribution

The paper introduces Glissando-Net, a joint auto-encoder framework that predicts both the 3D shape and the pose of an object from a single RGB image, whereas prior methods mostly addressed only one of the two tasks.

Findings

01 Outperforms existing methods in pose and shape estimation accuracy

02 Effectively integrates 2D-3D features for improved predictions

03 Eliminates the need for code optimization during inference

Abstract

We present a deep learning model, dubbed Glissando-Net, to simultaneously estimate the pose and reconstruct the 3D shape of objects at the category level from a single RGB image. Previous works predominantly focused on either estimating poses (often at the instance level) or reconstructing shapes, but not both. Glissando-Net is composed of two jointly trained auto-encoders, one for RGB images and the other for point clouds. We embrace two key design choices in Glissando-Net to achieve a more accurate prediction of the 3D shape and pose of the object given a single RGB image as input. First, we augment the feature maps of the point cloud encoder and decoder with transformed feature maps from the image decoder, enabling effective 2D-3D interaction in both training and prediction. Second, we predict both the 3D shape and pose of the object in the decoder stage. This way, we better…
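As a rough illustration of the first design choice, the sketch below shows one way per-point features could be augmented with image features. The abstract does not say how the image decoder's feature maps are transformed, so the pinhole projection, the nearest-pixel lookup, and every name here (`project_point`, `augment_point_features`, `focal`, `cx`, `cy`) are assumptions made for this example, not the paper's implementation.

```python
from typing import List, Sequence, Tuple


def project_point(point: Sequence[float], focal: float,
                  cx: float, cy: float) -> Tuple[float, float]:
    """Pinhole projection of a camera-frame 3D point to pixel coordinates.

    Assumption: a simple pinhole camera; the paper's transform may differ.
    """
    x, y, z = point
    if z <= 0:
        raise ValueError("point must lie in front of the camera (z > 0)")
    return focal * x / z + cx, focal * y / z + cy


def augment_point_features(points: Sequence[Sequence[float]],
                           point_feats: Sequence[Sequence[float]],
                           image_feats: Sequence[Sequence[Sequence[float]]],
                           focal: float, cx: float, cy: float
                           ) -> List[List[float]]:
    """Concatenate each point feature with the image feature at its pixel.

    image_feats is an H x W x C nested list (hypothetical layout).
    """
    height = len(image_feats)
    width = len(image_feats[0])
    fused = []
    for point, feat in zip(points, point_feats):
        u, v = project_point(point, focal, cx, cy)
        # Nearest-pixel lookup, clamped to the feature map bounds.
        col = min(max(int(round(u)), 0), width - 1)
        row = min(max(int(round(v)), 0), height - 1)
        fused.append(list(feat) + list(image_feats[row][col]))
    return fused


# Toy inputs: a 2 x 2 feature map with one channel and two 3D points.
image_feats = [[[0.0], [1.0]],
               [[2.0], [3.0]]]
points = [(1.0, 1.0, 1.0), (0.0, 0.0, 2.0)]
point_feats = [[0.1, 0.2], [0.3, 0.4]]
fused = augment_point_features(points, point_feats, image_feats,
                               focal=1.0, cx=0.0, cy=0.0)
```

Each fused feature here has three channels: the two original point channels plus the one image channel sampled at the projected pixel. A learned network would typically replace the nearest-pixel lookup with a differentiable sampling step.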
