End-to-End Learning of Multi-category 3D Pose and Shape Estimation

Yigit Baran Can; Alexander Liniger; Danda Pani Paudel; Luc Van Gool

arXiv:2112.10196 · cs.CV · March 10, 2022

TL;DR

This paper introduces an end-to-end, Transformer-based method for multi-category 3D pose and shape estimation from images. It handles occlusions and a wide variety of object classes, and it outperforms prior methods on three benchmarks.

Contribution

The paper presents a single neural network that detects 2D keypoints and lifts them to 3D across multiple object categories, using the visual context of the image, and that is trained only on 2D keypoint annotations.

Findings

01 Outperforms the state of the art on three benchmarks

02 Handles occlusions and multiple object categories

03 Uses only 2D keypoint annotations for training

Abstract

In this paper, we study the representation of the shape and pose of objects using their keypoints. To this end, we propose an end-to-end method that simultaneously detects 2D keypoints from an image and lifts them to 3D. The proposed method learns both 2D detection and 3D lifting from 2D keypoint annotations only. In addition to being end-to-end from images to 3D keypoints, our method handles objects from multiple categories using a single neural network. We use a Transformer-based architecture to detect the keypoints and to summarize the visual context of the image. This visual context is then used when lifting the keypoints to 3D, allowing context-based reasoning for better performance. Our method can handle occlusions as well as a wide variety of object classes. Our experiments on three benchmarks demonstrate that our method performs better than the…
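The abstract states that 3D lifting is learned from 2D keypoint annotations alone, but the excerpt does not say which loss is used. One common way to obtain such supervision, shown here purely as an assumption and not as the authors' method, is a reprojection loss: the predicted 3D keypoints are projected back to the image plane and compared with the 2D annotations, with occluded keypoints masked out. The orthographic camera, the function names, and the toy data below are all hypothetical.

```python
import math

def rotate_y(points, angle):
    """Rotate 3D points (x, y, z) about the y-axis by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x + s * z, y, -s * x + c * z) for x, y, z in points]

def project_orthographic(points):
    """Assumed camera model: drop the depth coordinate, (x, y, z) -> (x, y)."""
    return [(x, y) for x, y, _ in points]

def reprojection_loss(pred_3d, annotated_2d, visible):
    """Mean squared 2D error between projected predictions and annotations,
    averaged over visible keypoints only (occluded ones are skipped)."""
    total, count = 0.0, 0
    projected = project_orthographic(pred_3d)
    for (px, py), (ax, ay), vis in zip(projected, annotated_2d, visible):
        if vis:
            total += (px - ax) ** 2 + (py - ay) ** 2
            count += 1
    return total / count if count else 0.0

# Toy data (hypothetical): a three-keypoint shape seen under a 0.3 rad rotation.
shape = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.5), (0.0, 1.0, -0.5)]
annotated = project_orthographic(rotate_y(shape, 0.3))
visible = [True, True, False]  # third keypoint treated as occluded

# A prediction with the right pose reprojects exactly onto the annotations.
loss_same = reprojection_loss(rotate_y(shape, 0.3), annotated, visible)
# A prediction with a different pose does not.
loss_other = reprojection_loss(rotate_y(shape, 1.0), annotated, visible)
```

In this sketch no 3D ground truth appears anywhere: the only target is the set of 2D annotations, which is what makes training from 2D labels possible in principle. A loss of this kind alone leaves depth ambiguous, so practical methods add further constraints; the excerpt does not specify which ones the paper uses.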

Taxonomy

Topics: Advanced Vision and Imaging · Human Pose and Action Recognition · Robotics and Sensor-Based Localization