CVAM-Pose: Conditional Variational Autoencoder for Multi-Object Monocular Pose Estimation

Jianyu Zhao; Wei Quan; Bogdan J. Matuszewski

arXiv:2410.09010·cs.CV·October 14, 2024

TL;DR

CVAM-Pose introduces a label-embedded conditional variational autoencoder for multi-object monocular pose estimation. It achieves high accuracy without relying on 3D models or depth data, and it is robust to occlusion and clutter.

Contribution

The paper presents a new approach using a label-embedded conditional variational autoencoder for scalable, efficient multi-object pose estimation from monocular images, outperforming existing latent space methods.

Findings

1. Outperforms AAE and Multi-Path methods by 20-25% on the AR_VSD metric.
2. Robust to occlusion and clutter in multi-object scenes.
3. Comparable to 3D model-based methods in BOP challenges.

Abstract

Estimating the poses of rigid objects is one of the fundamental problems in computer vision, with a range of applications across automation and augmented reality. Most existing approaches adopt a one-network-per-object-class strategy, depend heavily on objects' 3D models and depth data, and employ time-consuming iterative refinement, which can be impractical for some applications. This paper presents a novel approach, CVAM-Pose, for multi-object monocular pose estimation that addresses these limitations. The CVAM-Pose method employs a label-embedded conditional variational autoencoder network to implicitly abstract regularised representations of multiple objects in a single low-dimensional latent space. This autoencoding process uses only images captured by a projective camera and is robust to object occlusion and scene clutter. The classes of objects are one-hot encoded and embedded…
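The label conditioning mentioned in the abstract can be illustrated with a minimal sketch. It is not taken from the paper or its repository: the function names, the latent size, and the choice to append the one-hot class code to the sampled latent vector are all assumptions made for illustration, since the abstract is truncated before it says where the label is embedded. The sketch shows only three generic pieces of a conditional variational autoencoder: one-hot encoding of the class, the reparameterised latent sample, and the closed-form KL term that regularises the latent space.

```python
import math
import random


def one_hot(label, num_classes):
    """Return a one-hot list for an integer class label."""
    if not 0 <= label < num_classes:
        raise ValueError("label out of range")
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


def reparameterise(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1), per dimension."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """Closed-form KL(N(mu, sigma^2) || N(0, I)), summed over dimensions."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv)
                      for m, lv in zip(mu, log_var))


def condition(vector, label, num_classes):
    """Append the one-hot class code to a vector (assumed conditioning scheme)."""
    return list(vector) + one_hot(label, num_classes)


# Hypothetical encoder outputs for a 3-dimensional latent space.
rng = random.Random(0)
mu = [0.0, 0.0, 0.0]
log_var = [0.0, 0.0, 0.0]

z = reparameterise(mu, log_var, rng)
decoder_input = condition(z, label=2, num_classes=4)
kl = kl_to_standard_normal(mu, log_var)
```

Because the class code is part of the decoder input, a single network can in principle serve several object classes while sharing one latent space, which is the property the abstract emphasises. How CVAM-Pose actually injects the label and recovers pose from the latent code should be checked against the paper and the official repository.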

Code & Models

Repositories

jzhao12/cvam-pose (PyTorch, official)

Taxonomy

Topics: Robot Manipulation and Learning · Image and Object Detection Techniques · Hand Gesture Recognition Systems

Methods: Conditional Variational Autoencoder