CVAM-Pose: Conditional Variational Autoencoder for Multi-Object Monocular Pose Estimation
Jianyu Zhao, Wei Quan, Bogdan J. Matuszewski

TL;DR
CVAM-Pose introduces a novel label-embedded variational autoencoder for multi-object monocular pose estimation, achieving high accuracy without reliance on 3D models or depth data, and demonstrating robustness to occlusion and clutter.
Contribution
The paper presents a new approach using a label-embedded conditional variational autoencoder for scalable, efficient multi-object pose estimation from monocular images, outperforming existing latent space methods.
Findings
Outperforms the AAE and Multi-Path methods by 20-25% on the AR_VSD metric.
Robust to occlusion and clutter in multi-object scenes.
Comparable to 3D model-based methods in BOP challenges.
Abstract
Estimating rigid objects' poses is one of the fundamental problems in computer vision, with a range of applications across automation and augmented reality. Most existing approaches adopt a one-network-per-object-class strategy, depend heavily on objects' 3D models and depth data, and employ time-consuming iterative refinement, which can be impractical for some applications. This paper presents a novel approach, CVAM-Pose, for multi-object monocular pose estimation that addresses these limitations. The CVAM-Pose method employs a label-embedded conditional variational autoencoder network to implicitly abstract regularised representations of multiple objects in a single low-dimensional latent space. This autoencoding process uses only images captured by a projective camera and is robust to objects' occlusion and scene clutter. The classes of objects are one-hot encoded and embedded…
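The abstract is cut off before it describes how the labels are embedded, so the following is only a minimal generic sketch of the ingredients it names: one-hot class labels used as a condition, and a regularised Gaussian latent space. It is not the authors' implementation. All function names (one_hot, condition_on_label, reparameterize, kl_to_standard_normal) are hypothetical, and appending the label to a flat feature vector is an assumption made for illustration.

```python
import math
import random


def one_hot(class_id, num_classes):
    """Return a one-hot list encoding class_id among num_classes classes."""
    if not 0 <= class_id < num_classes:
        raise ValueError("class_id out of range")
    return [1.0 if i == class_id else 0.0 for i in range(num_classes)]


def condition_on_label(features, class_id, num_classes):
    """Append the one-hot label to a feature vector.

    Assumption: simple concatenation. The source does not say where or how
    the label is embedded in the network.
    """
    return list(features) + one_hot(class_id, num_classes)


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1), per latent dimension."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """KL divergence between a diagonal Gaussian N(mu, exp(log_var)) and N(0, I).

    This is the standard VAE regulariser that keeps the latent space compact.
    """
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv)
                      for m, lv in zip(mu, log_var))


if __name__ == "__main__":
    rng = random.Random(0)
    conditioned = condition_on_label([0.5, -1.0], class_id=1, num_classes=3)
    z = reparameterize([0.0, 0.0], [0.0, 0.0], rng)
    print(conditioned, z, kl_to_standard_normal([1.0], [0.0]))
```

In a conditional variational autoencoder the same label vector is typically supplied to both the encoder and the decoder, which is what lets one network share a single latent space across several object classes.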
Taxonomy
Topics: Robot Manipulation and Learning · Image and Object Detection Techniques · Hand Gesture Recognition Systems
Methods: Conditional Variational Autoencoder
