CoordAR: One-Reference 6D Pose Estimation of Novel Objects via Autoregressive Coordinate Map Generation
Dexin Zuo, Ang Li, Wei Wang, Wenxian Yu, Danping Zou

TL;DR
CoordAR is a new autoregressive framework for estimating the 6D pose of unseen objects from a single reference view, overcoming limitations of previous methods by modeling 3D correspondences probabilistically.
Contribution
CoordAR introduces coordinate map tokenization, modality-decoupled encoding, and autoregressive transformer decoding to improve pose estimation of novel objects.
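To make the idea of turning a coordinate map into discrete tokens concrete, here is a minimal sketch. The summary does not describe CoordAR's tokenizer, so this uses plain uniform binning as a stand-in; the function names, the `bins` parameter, and the assumption that coordinates are normalized to [0, 1] are all hypothetical.

```python
import numpy as np

def quantize_coords(coord_map, bins=256):
    """Map an (H, W, 3) array of coordinates in [0, 1] to integer tokens.

    Uniform binning is an illustrative assumption, not the paper's tokenizer.
    """
    tokens = np.floor(coord_map * bins)
    # Clip so that a coordinate of exactly 1.0 falls into the last bin.
    return np.clip(tokens, 0, bins - 1).astype(np.int64)

def dequantize_coords(tokens, bins=256):
    """Recover approximate coordinates as the centers of the token bins."""
    return (tokens.astype(np.float64) + 0.5) / bins
```

With this scheme the round-trip error per coordinate is at most half a bin width, so a finer vocabulary trades sequence modeling difficulty for coordinate precision.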
Findings
Outperforms existing methods on multiple benchmarks.
Demonstrates robustness to symmetry and occlusion.
Achieves accurate 6D pose estimation with only one reference view.
Abstract
Object 6D pose estimation, a crucial task for robotics and augmented reality applications, becomes particularly challenging when dealing with novel objects whose 3D models are not readily available. To reduce dependency on 3D models, recent studies have explored one-reference-based pose estimation, which requires only a single reference view instead of a complete 3D model. However, existing methods that rely on real-valued coordinate regression suffer from limited global consistency due to the local nature of convolutional architectures and face challenges in symmetric or occluded scenarios owing to a lack of uncertainty modeling. We present CoordAR, a novel autoregressive framework for one-reference 6D pose estimation of unseen objects. CoordAR formulates 3D-3D correspondences between the reference and query views as a map of discrete tokens, which is obtained in an autoregressive and…
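Once 3D-3D correspondences between the reference and query views are available, a relative rigid pose can be recovered in closed form. The sketch below uses the standard Kabsch (SVD-based) least-squares alignment; the truncated abstract does not say which solver CoordAR uses, so treat this as an illustration of the general step, with `kabsch`, `ref_pts`, and `qry_pts` as hypothetical names.

```python
import numpy as np

def kabsch(ref_pts, qry_pts):
    """Least-squares rigid transform (R, t) such that qry ≈ R @ ref + t.

    ref_pts, qry_pts: (N, 3) arrays of corresponding 3D points.
    """
    ref_c = ref_pts.mean(axis=0)
    qry_c = qry_pts.mean(axis=0)
    # 3x3 cross-covariance of the centered point sets.
    H = (ref_pts - ref_c).T @ (qry_pts - qry_c)
    U, _, Vt = np.linalg.svd(H)
    # Correct for a possible reflection so that det(R) = +1.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = qry_c - R @ ref_c
    return R, t
```

In practice, predicted correspondences contain outliers, so a solver like this is usually wrapped in a robust scheme such as RANSAC; that wrapper is omitted here.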
Taxonomy
Topics: Robot Manipulation and Learning · Robotics and Sensor-Based Localization · 3D Shape Modeling and Analysis
