RelPose++: Recovering 6D Poses from Sparse-view Observations
Amy Lin, Jason Y. Zhang, Deva Ramanan, Shubham Tulsiani

TL;DR
RelPose++ estimates 6D camera poses from sparse-view image sets (2-8 images). It uses attentional transformer layers to process multiple views jointly and predicts camera translations alongside rotations.
Contribution
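It extends RelPose, which infers distributions over relative rotations for image pairs, in two ways: transformer layers that process multiple views jointly, and translation prediction in a coordinate system chosen to decouple it from the ambiguity in rotation estimation.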
Findings
Large improvements over prior art in 6D pose prediction
Effective on both seen and unseen object categories
Enables in-the-wild object pose estimation and reconstruction
Abstract
We address the task of estimating 6D camera poses from sparse-view image sets (2-8 images). This task is a vital pre-processing stage for nearly all contemporary (neural) reconstruction algorithms but remains challenging given sparse views, especially for objects with visual symmetries and texture-less surfaces. We build on the recent RelPose framework, which learns a network that infers distributions over relative rotations over image pairs. We extend this approach in two key ways. First, we use attentional transformer layers to process multiple images jointly, since additional views of an object may resolve ambiguous symmetries in any given image pair (such as the handle of a mug that becomes visible in a third view). Second, we augment this network to also report camera translations by defining an appropriate coordinate system that decouples the ambiguity in rotation estimation from…
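To make the notion of a "relative rotation over an image pair" concrete, here is a minimal sketch in plain Python. It is not the RelPose++ network or its code. It assumes world-to-camera rotation matrices, and the helper names (rotation_z, relative_rotation, geodesic_angle_deg) are hypothetical, introduced only for illustration. It shows that the relative rotation between two cameras does not depend on the choice of world frame, and how the angular difference between two rotations can be measured.

```python
import math

def rotation_z(theta):
    """3x3 rotation about the z-axis by theta radians (illustrative helper)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def transpose(R):
    return [list(row) for row in zip(*R)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def relative_rotation(R_i, R_j):
    """Relative rotation R_j @ R_i^T, assuming world-to-camera matrices."""
    return matmul(R_j, transpose(R_i))

def geodesic_angle_deg(R_a, R_b):
    """Angle in degrees of the rotation that maps R_a onto R_b."""
    R = matmul(transpose(R_a), R_b)
    cos_angle = (R[0][0] + R[1][1] + R[2][2] - 1.0) / 2.0
    # Clamp to guard against round-off pushing the value outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))

identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

# Two hypothetical cameras, rotated 10 and 40 degrees about z.
R_1 = rotation_z(math.radians(10.0))
R_2 = rotation_z(math.radians(40.0))
R_12 = relative_rotation(R_1, R_2)
angle = geodesic_angle_deg(identity, R_12)

# Re-expressing both cameras in a different world frame G leaves R_12 unchanged.
G = rotation_z(0.7)
R_12_shifted = relative_rotation(matmul(R_1, G), matmul(R_2, G))
shift_error = geodesic_angle_deg(R_12, R_12_shifted)
```

Because pairwise relative rotations are independent of the global frame, a set of absolute camera rotations can only be recovered from them up to one shared rotation.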
Taxonomy
Topics: Robotics and Sensor-Based Localization · Advanced Vision and Imaging · Advanced Neural Network Applications
