RayPose: Ray Bundling Diffusion for Template Views in Unseen 6D Object Pose Estimation
Junwen Huang, Shishir Reddy Vutukur, Peter KT Yu, Nassir Navab, Slobodan Ilic, Benjamin Busam

TL;DR
RayPose introduces a novel diffusion transformer approach that reformulates template-based 6D object pose estimation as a ray alignment problem, improving accuracy in unseen object scenarios by leveraging geometric priors and a coarse-to-fine training strategy.
Contribution
The paper proposes a new ray bundling diffusion method for template-based pose estimation, integrating geometric priors and a coarse-to-fine training scheme for better unseen object pose inference.
Findings
Achieves competitive results on benchmark datasets.
Outperforms existing methods in unseen object pose estimation.
Demonstrates robustness with geometric priors and dense translation modeling.
Abstract
Typical template-based object pose pipelines estimate the pose by retrieving the closest matching template and aligning it with the observed image. However, failure to retrieve the correct template often leads to inaccurate pose predictions. To address this, we reformulate template-based object pose estimation as a ray alignment problem, where the viewing directions from multiple posed template images are learned to align with a non-posed query image. Inspired by recent progress in diffusion-based camera pose estimation, we embed this formulation into a diffusion transformer architecture that aligns a query image with a set of posed templates. We reparameterize object rotation using object-centered camera rays and model object translation by extending scale-invariant translation estimation to dense translation offsets. Our model leverages geometric priors from the templates to guide…
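To make the ray formulation in the abstract concrete, the following is a minimal sketch of how a posed template view can be turned into per-pixel rays expressed in the object frame, and how a rotation can be recovered from such rays. It is an illustration under stated assumptions, not the authors' implementation: the Plücker-style (direction, moment) encoding, the pinhole intrinsics `K`, and the function names `pixel_grid`, `rays_in_object_frame`, and `rotation_from_rays` are all hypothetical choices made here, and the closed-form Kabsch alignment stands in for whatever the diffusion model actually learns.

```python
import numpy as np

def pixel_grid(h, w):
    """Homogeneous pixel centers (u, v, 1), shape (h*w, 3)."""
    v, u = np.meshgrid(np.arange(h) + 0.5, np.arange(w) + 0.5, indexing="ij")
    return np.stack([u.ravel(), v.ravel(), np.ones(h * w)], axis=1)

def rays_in_object_frame(R, t, K, pixels):
    """Assumed convention: x_cam = R @ x_obj + t (object-to-camera pose).

    Returns unit ray directions in the camera frame, the same directions
    in the object frame, and Plücker moments (camera center x direction).
    """
    d_cam = pixels @ np.linalg.inv(K).T            # back-project pixels
    d_cam /= np.linalg.norm(d_cam, axis=1, keepdims=True)
    d_obj = d_cam @ R                              # row-wise R^T @ d_cam
    center = -R.T @ t                              # camera center in object frame
    moments = np.cross(center, d_obj)
    return d_cam, d_obj, moments

def rotation_from_rays(d_cam, d_obj):
    """Least-squares rotation R with d_cam ~ R @ d_obj (Kabsch algorithm)."""
    H = d_obj.T @ d_cam                            # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    return Vt.T @ D @ U.T

# Hypothetical example pose and intrinsics, chosen only for illustration.
a, b = 0.3, 0.5
Rz = np.array([[np.cos(a), -np.sin(a), 0], [np.sin(a), np.cos(a), 0], [0, 0, 1]])
Rx = np.array([[1, 0, 0], [0, np.cos(b), -np.sin(b)], [0, np.sin(b), np.cos(b)]])
R_true = Rz @ Rx
t_true = np.array([0.1, -0.2, 1.5])
K = np.array([[100.0, 0, 16], [0, 100.0, 16], [0, 0, 1]])

d_cam, d_obj, moments = rays_in_object_frame(R_true, t_true, K, pixel_grid(32, 32))
R_est = rotation_from_rays(d_cam, d_obj)
```

In this sketch the object-frame ray directions fully determine the rotation, which is why a dense ray field can serve as a rotation parameterization; translation is handled separately in the abstract and is not modeled here.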
Taxonomy
Topics: Robot Manipulation and Learning · Multimodal Machine Learning Applications · Robotics and Sensor-Based Localization
