Towards Human-Level 3D Relative Pose Estimation: Generalizable, Training-Free, with Single Reference

Yuan Gao; Yajing Luo; Junhong Wang; Kui Jia; Gui-Song Xia

arXiv:2406.18453 · cs.CV · September 3, 2025

TL;DR

This paper introduces a training-free, generalizable 3D relative pose estimation method that uses a single RGB-D reference and differentiable rendering, outperforming supervised methods on standard benchmarks.

Contribution

The proposed approach leverages 3D shape perception, render-and-compare, and semantic cues from pretrained models to estimate relative pose without training on specific objects.
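The render-and-compare idea can be illustrated with a deliberately tiny stand-in: "render" a reference under a candidate pose, compare it with the query, and follow the gradient of the mismatch. The sketch below is an assumption-laden toy, not the paper's pipeline. It rotates 2D points instead of rendering a textured 2.5D mesh, uses a hand-derived gradient instead of a differentiable renderer, and has no semantic (DINOv2) features. All function names, the learning rate, and the step count are hypothetical.

```python
import math

def render(points, theta):
    """Toy 'renderer': rotate 2D reference points by theta (radians)."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y, s * x + c * y) for x, y in points]

def loss_and_grad(points, observed, theta):
    """Squared error between rendered and observed points, and d(loss)/d(theta)."""
    c, s = math.cos(theta), math.sin(theta)
    loss, grad = 0.0, 0.0
    for (x, y), (u, v) in zip(points, observed):
        rx, ry = c * x - s * y, s * x + c * y    # rendered point
        dx, dy = -s * x - c * y, c * x - s * y   # derivative of rendered point w.r.t. theta
        loss += (rx - u) ** 2 + (ry - v) ** 2
        grad += 2.0 * ((rx - u) * dx + (ry - v) * dy)
    return loss, grad

def estimate_rotation(points, observed, theta0=0.0, lr=0.1, steps=200):
    """Plain gradient descent on the pose parameter (lr and steps are arbitrary choices)."""
    theta = theta0
    for _ in range(steps):
        _, grad = loss_and_grad(points, observed, theta)
        theta -= lr * grad
    return theta

# Synthetic example: the 'query' is the reference rotated by 50 degrees.
reference = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.5)]
true_theta = math.radians(50.0)
query = render(reference, true_theta)
estimated = estimate_rotation(reference, query)
print(math.degrees(estimated))
```

In this toy the loss has a single basin away from a 180-degree offset, so one starting point suffices; a real image-space comparison is generally far less well behaved, which is why initialization matters in practice.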

Findings

01

Outperforms state-of-the-art supervised methods on multiple datasets.

02

Works effectively on unseen objects with only a single RGB-D reference.

03

Achieves high accuracy under strict angular error metrics.
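Angular error between a predicted and a ground-truth rotation is commonly measured as the geodesic angle between the two rotation matrices, and accuracy is then the fraction of samples whose error falls under a threshold. The sketch below shows that computation, assuming rotations are given as 3x3 nested lists. The helper names and the thresholds in the usage lines are illustrative assumptions, not values taken from the paper.

```python
import math

def rotation_z(deg):
    """3x3 rotation about the z-axis by `deg` degrees (helper for the example)."""
    a = math.radians(deg)
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def angular_error_deg(r_pred, r_gt):
    """Geodesic angle between two rotation matrices, in degrees."""
    # trace(r_pred^T @ r_gt) equals the sum of element-wise products.
    trace = sum(r_pred[i][j] * r_gt[i][j] for i in range(3) for j in range(3))
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.degrees(math.acos(cos_angle))

def accuracy_at(errors_deg, threshold_deg):
    """Fraction of samples whose angular error is within the threshold."""
    return sum(e <= threshold_deg for e in errors_deg) / len(errors_deg)

# Usage with made-up predictions: errors of 5, 20 and 40 degrees.
errors = [angular_error_deg(rotation_z(p), rotation_z(0.0)) for p in (5.0, 20.0, 40.0)]
print(accuracy_at(errors, 30.0))
```

A stricter metric simply lowers `threshold_deg`, so fewer predictions count as correct.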

Abstract

Humans can easily deduce the relative pose of a previously unseen object, without labeling or training, given only a single query-reference image pair. This is arguably achieved by incorporating i) 3D/2.5D shape perception from a single image, ii) render-and-compare simulation, and iii) rich semantic cue awareness to furnish (coarse) reference-query correspondence. Motivated by this, we propose a novel 3D generalizable relative pose estimation method by elaborating 3D/2.5D shape perception with a 2.5D shape from an RGB-D reference, fulfilling the render-and-compare paradigm with an off-the-shelf differentiable renderer, and leveraging the semantic cues from a pretrained model like DINOv2. Specifically, our differentiable renderer takes the 2.5D rotatable mesh textured by the RGB and the semantic maps (obtained by DINOv2 from the RGB input), then renders new RGB and semantic maps (with…

Code & Models

Repositories

ethanygao/training-free_generalizable_relative_pose
PyTorch · Official

Taxonomy

Topics: Human Pose and Action Recognition · Advanced Vision and Imaging · 3D Shape Modeling and Analysis