ConceptPose: Training-Free Zero-Shot Object Pose Estimation using Concept Vectors

Liming Kuang; Yordanka Velikova; Mahdi Saleh; Jan-Nico Zaech; Danda Pani Paudel; Benjamin Busam

arXiv:2512.09056 · cs.CV · April 14, 2026

TL;DR

ConceptPose introduces a training-free, zero-shot object pose estimation method that leverages vision-language models to create open-vocabulary 3D concept maps for accurate 6DoF pose estimation.

Contribution

The paper presents a framework that uses vision-language models to estimate object pose without any object- or dataset-specific training.

Findings

01 Achieves state-of-the-art results on zero-shot relative pose estimation benchmarks.

02 Outperforms the strongest baseline by a relative 62% in average ADD(-S) score.

03 Operates without any object- or dataset-specific training.
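
For readers unfamiliar with the ADD(-S) score named in the findings, the following is a minimal sketch of the standard definitions: ADD averages the distance between corresponding model points under the ground-truth and predicted poses, and ADD-S uses the nearest predicted point instead, so symmetric objects are not penalized. The function names and the toy square are illustrative assumptions; the evaluation protocol used in the paper (thresholds, point sampling) is not given on this page.

```python
import numpy as np

def transform(points, R, t):
    """Apply a rigid transform to an (N, 3) array of points."""
    return points @ R.T + t

def add_metric(points, R_gt, t_gt, R_pred, t_pred):
    """ADD: mean distance between corresponding points under both poses."""
    gt = transform(points, R_gt, t_gt)
    pred = transform(points, R_pred, t_pred)
    return float(np.linalg.norm(gt - pred, axis=1).mean())

def add_s_metric(points, R_gt, t_gt, R_pred, t_pred):
    """ADD-S: mean distance from each ground-truth point to its nearest predicted point."""
    gt = transform(points, R_gt, t_gt)
    pred = transform(points, R_pred, t_pred)
    # Brute-force (N, N) pairwise distances; adequate for small point sets.
    dists = np.linalg.norm(gt[:, None, :] - pred[None, :, :], axis=2)
    return float(dists.min(axis=1).mean())

# Toy example (hypothetical data): a square that maps onto itself
# under a 90-degree rotation about the z axis.
square = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0],
                   [-1.0, 0.0, 0.0],
                   [0.0, -1.0, 0.0]])
R_gt = np.eye(3)
R_pred = np.array([[0.0, -1.0, 0.0],
                   [1.0, 0.0, 0.0],
                   [0.0, 0.0, 1.0]])
t_zero = np.zeros(3)

add_value = add_metric(square, R_gt, t_zero, R_pred, t_zero)
add_s_value = add_s_metric(square, R_gt, t_zero, R_pred, t_zero)
```

In this toy case ADD is nonzero because every corner moves, while ADD-S is zero because the rotated square coincides with the original point set.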

Abstract

Object pose estimation is a fundamental task in computer vision and robotics, yet most methods require extensive, dataset-specific training. Concurrently, large-scale vision-language models show remarkable zero-shot capabilities. In this work, we bridge these two worlds by introducing ConceptPose, a framework for object pose estimation that is both training-free and model-free. ConceptPose leverages a vision-language model (VLM) to create open-vocabulary 3D concept maps, where each point is tagged with a concept vector derived from saliency maps. By establishing robust 3D-3D correspondences across concept maps, our approach allows precise estimation of 6DoF relative pose. Without any object- or dataset-specific training, our approach achieves state-of-the-art results on common zero-shot relative pose estimation benchmarks, outperforming the strongest baseline by a relative 62% in…
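
The step from 3D-3D correspondences to a 6DoF relative pose can be illustrated with a generic sketch. It assumes each 3D point carries a feature vector (standing in for a concept vector), pairs points by mutual nearest neighbor under cosine similarity, and solves for the rigid transform with the Kabsch algorithm. This is a common textbook pipeline, not the matching or robust estimation procedure of ConceptPose, which this page does not describe; all names and the synthetic data are hypothetical.

```python
import numpy as np

def match_by_cosine(feat_a, feat_b):
    """Return index pairs that are mutual nearest neighbors under cosine similarity."""
    a = feat_a / np.linalg.norm(feat_a, axis=1, keepdims=True)
    b = feat_b / np.linalg.norm(feat_b, axis=1, keepdims=True)
    sim = a @ b.T                      # (Na, Nb) cosine similarities
    a_to_b = sim.argmax(axis=1)        # best match in b for each point in a
    b_to_a = sim.argmax(axis=0)        # best match in a for each point in b
    idx_a = np.arange(len(a))
    keep = b_to_a[a_to_b] == idx_a     # keep only mutually consistent pairs
    return idx_a[keep], a_to_b[keep]

def kabsch(src, dst):
    """Least-squares rotation R and translation t such that dst ≈ src @ R.T + t."""
    src_c = src.mean(axis=0)
    dst_c = dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)          # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))       # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t

# Synthetic check (hypothetical data): the second map is a rotated, translated,
# and shuffled copy of the first, with identical per-point features.
rng = np.random.default_rng(0)
pts_a = rng.normal(size=(50, 3))
feat_a = rng.normal(size=(50, 16))
angle = np.pi / 6
R_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle), np.cos(angle), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([0.2, -0.1, 0.5])
perm = rng.permutation(50)
pts_b = (pts_a @ R_true.T + t_true)[perm]
feat_b = feat_a[perm]

ia, ib = match_by_cosine(feat_a, feat_b)
R_est, t_est = kabsch(pts_a[ia], pts_b[ib])
```

With noise-free features the sketch recovers the transform exactly; with real data, outlier correspondences would typically call for a robust wrapper such as RANSAC around the Kabsch solve.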
