PoseGAM: Robust Unseen Object Pose Estimation via Geometry-Aware Multi-View Reasoning

Jianqi Chen; Biao Zhang; Xiangjun Tang; Peter Wonka

arXiv:2512.10840 · cs.CV · December 12, 2025

TL;DR

PoseGAM introduces a geometry-aware multi-view framework for robust 6D object pose estimation of unseen objects, eliminating explicit feature matching and leveraging synthetic data for improved generalization.

Contribution

PoseGAM proposes a multi-view approach that predicts object pose directly, using object geometry information instead of explicit feature matching, to improve robustness and generalization.

Findings

01. Achieves state-of-the-art performance with a 5.1% AR improvement.

02. Demonstrates strong generalization to unseen objects.

03. Constructs a large-scale synthetic dataset containing more than 190k objects.

Abstract

6D object pose estimation, which predicts the transformation of an object relative to the camera, remains challenging for unseen objects. Existing approaches typically rely on explicitly constructing feature correspondences between the query image and either the object model or template images. In this work, we propose PoseGAM, a geometry-aware multi-view framework that directly predicts object pose from a query image and multiple template images, eliminating the need for explicit matching. Built upon recent multi-view-based foundation model architectures, the method integrates object geometry information through two complementary mechanisms: explicit point-based geometry and learned features from geometry representation networks. In addition, we construct a large-scale synthetic dataset containing more than 190k objects under diverse environmental conditions to enhance robustness and…
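To make the term concrete, the sketch below shows what a 6D pose represents: a rotation R and a translation t that map points from the object frame into the camera frame, after which a pinhole model projects them to pixels. This is not PoseGAM's network or training code, only an illustration of the quantity such a method outputs. The helper names (`rotation_z`, `transform_point`, `project`) and the intrinsics values are assumptions chosen for the example.

```python
import math


def rotation_z(angle_rad):
    """Return a 3x3 rotation about the z-axis (hypothetical helper)."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [[c, -s, 0.0],
            [s, c, 0.0],
            [0.0, 0.0, 1.0]]


def transform_point(R, t, p):
    """Map an object-frame point into the camera frame: R @ p + t."""
    return [sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)]


def project(p_cam, fx, fy, cx, cy):
    """Pinhole projection of a camera-frame point to pixel coordinates."""
    x, y, z = p_cam
    return (fx * x / z + cx, fy * y / z + cy)


# An assumed pose: 90 degree rotation about z, object 2 m in front of the camera.
R = rotation_z(math.pi / 2)
t = [0.0, 0.0, 2.0]

# A point 10 cm along the object's x-axis.
p_obj = [0.1, 0.0, 0.0]
p_cam = transform_point(R, t, p_obj)

# Assumed camera intrinsics for illustration only.
u, v = project(p_cam, fx=500.0, fy=500.0, cx=320.0, cy=240.0)
print(p_cam, (u, v))
```

The three rotational and three translational degrees of freedom are what give the "6D" in the name; a pose estimator for unseen objects has to recover R and t without having trained on that specific object.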

Taxonomy

Topics: Robot Manipulation and Learning · Human Pose and Action Recognition · Multimodal Machine Learning Applications