TransPose: 6D Object Pose Estimation with Geometry-Aware Transformer

Xiao Lin, Deming Wang, Guangliang Zhou, Chengju Liu, and Qijun Chen

arXiv:2310.16279 · cs.CV · April 24, 2024 · 1 citation

TL;DR

TransPose is a geometry-aware Transformer framework for 6D object pose estimation that learns point cloud features from depth data. By combining local geometry features with global context, it targets the sensitivity to occlusion and illumination changes that RGB-only methods suffer from when depth information is missing.

Contribution

The paper presents a novel Transformer encoder with a geometry-aware module that enhances point cloud feature learning for improved 6D pose estimation.

Findings

01. Achieves competitive results on benchmark datasets.

02. Effectively handles occlusion and illumination changes.

03. Integrates local geometry with global context via Transformer.

Abstract

Estimating the 6D object pose is an essential task in many applications. Because they lack depth information, existing RGB-based methods are sensitive to occlusion and illumination changes. Extracting and exploiting the geometry features in depth information is therefore crucial for accurate predictions. To this end, we propose TransPose, a novel 6D pose framework that uses a Transformer Encoder with a geometry-aware module to learn better point cloud feature representations. Specifically, we first uniformly sample the point cloud and extract local geometry features with a purpose-built local feature extractor based on a graph convolution network. To improve robustness to occlusion, we adopt a Transformer to exchange global information, so that each local feature also contains global information. Finally, we introduce a geometry-aware module in the Transformer Encoder, which to form…
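The first two stages described in the abstract (sampling, local graph features, then global exchange through attention) can be illustrated with a small NumPy sketch. This is not the authors' implementation: the random uniform sampler, the EdgeConv-style k-nearest-neighbour aggregation, the single attention head, and every function name and weight shape below are assumptions made for illustration. The geometry-aware module is omitted because the abstract is truncated before it is described.

```python
import numpy as np


def sample_points(points, n_samples, rng):
    """Randomly pick n_samples points without replacement (assumed sampler)."""
    idx = rng.choice(points.shape[0], size=n_samples, replace=False)
    return points[idx]


def knn_indices(points, k):
    """Return, for each point, the indices of its k nearest other points."""
    diff = points[:, None, :] - points[None, :, :]
    dist2 = np.sum(diff ** 2, axis=-1)
    np.fill_diagonal(dist2, np.inf)  # a point is never its own neighbour
    return np.argsort(dist2, axis=1)[:, :k]


def local_graph_features(points, k, weight):
    """EdgeConv-style local features (an assumed stand-in for the paper's
    graph convolution): edge = [x_i, x_j - x_i], linear map, ReLU, max-pool."""
    nbrs = knn_indices(points, k)                                    # (N, k)
    centre = np.repeat(points[:, None, :], k, axis=1)                # (N, k, 3)
    edge = np.concatenate([centre, points[nbrs] - centre], axis=-1)  # (N, k, 6)
    return np.maximum(edge @ weight, 0.0).max(axis=1)                # (N, C)


def self_attention(feats, wq, wk, wv):
    """Single-head scaled dot-product self-attention over all points, so
    every local feature receives information from every other one."""
    q, k, v = feats @ wq, feats @ wk, feats @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn = attn / attn.sum(axis=1, keepdims=True)
    return attn @ v, attn


rng = np.random.default_rng(0)
cloud = rng.normal(size=(1024, 3))          # synthetic stand-in for a depth point cloud
pts = sample_points(cloud, 128, rng)
local = local_graph_features(pts, k=8, weight=rng.normal(size=(6, 16)))
fused, attn = self_attention(local, *(rng.normal(size=(16, 16)) for _ in range(3)))
print(local.shape, fused.shape)
```

In this sketch the attention matrix is dense over all sampled points, which is what lets a feature computed from a visible region draw on regions elsewhere on the object; a real model would stack several such layers with learned weights and add a pose regression head.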

Taxonomy

Topics: Robot Manipulation and Learning · 3D Shape Modeling and Analysis · Human Pose and Action Recognition

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Adam · Position-Wise Feed-Forward Layer · Label Smoothing · Residual Connection · Byte Pair Encoding · Dropout · Layer Normalization