T6D-Direct: Transformers for Multi-Object 6D Pose Direct Regression

Arash Amini, Arul Selvam Periyasamy, and Sven Behnke

arXiv:2109.10948·cs.CV·September 24, 2021

TL;DR

T6D-Direct introduces a transformer-based, real-time, single-stage approach for multi-object 6D pose estimation that achieves competitive accuracy with the fastest inference times on the YCB-Video dataset.

Contribution

T6D-Direct adapts the DETR transformer architecture to regress 6D poses directly, enabling efficient multi-object pose estimation without standard detection components such as region-of-interest pooling, non-maximum suppression, and bounding-box proposals.
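To make the regression target concrete: a 6D pose is a 3D rotation plus a 3D translation that maps points of an object model into the camera frame. The sketch below is illustrative only. It parameterizes the rotation as a unit quaternion, which is an assumption made here and not necessarily the representation T6D-Direct uses, and the helper names quat_to_rotmat and apply_pose are hypothetical.

```python
import math


def quat_to_rotmat(w, x, y, z):
    """Convert a quaternion (w, x, y, z) to a 3x3 rotation matrix (nested lists)."""
    # Normalize first so that a raw, unnormalized network output is still valid.
    n = math.sqrt(w * w + x * x + y * y + z * z)
    w, x, y, z = w / n, x / n, y / n, z / n
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w)],
        [2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w)],
        [2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y)],
    ]


def apply_pose(rot, trans, point):
    """Map a model-frame point into the camera frame: R @ p + t."""
    return tuple(
        sum(rot[i][j] * point[j] for j in range(3)) + trans[i] for i in range(3)
    )


# Example: rotate 90 degrees about the z-axis, then shift 0.5 along z.
half = math.radians(90) / 2
rot = quat_to_rotmat(math.cos(half), 0.0, 0.0, math.sin(half))
moved = apply_pose(rot, (0.0, 0.0, 0.5), (1.0, 0.0, 0.0))  # close to (0, 1, 0.5)
```

A direct method predicts these six degrees of freedom per object in a single forward pass, as opposed to first predicting intermediate quantities such as keypoints and solving for the pose afterwards.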

Findings

01. Achieves real-time inference speed.

02. Provides pose estimation accuracy comparable to state-of-the-art.

03. Demonstrates effectiveness on the YCB-Video dataset.

Abstract

6D pose estimation is the task of predicting the translation and orientation of objects in a given input image, and it is a crucial prerequisite for many robotics and augmented reality applications. Recently, the Transformer architecture, built around multi-head self-attention, has achieved state-of-the-art results in many computer vision tasks. DETR, a Transformer-based model, formulated object detection as a set prediction problem and achieved impressive results without standard components like region-of-interest pooling, non-maximum suppression, and bounding box proposals. In this work, we propose T6D-Direct, a real-time, single-stage, direct method with a transformer-based architecture built on DETR for multi-object 6D pose estimation. We evaluate the performance of our method on the YCB-Video dataset. Our method achieves the fastest inference…
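The set-prediction formulation mentioned in the abstract relies on a one-to-one matching between the model's predictions and the ground-truth objects. The following is a minimal sketch under stated assumptions: the cost is the Euclidean distance between predicted and ground-truth translations (a placeholder, not the paper's matching cost), and a brute-force search over permutations stands in for the Hungarian algorithm that DETR uses. The name match_predictions is hypothetical.

```python
import math
from itertools import permutations


def match_predictions(pred, gt):
    """Find the one-to-one assignment of predictions to ground truth with minimal cost.

    Returns (assignment, cost), where assignment[i] is the index of the
    prediction matched to gt[i]. Brute force: only practical for tiny sets.
    """
    best_cost, best = math.inf, None
    # Try every ordered choice of len(gt) distinct predictions.
    for perm in permutations(range(len(pred)), len(gt)):
        cost = sum(math.dist(pred[p], gt[g]) for g, p in enumerate(perm))
        if cost < best_cost:
            best_cost, best = cost, perm
    return best, best_cost


# Three predicted translations, two ground-truth objects.
pred = [(0.9, 0.0, 0.0), (0.0, 0.0, 0.1), (5.0, 5.0, 5.0)]
gt = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
assignment, cost = match_predictions(pred, gt)  # gt[0] -> pred[1], gt[1] -> pred[0]
```

The third prediction is left unmatched. In DETR, such unmatched predictions are trained to output a "no object" class, which is what removes the need for non-maximum suppression.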

Taxonomy

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Dense Connections · Softmax · Label Smoothing · Residual Connection · Layer Normalization · Position-Wise Feed-Forward Layer · Convolution