TransPose: A Transformer-based 6D Object Pose Estimation Network with Depth Refinement
Mahmoud Abdulsalam, Nabil Aouf

TL;DR
TransPose is a Transformer-based 6D object pose estimation network that takes only RGB images as input and uses a depth refinement module to improve accuracy for robotic manipulation, including agricultural robotics.
Contribution
The paper introduces TransPose, a novel Transformer-based architecture with a lightweight depth estimation and refinement module for improved 6D pose estimation from RGB images.
Findings
Outperforms state-of-the-art methods in 6D pose estimation
Effective in fruit-picking robotic applications
Achieves higher accuracy with RGB-only input
Abstract
As demand for robotic manipulation applications increases, accurate vision-based 6D pose estimation becomes essential for autonomous operations. Convolutional Neural Network (CNN)-based approaches to pose estimation have been introduced previously. However, the quest for better performance persists, especially for accurate robotic manipulation, and it extends to the agri-robotics domain. In this paper, we propose TransPose, an improved Transformer-based 6D pose estimation network with a depth refinement module. The architecture takes only an RGB image as input, with no supplementary modalities such as depth or thermal images. It includes an innovative, lighter depth estimation network that estimates depth from an RGB image using a feature pyramid with an up-sampling method. A transformer-based detection network with additional prediction heads is proposed…
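The abstract mentions a feature pyramid with an up-sampling method but does not give the layer details here. As a rough illustration of the general idea only, the sketch below merges coarse-to-fine feature maps by nearest-neighbour up-sampling and element-wise addition. The function names (`upsample_nearest_2x`, `top_down_merge`), the 2x scale factor, and the use of addition are assumptions made for this example, not the authors' network.

```python
from typing import List

Grid = List[List[float]]  # a single-channel feature map as nested lists


def upsample_nearest_2x(grid: Grid) -> Grid:
    """Double height and width by repeating each value (nearest neighbour)."""
    out: Grid = []
    for row in grid:
        wide = [v for v in row for _ in range(2)]  # repeat along width
        out.append(wide)
        out.append(list(wide))  # repeat along height
    return out


def top_down_merge(levels: List[Grid]) -> Grid:
    """Merge pyramid levels ordered coarse -> fine.

    Assumption for this sketch: each level is exactly 2x the size of the
    previous one, and levels are fused by element-wise addition.
    """
    merged = levels[0]
    for finer in levels[1:]:
        up = upsample_nearest_2x(merged)
        if len(up) != len(finer) or len(up[0]) != len(finer[0]):
            raise ValueError("each level must be exactly 2x the previous one")
        merged = [[u + f for u, f in zip(ur, fr)] for ur, fr in zip(up, finer)]
    return merged


coarse = [[1.0]]
fine = [[1.0, 2.0], [3.0, 4.0]]
result = top_down_merge([coarse, fine])
```

In a real depth network the merged map would typically pass through learned convolutions before predicting depth; this sketch shows only the up-sample-and-fuse step.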
Taxonomy
Topics: Robot Manipulation and Learning · Advanced Vision and Imaging · Industrial Vision Systems and Defect Detection
