A Novel Convolution and Attention Mechanism-based Model for 6D Object Pose Estimation
Alexander Du, Xiujin Liu

TL;DR
PoseLecTr is a graph-based encoder-decoder that combines Legendre convolution with attention mechanisms to improve 6D object pose estimation from monocular RGB images, particularly in cluttered or occluded scenes.
Contribution
The paper presents a graph-based framework with Legendre convolution and attention modules, addressing the limited ability of grid-structured convolutions to model higher-order and long-range dependencies among image features.
Findings
Achieves competitive performance on LINEMOD, Occluded LINEMOD, and YCB-VIDEO datasets.
Demonstrates consistent improvements across various objects and scene complexities.
Enhances feature modeling in cluttered or occluded environments.
Abstract
This paper proposes PoseLecTr, a graph-based encoder-decoder framework that integrates a novel Legendre convolution with attention mechanisms for six-degree-of-freedom (6-DOF) object pose estimation from monocular RGB images. Conventional learning-based approaches predominantly rely on grid-structured convolutions, which can limit their ability to model higher-order and long-range dependencies among image features, especially in cluttered or occluded scenes. PoseLecTr addresses this limitation by constructing a graph representation from image features, where spatial relationships are explicitly modeled through graph connectivity. The proposed framework incorporates a Legendre convolution layer to improve numerical stability in graph convolution, together with spatial-attention and self-attention distillation to enhance feature selection. Experiments conducted on the LINEMOD, Occluded…
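The abstract does not spell out how the graph is built or how the Legendre convolution is defined, so the following is only a minimal sketch under stated assumptions: the graph is taken to be a 4-neighbour grid over a feature map, and the convolution is assumed to follow the spectral polynomial-filter pattern (as in Chebyshev graph convolution) with Legendre polynomials substituted via their three-term recurrence. The names `grid_adjacency`, `scaled_laplacian`, `legendre_basis`, and `legendre_conv` are hypothetical and are not taken from the paper.

```python
import numpy as np

def grid_adjacency(h, w):
    """4-neighbour adjacency for an h x w feature map (assumed graph construction)."""
    n = h * w
    A = np.zeros((n, n))
    for r in range(h):
        for c in range(w):
            i = r * w + c
            if c + 1 < w:              # edge to the right neighbour
                A[i, i + 1] = A[i + 1, i] = 1.0
            if r + 1 < h:              # edge to the neighbour below
                A[i, i + w] = A[i + w, i] = 1.0
    return A

def scaled_laplacian(A):
    """Return L - I = -D^{-1/2} A D^{-1/2}, whose spectrum lies in [-1, 1]."""
    d = A.sum(axis=1)
    d_inv_sqrt = np.where(d > 0, 1.0 / np.sqrt(np.maximum(d, 1e-12)), 0.0)
    return -(d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :])

def legendre_basis(L_tilde, X, K):
    """Apply P_0..P_{K-1} of L_tilde to node features X.

    Uses the recurrence (k+1) P_{k+1}(x) = (2k+1) x P_k(x) - k P_{k-1}(x).
    """
    basis = [X]                        # P_0(L) X = X
    if K > 1:
        basis.append(L_tilde @ X)      # P_1(L) X = L X
    for k in range(1, K - 1):
        nxt = ((2 * k + 1) * (L_tilde @ basis[k]) - k * basis[k - 1]) / (k + 1)
        basis.append(nxt)
    return basis

def legendre_conv(L_tilde, X, weights):
    """Polynomial graph filter: sum_k P_k(L_tilde) X W_k (assumed layer form)."""
    basis = legendre_basis(L_tilde, X, len(weights))
    return sum(B @ W for B, W in zip(basis, weights))

# Usage: a 3 x 4 feature map with 5 channels, mapped to 2 output channels.
rng = np.random.default_rng(0)
A = grid_adjacency(3, 4)
L_tilde = scaled_laplacian(A)
X = rng.normal(size=(12, 5))
weights = [rng.normal(size=(5, 2)) for _ in range(3)]   # order-2 filter
Y = legendre_conv(L_tilde, X, weights)                  # shape (12, 2)
```

Rescaling the Laplacian so its eigenvalues fall in [-1, 1] matters here because that is the interval on which Legendre polynomials are orthogonal and bounded, which is one plausible reading of the numerical-stability motivation mentioned in the abstract.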
Taxonomy
Topics: Advanced Vision and Imaging · Hand Gesture Recognition Systems · Image and Object Detection Techniques
Methods: Softmax · Attention Is All You Need · Convolution · Feature Selection
