A Novel Convolution and Attention Mechanism-based Model for 6D Object Pose Estimation

Alexander Du; Xiujin Liu

arXiv:2501.01993·cs.CV·January 8, 2026

Open Access

TL;DR

PoseLecTr is a graph-based encoder-decoder that combines Legendre convolution with attention mechanisms for 6D object pose estimation from monocular RGB images, targeting cluttered or occluded scenes where grid-structured convolutions struggle.

Contribution

The paper presents a graph-based framework with Legendre convolution and attention modules (spatial attention and self-attention distillation), addressing the limits of grid-structured convolutions in modeling higher-order and long-range spatial dependencies.

Findings

01. Achieves competitive performance on LINEMOD, Occluded LINEMOD, and YCB-VIDEO datasets.

02. Demonstrates consistent improvements across various objects and scene complexities.

03. Enhances feature modeling in cluttered or occluded environments.

Abstract

This paper proposes PoseLecTr, a graph-based encoder-decoder framework that integrates a novel Legendre convolution with attention mechanisms for six-degree-of-freedom (6-DOF) object pose estimation from monocular RGB images. Conventional learning-based approaches predominantly rely on grid-structured convolutions, which can limit their ability to model higher-order and long-range dependencies among image features, especially in cluttered or occluded scenes. PoseLecTr addresses this limitation by constructing a graph representation from image features, where spatial relationships are explicitly modeled through graph connectivity. The proposed framework incorporates a Legendre convolution layer to improve numerical stability in graph convolution, together with spatial-attention and self-attention distillation to enhance feature selection. Experiments conducted on the LINEMOD, Occluded…
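The abstract does not give the exact form of the Legendre convolution, so the following is only a minimal sketch of one plausible reading: a spectral graph filter that expands in Legendre polynomials of a rescaled graph Laplacian, analogous to Chebyshev graph convolution. The 4-neighbour grid connectivity, the function names (`grid_adjacency`, `scaled_laplacian`, `legendre_graph_conv`), and the per-order weight matrices are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def grid_adjacency(h, w):
    """4-neighbour adjacency for an h x w feature map (assumed connectivity)."""
    n = h * w
    A = np.zeros((n, n))
    for r in range(h):
        for c in range(w):
            i = r * w + c
            if c + 1 < w:                      # right neighbour
                A[i, i + 1] = A[i + 1, i] = 1.0
            if r + 1 < h:                      # neighbour below
                A[i, i + w] = A[i + w, i] = 1.0
    return A

def scaled_laplacian(A):
    """Normalized Laplacian shifted so its eigenvalues lie in [-1, 1].

    L_sym = I - D^{-1/2} A D^{-1/2} has spectrum in [0, 2];
    subtracting I leaves -D^{-1/2} A D^{-1/2}.
    """
    d = A.sum(axis=1)
    d_inv_sqrt = np.where(d > 0, 1.0 / np.sqrt(np.maximum(d, 1e-12)), 0.0)
    return -(d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :])

def legendre_graph_conv(X, L_hat, weights):
    """Compute sum_k P_k(L_hat) @ X @ weights[k] with P_k the Legendre polynomials.

    Uses Bonnet's recurrence: (k+1) P_{k+1}(x) = (2k+1) x P_k(x) - k P_{k-1}(x).
    X: (num_nodes, f_in); weights: list of (f_in, f_out) arrays (hypothetical).
    """
    K = len(weights)
    P_prev = X                                 # P_0(L_hat) X = X
    out = P_prev @ weights[0]
    if K > 1:
        P_curr = L_hat @ X                     # P_1(L_hat) X = L_hat X
        out = out + P_curr @ weights[1]
        for k in range(1, K - 1):
            P_next = ((2 * k + 1) * (L_hat @ P_curr) - k * P_prev) / (k + 1)
            out = out + P_next @ weights[k + 1]
            P_prev, P_curr = P_curr, P_next
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    h, w, f_in, f_out, order = 4, 4, 8, 16, 3
    X = rng.normal(size=(h * w, f_in))         # one feature vector per grid cell
    Ws = [rng.normal(size=(f_in, f_out)) * 0.1 for _ in range(order + 1)]
    Y = legendre_graph_conv(X, scaled_laplacian(grid_adjacency(h, w)), Ws)
    print(Y.shape)
```

Rescaling the Laplacian so that its spectrum falls in [-1, 1] keeps every Legendre term bounded, which is one way a polynomial basis can help numerical stability. Whether this matches the stability argument made in the paper is not stated in the abstract.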

Taxonomy

Topics: Advanced Vision and Imaging · Hand Gesture Recognition Systems · Image and Object Detection Techniques

Methods: Softmax · Attention Is All You Need · Convolution · Feature Selection