# Pose-Perceptive Convolution: Learning Geometry-Adaptive Receptive Fields for Robust 6D Pose Estimation

**Authors:** Yi Lai, Yaqing Song, Qixian Zhang, Yue Wang, Kang An, Hui Zhang

PMC · DOI: 10.3390/s26020453 · Sensors (Basel, Switzerland) · 2026-01-09

## TL;DR

This paper introduces a new convolution operator that adapts its receptive field to each object's projected shape, improving 6D pose estimation for robotics and augmented reality (AR).

## Contribution

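Pose-Perceptive Convolution (PPC) dynamically adapts the shape and sampling density of its receptive field. This resolves the geometric mismatch between the fixed, square receptive fields of standard convolutions and the varied projected shapes of objects. PPC is the core component of the proposed Pose-Perceptive Fusion Network (PPF-Net).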
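The summary does not give PPC's exact formulation. As a rough illustration of what adapting the shape and sampling density of a receptive field can mean, the sketch below uses the well-known deformable-convolution idea instead. Each tap of a k × k kernel is shifted by a fractional offset and read with bilinear interpolation. In a real network the offsets would be predicted by a learned branch. Here they are set by hand, and the function names (`bilinear_sample`, `adaptive_conv_at`) are hypothetical. This is not the authors' implementation.

```python
import numpy as np


def bilinear_sample(feat: np.ndarray, y: float, x: float) -> np.ndarray:
    """Bilinearly sample a (C, H, W) feature map at fractional (y, x).

    Neighbours that fall outside the map contribute zero (zero padding).
    """
    C, H, W = feat.shape
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    out = np.zeros(C, dtype=np.float64)
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < H and 0 <= xx < W:
                # Per-axis weight falls linearly with distance to the neighbour.
                w = (1.0 - abs(y - yy)) * (1.0 - abs(x - xx))
                out += w * feat[:, yy, xx]
    return out


def adaptive_conv_at(feat, weight, center, offsets):
    """Output of a k x k convolution at one location, with per-tap sampling offsets.

    feat:    (C_in, H, W) input features
    weight:  (C_out, C_in, k, k) kernel
    center:  (y, x) output location
    offsets: (k, k, 2) per-tap (dy, dx) shifts; all zeros gives a standard conv.
    """
    c_out, _, k, _ = weight.shape
    r = k // 2
    cy, cx = center
    out = np.zeros(c_out, dtype=np.float64)
    for i in range(k):
        for j in range(k):
            # Regular grid position plus a hand-set (normally predicted) offset.
            py = cy + (i - r) + offsets[i, j, 0]
            px = cx + (j - r) + offsets[i, j, 1]
            out += weight[:, :, i, j] @ bilinear_sample(feat, py, px)
    return out


# Demo: stretch a 3 x 3 kernel horizontally by a factor s, so its taps land
# on columns cx-2, cx, cx+2 instead of cx-1, cx, cx+1.
rng = np.random.default_rng(0)
feat = rng.standard_normal((2, 7, 7))
weight = rng.standard_normal((3, 2, 3, 3))
s = 2.0
stretch = np.zeros((3, 3, 2))
stretch[:, :, 1] = (np.arange(3) - 1) * (s - 1)
y_stretched = adaptive_conv_at(feat, weight, (3, 3), stretch)
```

With all offsets zero the function reduces to an ordinary convolution. Scaling the horizontal offsets widens the kernel's footprint and spaces its samples more sparsely along that axis, which is the kind of shape and density change the contribution refers to.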

## Key findings

- PPF-Net improves the VSD score by 19.4% over FFB6D on the MP6D benchmark.
- PPF-Net achieves 96.7% ADD-S on YCB-Video, approaching state-of-the-art accuracy.
- These gains come with minimal computational overhead, avoiding the heavy latency of backend-intensive methods.
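For readers unfamiliar with the ADD-S metric cited above, here is a minimal NumPy sketch of the standard ADD and ADD-S pose-error distances as commonly defined in the 6D pose literature; it is not code from the paper. ADD averages distances between corresponding model points under the ground-truth and predicted poses, while ADD-S matches each point to its closest counterpart, so symmetric objects are not penalized for indistinguishable rotations. The summary does not state how per-frame distances are aggregated into the 96.7% figure (on YCB-Video this is commonly an area under an accuracy-versus-threshold curve), so no aggregation is shown. The toy object and helper names are illustrative.

```python
import numpy as np


def transform(points: np.ndarray, R: np.ndarray, t: np.ndarray) -> np.ndarray:
    """Apply a rigid pose to (N, 3) model points: x -> R x + t."""
    return points @ R.T + t


def add(model_points, R_gt, t_gt, R_pred, t_pred) -> float:
    """ADD: mean distance between corresponding points under the two poses."""
    gt = transform(model_points, R_gt, t_gt)
    pred = transform(model_points, R_pred, t_pred)
    return float(np.linalg.norm(gt - pred, axis=1).mean())


def add_s(model_points, R_gt, t_gt, R_pred, t_pred) -> float:
    """ADD-S: mean distance from each ground-truth point to the closest predicted point.

    Using the closest point rather than the corresponding one makes the metric
    tolerant of poses that are indistinguishable for symmetric objects.
    """
    gt = transform(model_points, R_gt, t_gt)
    pred = transform(model_points, R_pred, t_pred)
    # (N, N) pairwise distances; O(N^2) memory, fine for a few thousand points.
    dists = np.linalg.norm(gt[:, None, :] - pred[None, :, :], axis=-1)
    return float(dists.min(axis=1).mean())


# A 4-fold symmetric toy object and a prediction rotated 90 degrees about z.
pts = np.array([[1.0, 0, 0], [0, 1.0, 0], [-1.0, 0, 0], [0, -1.0, 0]])
Rz90 = np.array([[0.0, -1, 0], [1, 0, 0], [0, 0, 1]])
eye, t0 = np.eye(3), np.zeros(3)
print(add(pts, eye, t0, Rz90, t0))    # ~1.414 (sqrt(2)): every point moved
print(add_s(pts, eye, t0, Rz90, t0))  # 0.0: the point set is unchanged
```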

## Abstract

6D object pose estimation is crucial for applications such as robotic manipulation and augmented reality, yet it remains highly challenging when dealing with objects of significantly different aspect ratios or the drastic appearance variations of a single object caused by pose changes. Most existing methods focus on designing more complex backend fusion modules, while largely overlooking a fundamental problem at the feature extraction frontend: the geometric mismatch between the fixed, square receptive fields of standard convolutions and the varied projected morphologies of objects. This mismatch, along with noise in fused features and ambiguity in regression, limits the performance ceiling of current methods. To this end, this paper proposes a novel Pose-Perceptive Convolution (PPC) and constructs a new Pose-Perceptive Fusion Network (PPF-Net). Its core component, the Pose-Perceptive Convolution, fundamentally resolves the aforementioned geometric mismatch by dynamically adapting the shape and sampling density of its receptive field. Experiments on four benchmarks show that PPF-Net improves the VSD score by 19.4% over FFB6D on MP6D, and achieves 96.7% ADD-S on YCB-Video, approaching state-of-the-art accuracy. Crucially, these gains are realized with minimal computational overhead, avoiding the heavy latency of backend-intensive approaches. This validates that frontend feature extraction is an efficient strategy for robust 6D pose estimation.
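The geometric mismatch described in the abstract can be made concrete with a toy example that is not taken from the paper. Below, a thin horizontal bar stands in for an elongated object's projection. A fixed square 3 × 3 sampling grid is compared with a grid of the same nine samples whose extent matches the bar's aspect ratio. The hand-chosen extents stand in for the adaptation that PPC learns, and the helper names (`sampling_grid`, `on_object_fraction`) are hypothetical.

```python
import numpy as np


def sampling_grid(center, half_h, half_w, k=3):
    """k x k sampling points spread evenly over a window of half-extent (half_h, half_w)."""
    ys = center[0] + np.linspace(-half_h, half_h, k)
    xs = center[1] + np.linspace(-half_w, half_w, k)
    return np.stack(np.meshgrid(ys, xs, indexing="ij"), axis=-1).reshape(-1, 2)


def on_object_fraction(points, mask):
    """Fraction of sampling points (rounded to pixels) that land on the object mask."""
    idx = np.rint(points).astype(int)
    H, W = mask.shape
    inside = (idx[:, 0] >= 0) & (idx[:, 0] < H) & (idx[:, 1] >= 0) & (idx[:, 1] < W)
    hits = np.zeros(len(idx), dtype=bool)
    hits[inside] = mask[idx[inside, 0], idx[inside, 1]]
    return float(hits.mean())


# Toy projected object: a thin horizontal bar, 3 px tall and 31 px wide.
mask = np.zeros((41, 41), dtype=bool)
mask[19:22, 5:36] = True
center = (20, 20)

square = sampling_grid(center, half_h=6, half_w=6)    # fixed square 13 x 13 window
matched = sampling_grid(center, half_h=1, half_w=15)  # window matched to the bar
print(on_object_fraction(square, mask))   # ~0.333: two of three rows miss the bar
print(on_object_fraction(matched, mask))  # 1.0
```

With the same sampling budget, the square grid spends two thirds of its samples on background, while the aspect-matched grid samples only the object.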

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12845661/full.md

## Figures

2 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12845661/full.md

## References

66 references — full list in the complete paper: https://tomesphere.com/paper/PMC12845661/full.md

---
Source: https://tomesphere.com/paper/PMC12845661