Cov2Pose: Leveraging Spatial Covariance for Direct Manifold-aware 6-DoF Object Pose Estimation

Nassim Ali Ousalah; Peyman Rostami; Vincent Gaudillière; Emmanuel Koumandakis; Anis Kacem; Enjie Ghorbel; Djamila Aouada

arXiv:2603.19961 · cs.CV · March 27, 2026

Open Access

TL;DR

This paper introduces Cov2Pose, a novel method for 6-DoF object pose estimation from a single RGB image that leverages spatial covariance and manifold-aware regression to improve accuracy and robustness over traditional direct methods.

Contribution

The paper proposes a covariance-pooled feature representation and a manifold-aware network head for direct pose regression, bringing second-order statistics and a continuous pose encoding into the regression pipeline.
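As a rough illustration of what covariance pooling computes, the sketch below treats a convolutional feature map as C channels observed at N spatial positions and takes the C x C sample covariance across positions. This is a minimal sketch under assumptions, not the authors' implementation: the function name `covariance_pool`, the `eps` diagonal regularizer, and the toy values are hypothetical, and the paper may normalize or regularize differently. It uses only the Python standard library.

```python
def covariance_pool(features, eps=1e-5):
    """Second-order pooling of a feature map.

    features: list of C channels, each a list of N activations
              (an H x W map flattened to N = H * W positions).
    Returns a C x C covariance matrix with eps added to the diagonal,
    an assumed regularizer that keeps the result positive definite.
    """
    c = len(features)
    n = len(features[0])
    # Center each channel over the spatial positions.
    means = [sum(ch) / n for ch in features]
    centered = [[v - m for v in ch] for ch, m in zip(features, means)]
    # Unbiased sample covariance between every pair of channels.
    cov = [
        [sum(a * b for a, b in zip(centered[i], centered[j])) / (n - 1)
         for j in range(c)]
        for i in range(c)
    ]
    for i in range(c):
        cov[i][i] += eps
    return cov


# Toy example: 2 channels, 4 spatial positions (hypothetical values).
features = [[1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.5]]
cov = covariance_pool(features)
for row in cov:
    print(row)
```

Global average pooling would keep only the per-channel means that this function subtracts out; the covariance instead records how channels vary together across the image, which is the second-order information the summary refers to.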

Findings

01. Improves pose estimation accuracy, especially under partial occlusion.

02. Demonstrates the effectiveness of second-order pooling and SPD matrix representations.

03. Outperforms existing direct regression methods in experiments.

Abstract

In this paper, we address the problem of 6-DoF object pose estimation from a single RGB image. Indirect methods that typically predict intermediate 2D keypoints, followed by a Perspective-n-Point solver, have shown great performance. Direct approaches, which regress the pose in an end-to-end manner, are usually computationally more efficient but less accurate. However, direct pose regression heads rely on globally pooled features, ignoring spatial second-order statistics despite their informativeness in pose prediction. They also predict, in most cases, discontinuous pose representations that lack robustness. Herein, we therefore propose a covariance-pooled representation that encodes convolutional feature distributions as a symmetric positive definite (SPD) matrix. Moreover, we propose a novel pose encoding in the form of an SPD matrix via its Cholesky decomposition. Pose is then…
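The truncated abstract does not say how a pose is mapped to an SPD matrix, so the following sketch illustrates only the general property that makes a Cholesky-based encoding attractive: any lower-triangular factor with a strictly positive diagonal yields an SPD matrix, and the Cholesky decomposition recovers that factor uniquely. The helper names (`spd_from_raw`, `cholesky`), the softplus on the diagonal, and the raw values are assumptions for illustration, not details taken from the paper.

```python
import math


def softplus(x):
    # Smooth map from any real number to a strictly positive one.
    return math.log1p(math.exp(x))


def spd_from_raw(raw, n):
    """Build an n x n SPD matrix from n*(n+1)/2 unconstrained numbers.

    raw fills a lower-triangular factor row by row; diagonal entries
    pass through softplus so they are positive. Returns (factor, spd).
    """
    factor = [[0.0] * n for _ in range(n)]
    k = 0
    for i in range(n):
        for j in range(i + 1):
            factor[i][j] = softplus(raw[k]) if i == j else raw[k]
            k += 1
    # spd = factor * factor^T is symmetric positive definite.
    spd = [
        [sum(factor[i][m] * factor[j][m] for m in range(n)) for j in range(n)]
        for i in range(n)
    ]
    return factor, spd


def cholesky(spd):
    """Textbook Cholesky decomposition: lower-triangular L with L L^T = spd."""
    n = len(spd)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = spd[i][j] - sum(low[i][m] * low[j][m] for m in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


# Hypothetical raw network outputs for a 3 x 3 matrix (6 free parameters).
raw = [0.3, -1.2, 0.1, 0.5, 0.7, -0.4]
factor, spd = spd_from_raw(raw, 3)
recovered = cholesky(spd)
```

Because the map between the positive-diagonal factor and the SPD matrix is one-to-one and smooth, a network can regress unconstrained numbers and still land on a valid SPD matrix, which is one plausible reading of the "continuous pose encoding" mentioned above.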

Taxonomy

Topics: Robot Manipulation and Learning · Human Pose and Action Recognition · Robotics and Sensor-Based Localization