Cross-Modal RGB-D Fusion Transformer for 6D Pose Estimation of Non-Cooperative Spacecraft with Stereo-Derived Depth

Yongliang Zhen, Bo LÜ, Hang Yang, and Xiaotian WU

arXiv:2605.08592 · cs.CV · May 12, 2026

TL;DR

This paper presents a passive stereo vision framework with a cross-modal fusion Transformer for 6D pose estimation of non-cooperative spacecraft under the challenging visual conditions of space.

Contribution

The paper introduces a stereo matching network (TSCA-Stereo) and a cross-modal RGB-D fusion Transformer, both tailored to space imagery, to improve the robustness of pose estimation.

Findings

01

TSCA-Stereo outperforms the baseline on all metrics on a space-specific dataset.

02

The framework achieves a mean translation error of 0.0419 m and an orientation error of 0.8632°.

03

The passive stereo approach proves effective under harsh space lighting conditions.
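
For readers unfamiliar with how translation and orientation errors like those above are usually computed, here is a minimal sketch. It assumes the common definitions: Euclidean distance between estimated and ground-truth positions, and the geodesic angle between estimated and ground-truth unit quaternions. The excerpt does not state the paper's exact metric definitions, so the function names and conventions below are illustrative assumptions rather than the authors' evaluation code.

```python
import math

def translation_error(t_est, t_gt):
    """Euclidean distance between two 3D positions, in the input unit (e.g. metres)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(t_est, t_gt)))

def _normalize(q):
    """Scale a quaternion (w, x, y, z) to unit length."""
    norm = math.sqrt(sum(c * c for c in q))
    return tuple(c / norm for c in q)

def orientation_error_deg(q_est, q_gt):
    """Geodesic angle in degrees between two rotations given as quaternions.

    The absolute value of the dot product is used because q and -q
    describe the same rotation.
    """
    q_est, q_gt = _normalize(q_est), _normalize(q_gt)
    dot = abs(sum(a * b for a, b in zip(q_est, q_gt)))
    dot = min(1.0, dot)  # guard against rounding slightly above 1
    return math.degrees(2.0 * math.acos(dot))

def mean(values):
    """Arithmetic mean over per-sample errors."""
    values = list(values)
    return sum(values) / len(values)
```

A mean error over a test set would then be `mean(translation_error(e, g) for e, g in pairs)`, where `pairs` is a hypothetical list of (estimate, ground truth) tuples.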

Abstract

On-orbit servicing and active debris removal involving non-cooperative spacecraft require reliable pose estimation to supply accurate position and orientation data for autonomous visual navigation. Learning-based monocular methods have seen widespread adoption in spacecraft pose estimation, yet they suffer from an intrinsic depth ambiguity problem and tend to fail under the harsh illumination conditions routinely encountered in orbit. Active depth sensors could in principle address the geometric ambiguity, but their power and mass requirements make them poorly suited to most spacecraft platforms. This work addresses these issues through a passive stereo vision framework for six-degree-of-freedom (6-DOF) pose estimation of non-cooperative spacecraft. A binocular stereo matching network called TSCA-Stereo is developed to cope with weak-texture surfaces, specular highlights, and severe…
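
As background on how a stereo matching result becomes the depth channel of an RGB-D input, the sketch below applies the standard rectified pinhole relation Z = f · B / d, where f is the focal length in pixels, B the stereo baseline in metres, and d the disparity in pixels. This is textbook stereo geometry, not the internals of TSCA-Stereo, which the excerpt does not describe. The function names and the camera parameters in the usage note are hypothetical.

```python
def disparity_to_depth(disparity_px, focal_px, baseline_m):
    """Depth in metres for one pixel of a rectified stereo pair.

    Returns None for non-positive disparity, which corresponds to an
    invalid match or a point at infinity.
    """
    if disparity_px <= 0:
        return None
    return focal_px * baseline_m / disparity_px

def disparity_map_to_depth(disparity_map, focal_px, baseline_m):
    """Apply disparity_to_depth to a 2D list of disparities (rows of pixels)."""
    return [
        [disparity_to_depth(d, focal_px, baseline_m) for d in row]
        for row in disparity_map
    ]
```

With an assumed focal length of 1000 px and baseline of 0.5 m, a disparity of 50 px maps to a depth of 10 m. Because depth varies inversely with disparity, a fixed disparity error produces a larger depth error at longer range.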
