Cross-Modal RGB-D Fusion Transformer for 6D Pose Estimation of Non-Cooperative Spacecraft with Stereo-Derived Depth
Yongliang Zhen, Bo Lü, Hang Yang, and Xiaotian Wu

TL;DR
This paper presents a passive stereo vision framework with a novel cross-modal fusion Transformer for accurate 6D pose estimation of non-cooperative spacecraft under the challenging visual conditions of space.
Contribution
The paper introduces a stereo matching network, TSCA-Stereo, and a cross-modal RGB-D fusion Transformer tailored to space imagery, improving the robustness of pose estimation.
Findings
TSCA-Stereo outperforms the baseline on all metrics on a space-specific dataset.
The framework achieves a mean translation error of 0.0419 m and an orientation error of 0.8632°.
The passive stereo approach proves effective under harsh space lighting conditions.
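This summary does not define the translation and orientation errors reported above. As a minimal sketch, assuming the common convention of Euclidean distance for position and geodesic angle between unit quaternions for attitude, the two quantities could be computed as follows. The function names and the (w, x, y, z) quaternion ordering are illustrative choices, not taken from the paper.

```python
import math

def translation_error(t_pred, t_true):
    """Euclidean distance between predicted and ground-truth positions (metres)."""
    return math.sqrt(sum((p - g) ** 2 for p, g in zip(t_pred, t_true)))

def orientation_error_deg(q_pred, q_true):
    """Geodesic angle in degrees between two unit quaternions (w, x, y, z).

    Assumed convention: q and -q describe the same rotation, so the
    absolute value of the dot product is used.
    """
    dot = abs(sum(p * g for p, g in zip(q_pred, q_true)))
    dot = min(1.0, dot)  # guard against rounding slightly above 1
    return math.degrees(2.0 * math.acos(dot))

# Example: a 90-degree rotation about z compared with the identity attitude.
q_identity = (1.0, 0.0, 0.0, 0.0)
q_z90 = (math.cos(math.pi / 4), 0.0, 0.0, math.sin(math.pi / 4))
err_t = translation_error((0.0, 0.0, 0.0), (3.0, 4.0, 0.0))
err_r = orientation_error_deg(q_z90, q_identity)
```

Averaging these per-sample values over a test set would give mean errors of the kind quoted above, if the paper follows this convention.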
Abstract
On-orbit servicing and active debris removal involving non-cooperative spacecraft require reliable pose estimation to supply accurate position and orientation data for autonomous visual navigation. Learning-based monocular methods have seen widespread adoption in spacecraft pose estimation, yet they suffer from an intrinsic depth ambiguity problem and tend to fail under the harsh illumination conditions routinely encountered in orbit. Active depth sensors could in principle address the geometric ambiguity, but their power and mass requirements make them poorly suited to most spacecraft platforms. This work addresses these issues through a passive stereo vision framework for six-degree-of-freedom (6-DOF) pose estimation of non-cooperative spacecraft. A binocular stereo matching network called TSCA-Stereo is developed to cope with weak-texture surfaces, specular highlights, and severe…
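The passive stereo approach rests on the standard relation between disparity and depth. The sketch below assumes a rectified stereo pair with focal length f (in pixels) and baseline B (in metres), so that Z = f B / d for a disparity d. The parameter values are hypothetical and are not taken from the paper.

```python
import math

def disparity_to_depth(disparity_px, focal_px, baseline_m):
    """Depth in metres from disparity in pixels, assuming a rectified stereo pair.

    Non-positive disparities carry no depth information and map to infinity.
    """
    if disparity_px <= 0:
        return math.inf
    return focal_px * baseline_m / disparity_px

# Hypothetical camera: 1000 px focal length, 0.2 m baseline.
depth_m = disparity_to_depth(20.0, focal_px=1000.0, baseline_m=0.2)
```

Because depth is inversely proportional to disparity, a fixed disparity error translates into a depth error that grows roughly with the square of the range, which is one reason matching accuracy matters on weak-texture and specular surfaces.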
