VIRD: View-Invariant Representation through Dual-Axis Transformation for Cross-View Pose Estimation

Juhye Park; Wooju Lee; Dasol Hong; Changki Sung; Youngwoo Seo; Dongwan Kang; Hyun Myung

arXiv:2603.12918 · cs.CV · March 24, 2026

TL;DR

This paper introduces VIRD, a cross-view pose estimation method that builds view-invariant representations through dual-axis transformation and improves localization accuracy over prior methods on the KITTI and VIGOR datasets.

Contribution

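VIRD combines a dual-axis transformation with a view-reconstruction loss. A polar transformation of the satellite view supports horizontal correspondence, context-enhanced positional attention mitigates vertical misalignment between the ground and satellite features, and the view-reconstruction loss further strengthens view invariance.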

Findings

01 Outperforms state-of-the-art methods on KITTI and VIGOR datasets.

02 Reduces median position errors by over 50%.

03 Reduces median orientation errors by over 75%.

Abstract

Accurate global localization is critical for autonomous driving and robotics, but GNSS-based approaches often degrade due to occlusion and multipath effects. As an emerging alternative, cross-view pose estimation predicts the 3-DoF camera pose corresponding to a ground-view image with respect to a geo-referenced satellite image. However, existing methods struggle to bridge the significant viewpoint gap between the ground and satellite views mainly due to limited spatial correspondences. We propose a novel cross-view pose estimation method that constructs view-invariant representations through dual-axis transformation (VIRD). VIRD first applies a polar transformation to the satellite view to facilitate horizontal correspondence, then uses context-enhanced positional attention on the ground and polar-transformed satellite features to mitigate vertical misalignment, explicitly bridging the…
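
The polar transformation mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `polar_transform`, the choice of the image center as the pole, the north-up azimuth convention, and nearest-neighbor sampling are all assumptions made for illustration. The idea is that each output column corresponds to an azimuth angle and each output row to a distance from the pole, so content that circles the camera in the satellite view is laid out along a horizontal axis, closer to how a ground-level panorama arranges it.

```python
import numpy as np

def polar_transform(sat, out_h, out_w):
    """Resample a square satellite image into polar coordinates.

    Hypothetical helper for illustration only. Assumes the pole is the
    image center and that north is "up" in the satellite image.
    Output rows index radius (top row = farthest from the center),
    output columns index azimuth (0 = north, increasing clockwise).
    """
    size = sat.shape[0]
    center = (size - 1) / 2.0

    rows = np.arange(out_h).reshape(-1, 1)   # radius index, shape (out_h, 1)
    cols = np.arange(out_w).reshape(1, -1)   # azimuth index, shape (1, out_w)

    radius = center * (out_h - rows) / out_h  # top row reaches the image edge
    theta = 2.0 * np.pi * cols / out_w        # azimuth in radians

    # Source pixel coordinates in the satellite image (broadcast to out_h x out_w).
    src_r = center - radius * np.cos(theta)
    src_c = center + radius * np.sin(theta)

    # Nearest-neighbor sampling, clipped to stay inside the image.
    src_r = np.clip(np.rint(src_r).astype(int), 0, size - 1)
    src_c = np.clip(np.rint(src_c).astype(int), 0, size - 1)
    return sat[src_r, src_c]

# Tiny example: a 5x5 "image" whose pixel values encode their own position.
sat = np.arange(25).reshape(5, 5)
polar = polar_transform(sat, out_h=2, out_w=4)
print(polar)
```

In this toy example the four output columns sample the pixels north, east, south, and west of the center. A practical version would likely use bilinear interpolation and a pole placed at a candidate camera location rather than the image center.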

Taxonomy

Topics: Robotics and Sensor-Based Localization · Advanced Vision and Imaging · Advanced Image and Video Retrieval Techniques