Learning Dense Flow Field for Highly-accurate Cross-view Camera Localization
Zhenbo Song, Xianghui Ze, Jianfeng Lu, Yujiao Shi

TL;DR
This paper introduces an end-to-end learning approach to cross-view camera localization that estimates dense pixel-wise flow fields between ground and satellite images and computes the camera pose from them, significantly improving localization accuracy.
Contribution
The method constructs the feature metric at the pixel level, which enables full-image supervision, and uses a bird's eye view (BEV) projection followed by residual refinement to improve geometric alignment between the ground and satellite views.
Findings
Reduces median localization error by 89% on the KITTI dataset
Reduces error by 19% on the Ford multi-AV dataset
Outperforms state-of-the-art methods across multiple datasets
Abstract
This paper addresses the problem of estimating the 3-DoF camera pose for a ground-level image with respect to a satellite image that encompasses the local surroundings. We propose a novel end-to-end approach that leverages the learning of dense pixel-wise flow fields in pairs of ground and satellite images to calculate the camera pose. Our approach differs from existing methods by constructing the feature metric at the pixel level, enabling full-image supervision for learning distinctive geometric configurations and visual appearances across views. Specifically, our method employs two distinct convolution networks for ground and satellite feature extraction. Then, we project the ground feature map to the bird's eye view (BEV) using a fixed camera height assumption to achieve preliminary geometric alignment. To further establish content association between the BEV and satellite features,…
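The two geometric steps named in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation. The first function projects a ground-view feature map onto a BEV grid under the fixed camera height assumption: every BEV cell is treated as a point on a flat ground plane and looked up in the image through a pinhole model. The second function recovers a 3-DoF pose (yaw plus 2D translation) from a dense flow field by a weighted least-squares rigid fit. The visible part of the abstract does not say how the pose is computed from the flow, so that choice is an assumption made here for illustration only. The function names, the camera convention (x right, y down, z forward), the BEV layout (camera at the grid centre, forward pointing up), and nearest-neighbour sampling are likewise assumptions.

```python
import numpy as np


def ground_to_bev(feat, K, cam_height, bev_size, meters_per_pixel):
    """Project a ground-view feature map (H, W, C) onto a BEV grid.

    Assumes a flat ground plane at distance `cam_height` below a pinhole
    camera with intrinsics K (hypothetical convention: x right, y down,
    z forward). Returns the BEV features and a validity mask.
    """
    H, W, C = feat.shape
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    half = bev_size / 2.0
    cols, rows = np.meshgrid(np.arange(bev_size), np.arange(bev_size))
    # Metric ground-plane coordinates of each BEV cell centre.
    x = (cols + 0.5 - half) * meters_per_pixel   # lateral offset (right)
    z = (half - (rows + 0.5)) * meters_per_pixel  # forward distance
    valid = z > 0                                 # cells behind the camera are unseen
    z_safe = np.where(valid, z, 1.0)              # avoid division by zero
    # Pinhole projection of the ground point (x, cam_height, z).
    u = fx * x / z_safe + cx
    v = fy * cam_height / z_safe + cy
    ui = np.rint(u).astype(int)
    vi = np.rint(v).astype(int)
    valid &= (ui >= 0) & (ui < W) & (vi >= 0) & (vi < H)
    bev = np.zeros((bev_size, bev_size, C), dtype=feat.dtype)
    bev[valid] = feat[vi[valid], ui[valid]]       # nearest-neighbour lookup
    return bev, valid


def pose_from_flow(points, flow, weights=None):
    """Fit yaw and translation so that R @ p + t ~= p + flow (least squares).

    points, flow: (N, 2) arrays; weights: optional (N,) confidences.
    This closed-form 2D rigid fit is an illustrative assumption.
    """
    targets = points + flow
    w = np.ones(len(points)) if weights is None else np.asarray(weights, float)
    w = w / w.sum()
    p_mean = (w[:, None] * points).sum(axis=0)
    q_mean = (w[:, None] * targets).sum(axis=0)
    P = points - p_mean
    Q = targets - q_mean
    # Weighted dot and cross terms give cos and sin of the optimal rotation.
    dot = (w * (P[:, 0] * Q[:, 0] + P[:, 1] * Q[:, 1])).sum()
    cross = (w * (P[:, 0] * Q[:, 1] - P[:, 1] * Q[:, 0])).sum()
    yaw = np.arctan2(cross, dot)
    R = np.array([[np.cos(yaw), -np.sin(yaw)],
                  [np.sin(yaw), np.cos(yaw)]])
    t = q_mean - R @ p_mean
    return yaw, t
```

In this sketch the validity mask marks which BEV cells actually received a ground-view feature; cells behind the camera or outside the image stay zero. A learned pipeline would typically use differentiable bilinear sampling instead of nearest-neighbour lookup, and the per-pixel weights passed to `pose_from_flow` could down-weight unreliable matches.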
Taxonomy
Topics: Robotics and Sensor-Based Localization · Advanced Image and Video Retrieval Techniques · Advanced Vision and Imaging
Methods: Convolution
