C-BEV: Contrastive Bird's Eye View Training for Cross-View Image Retrieval and 3-DoF Pose Estimation
Florian Fervers, Sebastian Bullinger, Christoph Bodensteiner, Michael Arens, Rainer Stiefelhagen

TL;DR
This paper introduces C-BEV, a contrastive learning approach using bird's eye view maps for cross-view image retrieval and 3-DoF pose estimation, significantly improving accuracy in challenging real-world scenarios.
Contribution
It proposes a novel BEV-based retrieval architecture that addresses many-to-one ambiguities and learns 3-DoF pose estimation without explicit metric supervision.
Findings
Surpasses state-of-the-art in cross-view retrieval accuracy.
Doubles top-1 recall in challenging scenarios.
Learns accurate 3-DoF pose estimation without direct ground-truth supervision.
Abstract
To find the geolocation of a street-view image, cross-view geolocalization (CVGL) methods typically perform image retrieval on a database of georeferenced aerial images and determine the location from the visually most similar match. Recent approaches focus mainly on settings where street-view and aerial images are preselected to align w.r.t. translation or orientation, but struggle in challenging real-world scenarios where varying camera poses have to be matched to the same aerial image. We propose a novel trainable retrieval architecture that uses bird's eye view (BEV) maps rather than vectors as embedding representation, and explicitly addresses the many-to-one ambiguity that arises in real-world scenarios. The BEV-based retrieval is trained using the same contrastive setting and loss as classical retrieval. Our method C-BEV surpasses the state-of-the-art on the retrieval task on…
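The abstract does not say how two BEV maps are compared, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes the similarity between a small street-view BEV map and a larger aerial BEV map is the maximum correlation over all translations and over rotations in 90-degree steps, so that the best-scoring offset doubles as a coarse 3-DoF pose estimate. It also assumes an InfoNCE-style loss as the "classical" contrastive loss. The names `bev_similarity` and `info_nce`, the rotation step, and the temperature are hypothetical choices made for illustration.

```python
import numpy as np


def bev_similarity(street_bev, aerial_bev, num_rotations=4):
    """Best correlation of a street-view BEV map inside an aerial BEV map.

    street_bev: array of shape (C, h, w); aerial_bev: array of shape (C, H, W).
    Assumption: poses are searched exhaustively over integer translations and
    rotations in 90-degree steps. Returns (score, (rotation_index, dy, dx)).
    """
    best_score, best_pose = -np.inf, None
    for k in range(num_rotations):
        # Rotate the street-view map by k * 90 degrees in the spatial plane.
        rotated = np.rot90(street_bev, k=k, axes=(1, 2))
        rh, rw = rotated.shape[1:]
        # All (rh, rw) windows of the aerial map: (C, H-rh+1, W-rw+1, rh, rw).
        windows = np.lib.stride_tricks.sliding_window_view(
            aerial_bev, (rh, rw), axis=(1, 2)
        )
        # Correlation score for every translation, summed over channels.
        scores = np.einsum("cyxij,cij->yx", windows, rotated)
        dy, dx = np.unravel_index(np.argmax(scores), scores.shape)
        if scores[dy, dx] > best_score:
            best_score = float(scores[dy, dx])
            best_pose = (k, int(dy), int(dx))
    return best_score, best_pose


def info_nce(sim_matrix, temperature=0.1):
    """InfoNCE loss; sim_matrix[i, j] scores street image i against aerial j.

    Assumption: the matching aerial image of street image i sits at column i.
    """
    logits = sim_matrix / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))
```

In this sketch the pose is never supervised directly: only the maximum score enters the contrastive loss, and the pose falls out as the argmax. That is one way to read the claim that pose estimation is learned without explicit metric supervision. Because the score is maximized over poses, street-view images taken at different positions and headings can all match the same aerial map, which is the many-to-one setting the abstract describes.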
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Robotics and Sensor-Based Localization · Multimodal Machine Learning Applications
Methods: Focus · ALIGN
