C-BEV: Contrastive Bird's Eye View Training for Cross-View Image Retrieval and 3-DoF Pose Estimation

Florian Fervers; Sebastian Bullinger; Christoph Bodensteiner; Michael Arens; Rainer Stiefelhagen

arXiv:2312.08060 · cs.CV · December 14, 2023 · 2 cites

Open Access

TL;DR

This paper introduces C-BEV, a contrastive learning approach using bird's eye view maps for cross-view image retrieval and 3-DoF pose estimation, significantly improving accuracy in challenging real-world scenarios.

Contribution

It proposes a novel BEV-based retrieval architecture that addresses many-to-one ambiguities and learns 3-DoF pose estimation without explicit metric supervision.

Findings

1. Surpasses state-of-the-art in cross-view retrieval accuracy.
2. Doubles top-1 recall in challenging scenarios.
3. Learns accurate 3-DoF pose estimation without direct groundtruth.

Abstract

To find the geolocation of a street-view image, cross-view geolocalization (CVGL) methods typically perform image retrieval on a database of georeferenced aerial images and determine the location from the visually most similar match. Recent approaches focus mainly on settings where street-view and aerial images are preselected to align w.r.t. translation or orientation, but struggle in challenging real-world scenarios where varying camera poses have to be matched to the same aerial image. We propose a novel trainable retrieval architecture that uses bird's eye view (BEV) maps rather than vectors as embedding representation, and explicitly addresses the many-to-one ambiguity that arises in real-world scenarios. The BEV-based retrieval is trained using the same contrastive setting and loss as classical retrieval. Our method C-BEV surpasses the state-of-the-art on the retrieval task on…
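
The idea of comparing BEV maps instead of vectors, while keeping a standard contrastive objective, could be sketched roughly as below. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the function names `bev_similarity` and `info_nce`, the coarse pose grid (integer cell shifts and 90-degree rotations), the circular shifts, and the symmetric InfoNCE form of the loss are all hypothetical choices made for the example. The score of an image pair is taken as the best correlation over all pose hypotheses, so several camera poses can match the same aerial map, and the arg-max pose falls out as a by-product.

```python
import numpy as np

def normalize(bev):
    """Scale a (H, W, C) BEV map to unit Frobenius norm."""
    return bev / np.linalg.norm(bev)

def bev_similarity(street_bev, aerial_bev, max_shift=2):
    """Best correlation score and pose (dx, dy, k) of a street BEV on an aerial BEV.

    Assumes square (H, H, C) maps. The pose grid is a toy one: integer
    shifts in [-max_shift, max_shift] and rotations by k * 90 degrees.
    np.roll wraps around at the border, which is a simplification.
    """
    best_score, best_pose = -np.inf, None
    for k in range(4):
        rotated = np.rot90(street_bev, k, axes=(0, 1))
        for dy in range(-max_shift, max_shift + 1):
            for dx in range(-max_shift, max_shift + 1):
                shifted = np.roll(rotated, (dy, dx), axis=(0, 1))
                score = float(np.sum(shifted * aerial_bev))
                if score > best_score:
                    best_score, best_pose = score, (dx, dy, k)
    return best_score, best_pose

def info_nce(sim, temperature=0.1):
    """Symmetric InfoNCE on a (B, B) similarity matrix, positives on the diagonal."""
    def cross_entropy(logits):
        logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
        log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_prob))
    logits = sim / temperature
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))

def batch_loss(street_bevs, aerial_bevs, temperature=0.1):
    """Contrastive loss over a batch where pair i matches pair i."""
    sim = np.array([[bev_similarity(s, a)[0] for a in aerial_bevs]
                    for s in street_bevs])
    return info_nce(sim, temperature)
```

In this sketch the loss only ever sees the scalar pair scores, so no pose label enters training; the pose returned by `bev_similarity` is read off at inference time from the best-scoring hypothesis.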

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Robotics and Sensor-Based Localization · Multimodal Machine Learning Applications

Methods: Focus · ALIGN