CVD-SfM: A Cross-View Deep Front-end Structure-from-Motion System for Sparse Localization in Multi-Altitude Scenes

Yaxuan Li, Yewei Huang, Bijay Gaudel, Hamidreza Jafarnejadsani, and Brendan Englot

arXiv:2508.01936 · cs.CV · August 14, 2025

TL;DR

This paper introduces CVD-SfM, a cross-view deep structure-from-motion system for sparse camera localization across multi-altitude scenes. It is validated on two newly collected datasets, where the authors report higher accuracy and robustness than existing methods.

Contribution

The paper presents a multi-altitude camera pose estimation system that integrates a cross-view transformer, deep features, and structure-from-motion into a unified framework, along with two newly collected datasets for benchmarking.

Findings

1. Achieves higher accuracy and robustness than existing solutions in multi-altitude sparse pose estimation.

2. Handles diverse environmental conditions and viewpoint variations.

3. Introduces two newly collected datasets for benchmarking multi-altitude camera pose estimation.

Abstract

We present a novel multi-altitude camera pose estimation system, addressing the challenges of robust and accurate localization across varied altitudes when only considering sparse image input. The system effectively handles diverse environmental conditions and viewpoint variations by integrating the cross-view transformer, deep features, and structure-from-motion into a unified framework. To benchmark our method and foster further research, we introduce two newly collected datasets specifically tailored for multi-altitude camera pose estimation; datasets of this nature remain rare in the current literature. The proposed framework has been validated through extensive comparative analyses on these datasets, demonstrating that our system achieves superior performance in both accuracy and robustness for multi-altitude sparse pose estimation tasks compared to existing solutions, making it…

Taxonomy

Topics: Robotics and Sensor-Based Localization · Advanced Image and Video Retrieval Techniques · Advanced Vision and Imaging