Seeing the Unseen: Mask-Driven Positional Encoding and Strip-Convolution Context Modeling for Cross-View Object Geo-Localization

Shuhan Hu; Yiru Li; Yuanyuan Li; Yingying Zhu

arXiv:2510.20247 · cs.CV · October 24, 2025

TL;DR

This paper introduces EDGeo, a framework for cross-view object geo-localization that combines mask-based positional encoding with strip-convolution context modeling to improve accuracy and robustness, particularly under annotation shifts and for large-span objects.

Contribution

The paper proposes a mask-based positional encoding scheme and a strip-convolution context module, moving beyond keypoint-based methods to capture object shape and surrounding context.
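To make the keypoint-versus-mask distinction concrete, here is a toy sketch in plain Python. It is not the paper's implementation: the grid size, the `BUILDING` box, and the functions `keypoint_prompt`, `mask_prompt`, and `iou` are all hypothetical, and the filled rectangle merely stands in for whatever segmentation mask the real pipeline would produce. The only point is that two clicks on the same object give disjoint keypoint prompts but an identical silhouette prompt.

```python
H, W = 8, 8
# Hypothetical object footprint: (top, left, bottom, right), inclusive.
BUILDING = (2, 1, 4, 6)


def keypoint_prompt(click):
    """Location-only prompt: a single 1 at the annotated (row, col)."""
    r, c = click
    return [[1.0 if (i, j) == (r, c) else 0.0 for j in range(W)]
            for i in range(H)]


def mask_prompt(click, box=BUILDING):
    """Shape-aware prompt: the object's silhouette if the click hits it.

    A filled rectangle is an assumed stand-in for a segmentation mask.
    """
    top, left, bottom, right = box
    r, c = click
    if not (top <= r <= bottom and left <= c <= right):
        return [[0.0] * W for _ in range(H)]
    return [[1.0 if top <= i <= bottom and left <= j <= right else 0.0
             for j in range(W)] for i in range(H)]


def iou(a, b):
    """Intersection-over-union of two binary maps."""
    inter = sum(x * y for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    union = sum(max(x, y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    return inter / union if union else 1.0


# Two annotators click opposite corners of the same building.
click_a, click_b = (2, 1), (4, 6)
print(iou(keypoint_prompt(click_a), keypoint_prompt(click_b)))  # 0.0
print(iou(mask_prompt(click_a), mask_prompt(click_b)))          # 1.0
```

In this toy setting the keypoint prompt changes completely when the click moves, while the mask prompt does not, which is the kind of sensitivity to annotation shifts the paper attributes to keypoint-based encoding.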

Findings

1. Achieves state-of-the-art localization accuracy on public datasets.

2. Improves robustness to annotation shifts and large-span objects.

3. Enhances feature discrimination with strip convolutional kernels.

Abstract

Cross-view object geo-localization enables high-precision object localization through cross-view matching, with critical applications in autonomous driving, urban management, and disaster response. However, existing methods rely on keypoint-based positional encoding, which captures only 2D coordinates while neglecting object shape information, resulting in sensitivity to annotation shifts and limited cross-view matching capability. To address these limitations, we propose a mask-based positional encoding scheme that leverages segmentation masks to capture both spatial coordinates and object silhouettes, thereby upgrading the model from "location-aware" to "object-aware." Furthermore, to tackle the challenge of large-span objects (e.g., elongated buildings) in satellite imagery, we design a context enhancement module. This module employs horizontal and vertical strip convolutional…
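The abstract is cut off at this point, so how the module combines its horizontal and vertical strips is not stated here. As a rough illustration of why strip-shaped kernels suit elongated objects, the plain-Python sketch below applies a 1 x k and a k x 1 kernel to a toy image and sums the two responses. The fixed averaging weights, the summation, the choice k = 7, and the names `conv2d_same` and `strip_context` are all assumptions made for illustration; the paper's kernels would be learned and may be combined differently.

```python
def conv2d_same(img, kernel):
    """Zero-padded, same-size 2D cross-correlation on nested lists."""
    h, w = len(img), len(img[0])
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for a in range(kh):
                for b in range(kw):
                    y, x = i + a - ph, j + b - pw
                    if 0 <= y < h and 0 <= x < w:
                        acc += img[y][x] * kernel[a][b]
            out[i][j] = acc
    return out


def strip_context(img, k=7):
    """Sum of a horizontal (1 x k) and a vertical (k x 1) strip response.

    Averaging weights and summation are assumptions, not the paper's design.
    """
    horiz = [[1.0 / k] * k]               # one row, k columns
    vert = [[1.0 / k] for _ in range(k)]  # k rows, one column
    hmap = conv2d_same(img, horiz)
    vmap = conv2d_same(img, vert)
    return [[a + b for a, b in zip(hr, vr)] for hr, vr in zip(hmap, vmap)]


# Toy "elongated building": a single row of ones across a 9 x 9 image.
image = [[1.0 if i == 4 else 0.0 for _ in range(9)] for i in range(9)]

square = [[1.0 / 9] * 3 for _ in range(3)]  # 3 x 3 box kernel for comparison
strip_center = strip_context(image)[4][4]
square_center = conv2d_same(image, square)[4][4]
print(strip_center, square_center)
```

At the centre pixel the horizontal strip covers seven building pixels, whereas the 3 x 3 box covers only three, so the strip response is dominated by the long axis of the object. A pair of length-7 strips uses 14 weights, compared with 49 for a full 7 x 7 kernel with the same reach along each axis.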
