Spatially-Weighted CLIP for Street-View Geo-localization

Ting Han; Fengjiao Li; Chunsong Chen; Haoling Huang; Yiping Chen; Meiliu Wu

arXiv:2604.04357·cs.CV·April 7, 2026

TL;DR

SW-CLIP introduces a spatially-aware contrastive learning framework for street-view geo-localization, leveraging geographic relationships to improve accuracy and spatial coherence over traditional CLIP methods.

Contribution

The paper presents SW-CLIP, which incorporates spatial autocorrelation into vision-language contrastive learning using distance-aware supervision and neighborhood regularization.

Findings

01. SW-CLIP outperforms standard CLIP in geo-localization accuracy.

02. It reduces long-tail localization errors.

03. It enhances spatial coherence in the embedding space.

Abstract

This paper proposes Spatially-Weighted CLIP (SW-CLIP), a novel framework for street-view geo-localization that explicitly incorporates spatial autocorrelation into vision-language contrastive learning. Unlike conventional CLIP-based methods that treat all non-matching samples as equally negative, SW-CLIP leverages Tobler's First Law of Geography to model geographic relationships through distance-aware soft supervision. Specifically, we introduce a location-as-text representation to encode geographic positions and replace one-hot InfoNCE targets with spatially weighted soft labels derived from geodesic distance. Additionally, a neighborhood-consistency regularization is employed to preserve local spatial structure in the embedding space. Experiments on a multi-city dataset demonstrate that SW-CLIP significantly improves geo-localization accuracy, reduces long-tail errors, and enhances…
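
The abstract describes replacing one-hot InfoNCE targets with soft labels derived from geodesic distance. A minimal sketch of that idea follows, under stated assumptions: the Gaussian kernel, the `bandwidth_km` parameter, the temperature value, and all function names are illustrative choices rather than details taken from the paper, and the text embeddings are assumed to come from encoding some location-as-text string whose format is not specified here. The neighborhood-consistency regularizer is not included.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat, lon):
    """Pairwise great-circle distances (km) between N points given in degrees."""
    lat = np.radians(np.asarray(lat, dtype=float))
    lon = np.radians(np.asarray(lon, dtype=float))
    dlat = lat[:, None] - lat[None, :]
    dlon = lon[:, None] - lon[None, :]
    a = (np.sin(dlat / 2) ** 2
         + np.cos(lat)[:, None] * np.cos(lat)[None, :] * np.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))


def soft_targets(dist_km, bandwidth_km=1.0):
    """Row-normalized kernel weights: nearer locations receive more target mass.

    Assumption: a Gaussian kernel; the paper only states that the soft labels
    are derived from geodesic distance.
    """
    w = np.exp(-(dist_km / bandwidth_km) ** 2)
    return w / w.sum(axis=1, keepdims=True)


def log_softmax(x):
    """Numerically stable row-wise log-softmax."""
    x = x - x.max(axis=1, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=1, keepdims=True))


def soft_infonce(img_emb, txt_emb, targets, temperature=0.07):
    """Symmetric cross-entropy between similarity logits and soft targets.

    Row i of `targets` is the distribution over all locations in the batch
    for sample i. The distance matrix is symmetric, so the same rows serve
    as targets in both the image-to-text and text-to-image directions.
    """
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature
    loss_i2t = -(targets * log_softmax(logits)).sum(axis=1).mean()
    loss_t2i = -(targets * log_softmax(logits.T)).sum(axis=1).mean()
    return 0.5 * (loss_i2t + loss_t2i)
```

As the bandwidth shrinks toward zero, the targets approach the identity matrix and the loss reduces to standard one-hot InfoNCE, so in this sketch the bandwidth controls how strongly nearby samples are treated as partial positives.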
