AddressCLIP: Empowering Vision-Language Models for City-wide Image Address Localization

Shixiong Xu; Chenghao Zhang; Lubin Fan; Gaofeng Meng; Shiming Xiang; Jieping Ye

arXiv:2407.08156 · cs.CV · July 12, 2024 · 1 citation

Open Access · 1 repo

TL;DR

AddressCLIP introduces an end-to-end vision-language framework for city-wide image address localization, leveraging image-text alignment and spatial constraints to improve accuracy over traditional methods.

Contribution

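The paper presents AddressCLIP, an end-to-end approach to image address localization that combines contrastive image-text alignment with a spatial-distance constraint on image features based on manifold learning. It also introduces three datasets of different scales, built from Pittsburgh and San Francisco, for this task.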

Findings

01. Outperforms existing transfer learning methods on new datasets

02. Achieves high accuracy in city-wide address localization

03. Demonstrates effectiveness through extensive ablations and visualizations

Abstract

In this study, we introduce a new problem raised by social media and photojournalism, named Image Address Localization (IAL), which aims to predict the readable textual address where an image was taken. Existing two-stage approaches involve predicting geographical coordinates and converting them into human-readable addresses, which can lead to ambiguity and be resource-intensive. In contrast, we propose an end-to-end framework named AddressCLIP to solve the problem with more semantics, consisting of two key ingredients: i) image-text alignment to align images with addresses and scene captions by contrastive learning, and ii) image-geography matching to constrain image features with the spatial distance in terms of manifold learning. Additionally, we have built three datasets from Pittsburgh and San Francisco on different scales specifically for the IAL problem. Experiments demonstrate…
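
The two ingredients named in the abstract can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the temperature of 0.07, the squared-error form of the geography term, and the use of planar (already projected) coordinates are all hypothetical, since the abstract does not give the exact loss formulations.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    # Assumes v is not the zero vector.
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def cross_entropy_diagonal(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[i]
    return total / len(logits)


def image_text_alignment_loss(img, txt, temperature=0.07):
    """Symmetric contrastive (InfoNCE-style) loss over matched image/address pairs.

    img[i] and txt[i] are embeddings of the same location; every other
    pairing in the batch is treated as a negative.
    """
    img = [normalize(v) for v in img]
    txt = [normalize(v) for v in txt]
    logits = [[dot(i, t) / temperature for t in txt] for i in img]
    logits_t = [list(col) for col in zip(*logits)]
    return 0.5 * (cross_entropy_diagonal(logits) + cross_entropy_diagonal(logits_t))


def image_geography_loss(img, coords):
    """Assumed form of a spatial constraint: mean squared gap between
    pairwise feature distances and pairwise geographic distances,
    with both rescaled to [0, 1]."""
    img = [normalize(v) for v in img]
    n = len(img)
    # Cosine distance lies in [0, 2]; halve it to land in [0, 1].
    feat = [[(1.0 - dot(img[i], img[j])) / 2.0 for j in range(n)] for i in range(n)]
    geo = [[math.dist(coords[i], coords[j]) for j in range(n)] for i in range(n)]
    scale = max(max(row) for row in geo) or 1.0  # avoid dividing by zero
    total = sum((feat[i][j] - geo[i][j] / scale) ** 2
                for i in range(n) for j in range(n))
    return total / (n * n)
```

In a training loop the two terms would typically be summed with a weighting factor. The alignment term pulls each image toward its own address text, while the geography term encourages images taken close together to have similar features.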

Code & Models

Repositories

xsx1001/addressclip (PyTorch, official)

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques · Indoor and Outdoor Localization Technologies

Methods: ALIGN