HierLoc: Hyperbolic Entity Embeddings for Hierarchical Visual Geolocation

Hari Krishna Gadi; Daniel Matos; Hongyi Luo; Lu Liu; Yongliang Wang; Yanfeng Zhang; Liqiu Meng

arXiv:2601.23064·cs.CV·March 3, 2026


Open Access · 3 Reviews

TL;DR

HierLoc embeds geographic entities, rather than individual images, in a hierarchical hyperbolic space for visual geolocation. The approach yields scalable, interpretable, and accurate predictions and outperforms existing methods on the OSV5M benchmark.

Contribution

The paper presents a novel entity-centric hierarchical hyperbolic embedding model for geolocation, replacing image retrieval with geographic entity embeddings, achieving state-of-the-art accuracy and efficiency.

Findings

01 · Reduces mean geodesic error by 19.5%

02 · Improves subregion accuracy by 43%

03 · Uses 240k entity embeddings instead of 5 million image embeddings

Abstract

Visual geolocalization, the task of predicting where an image was taken, remains challenging due to global scale, visual ambiguity, and the inherently hierarchical structure of geography. Existing paradigms rely on large-scale retrieval, which requires storing a large number of image embeddings; grid-based classifiers, which ignore geographic continuity; or generative models, which diffuse over space but struggle with fine detail. We introduce an entity-centric formulation of geolocation that replaces image-to-image retrieval with a compact hierarchy of geographic entities embedded in hyperbolic space. Images are aligned directly to country, region, subregion, and city entities through Geo-Weighted Hyperbolic contrastive learning, which incorporates haversine distance directly into the contrastive objective. This hierarchical design enables interpretable predictions and efficient…
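The abstract names two distances: haversine distance on the globe and distance in hyperbolic space. The sketch below shows how each can be computed and how they might be combined in a geo-weighted contrastive loss. It is an illustration under stated assumptions, not the authors' implementation: the Poincaré ball model, the weighting function `1 - exp(-d / sigma_km)`, and the names `geo_weighted_info_nce`, `tau`, and `sigma_km` are all hypothetical choices made here, and the paper may use a different hyperbolic model and a different weighting.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def poincare_distance(u, v):
    """Geodesic distance in the Poincare ball of curvature -1 (assumed model)."""
    def sq_norm(x):
        return sum(c * c for c in x)

    diff = sq_norm([a - b for a, b in zip(u, v)])
    return math.acosh(1 + 2 * diff / ((1 - sq_norm(u)) * (1 - sq_norm(v))))


def geo_weighted_info_nce(image, image_latlon, entities, positive_idx,
                          tau=0.5, sigma_km=1000.0):
    """Hypothetical geo-weighted InfoNCE loss for a single image.

    entities: list of (embedding, (lat, lon)) pairs.
    Assumption: each negative is weighted by 1 - exp(-d / sigma_km), so
    geographically distant negatives are penalised more than nearby ones.
    """
    pos_emb, _ = entities[positive_idx]
    pos_term = math.exp(-poincare_distance(image, pos_emb) / tau)
    denom = pos_term
    for i, (emb, (lat, lon)) in enumerate(entities):
        if i == positive_idx:
            continue
        d_km = haversine_km(image_latlon[0], image_latlon[1], lat, lon)
        weight = 1.0 - math.exp(-d_km / sigma_km)
        denom += weight * math.exp(-poincare_distance(image, emb) / tau)
    return -math.log(pos_term / denom)
```

In this sketch a negative entity located at the same coordinates as the image receives weight zero and does not contribute to the loss, while a negative on another continent receives a weight close to one.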

Peer Reviews

Decision·ICLR 2026 Poster

Reviewer 01 · Rating 6 · Confidence 5

Strengths

1. The paper showcases **extensive experiments** to back up its claims.
2. The method's inference structure allows for **very efficient inference**.
3. The method is showcased on two datasets: **OSV-5M** (street view focused) and **MediaEval** (more generalist).
4. The authors achieve **SOTA performances on OSV-5M**.

Weaknesses

1. **Data Curation:** It is not clear, but **entities seem to be learned *across* datasets** (per Section 3.2). If true, this **makes the results not comparable**, as the model is trained on significantly more data. This also potentially **breaks the data decontamination** for OSV-5M (1km exclusion zone). I would like to see **results with entities computed separately for each dataset**.
2. **Mean Embeddings for Large Regions:** Taking the mean embedding may be suitable for fine-grained regions

Reviewer 02 · Rating 2 · Confidence 4

Strengths

1. A hierarchical representation of locations and a beam search through hierarchical entities is a very natural and geographically sound alternative to traditional single-location-based retrieval geolocation methods.
2. Using hyperbolic embedding to represent hierarchy is very efficient.
3. Geo-Weighted InfoNCE introduces geo-awareness into the contrastive learning loss.
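The beam search through hierarchical entities that this reviewer mentions can be sketched as follows. This is a toy illustration, not the paper's inference code: the `children` mapping, the `score` callback, the entity names, and the scores are all hypothetical, and cumulative similarity stands in for whatever scoring the authors actually use (for example, negative hyperbolic distance between the image and each entity).

```python
import heapq


def beam_search(children, score, roots, beam_width=2):
    """Descend a hierarchy level by level, keeping the best `beam_width` paths.

    children: dict mapping an entity to its list of child entities (leaves are absent).
    score:    function mapping an entity to its similarity with the query image.
    Returns a list of (cumulative_score, path) tuples, best first.
    """
    beam = heapq.nlargest(beam_width, [(score(r), [r]) for r in roots],
                          key=lambda item: item[0])
    # Keep expanding while at least one path in the beam ends in a non-leaf entity.
    while any(children.get(path[-1]) for _, path in beam):
        candidates = []
        for total, path in beam:
            kids = children.get(path[-1])
            if not kids:
                candidates.append((total, path))  # leaf: carry the path forward unchanged
                continue
            for kid in kids:
                candidates.append((total + score(kid), path + [kid]))
        beam = heapq.nlargest(beam_width, candidates, key=lambda item: item[0])
    return beam


# Toy country -> region -> city hierarchy with made-up similarity scores.
toy_children = {
    "FR": ["Ile-de-France", "Occitanie"],
    "DE": ["Bavaria"],
    "Ile-de-France": ["Paris"],
    "Occitanie": ["Toulouse"],
    "Bavaria": ["Munich"],
}
toy_scores = {"FR": 0.9, "DE": 0.6, "Ile-de-France": 0.8, "Occitanie": 0.3,
              "Bavaria": 0.7, "Paris": 0.9, "Toulouse": 0.2, "Munich": 0.5}

result = beam_search(toy_children, toy_scores.get, roots=["FR", "DE"], beam_width=2)
best_path = result[0][1]
```

Because only `beam_width` paths survive each level, the number of entities scored per query grows with the beam width and the branching factor rather than with the total number of entities.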

Weaknesses

1. The key weakness, which severely restricts the generalizability of the method, is that the proposed entity hierarchy relies heavily on the **coverage** of the dataset. That is, if a location in the test set never appears in any neighborhoods of an entity in the training dataset, the model will never be able to predict its location. This is no rare case -- both MP16 and OSV5M are highly spatially biased, i.e. most data concentrate in North America and Western Europe. This is already a known pr

Reviewer 03 · Rating 4 · Confidence 4

Strengths

1. The paper is clearly presented, and the reader can smoothly follow the authors’ logic.
2. The idea is relatively novel, and alignment itself is a challenging problem.

Weaknesses

1. The related work section misses several relevant studies (e.g., GeoReasoner [1], Img2Loc [2]). I recommend that the authors do a more thorough survey.
2. In Table 2, which backbone is used for HierLoc? Is the comparison with other methods fair?
3. The code and data will only be released after acceptance.
4. Similar to the first point, the experimental section also lacks comparisons with strong baselines. The current results still show a gap from the state-of-the-art on some metrics. Overall

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Advanced Graph Neural Networks