GT-Loc: Unifying When and Where in Images Through a Joint Embedding Space

David G. Shatwell; Ishan Rajendrakumar Dave; Sirnam Swetha; Mubarak Shah

arXiv:2507.10473 · cs.CV · July 29, 2025

Open Access

TL;DR

GT-Loc introduces a joint embedding space for images, time, and location, enabling improved timestamp prediction and geo-localization by modeling their interdependence with a novel temporal metric-learning approach.

Contribution

The paper presents a unified embedding framework that jointly predicts capture time and geo-location and, for the first time, applies a cyclical temporal metric-learning objective to this task.
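To make the idea of a cyclical temporal metric concrete, here is a minimal sketch in Python. This page does not spell out the paper's actual objective, so everything below is an assumption made for illustration: the function names (`cyclic_distance`, `time_distance`, `soft_targets`), the normalization, and the softmax-with-temperature weighting are hypothetical and may differ from what GT-Loc uses. The only point it illustrates is that 23:00 and 01:00, or December and January, should count as close.

```python
import math


def cyclic_distance(a, b, period):
    """Shortest distance between two points on a cycle of the given period."""
    d = abs(a - b) % period
    return min(d, period - d)


def time_distance(hour_a, month_a, hour_b, month_b):
    """Normalized distance between two (hour, month) pairs, in [0, 1].

    Hypothetical formulation for illustration; not taken from the paper.
    """
    dh = cyclic_distance(hour_a, hour_b, 24.0) / 12.0    # largest cyclic gap is 12 hours
    dm = cyclic_distance(month_a, month_b, 12.0) / 6.0   # largest cyclic gap is 6 months
    return math.sqrt((dh * dh + dm * dm) / 2.0)


def soft_targets(anchor, candidates, temperature=0.1):
    """Softmax over negative distances, so nearer times receive larger weights.

    The temperature value is an arbitrary assumption.
    """
    logits = [-time_distance(*anchor, *c) / temperature for c in candidates]
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    anchor = (23, 12)                  # 23:00 in December
    candidates = [(1, 1), (12, 6)]     # 01:00 in January, noon in June
    print(soft_targets(anchor, candidates))
```

Under this sketch, a sample taken at 01:00 in January receives a much larger weight than one taken at noon in June when the anchor is 23:00 in December, which a plain absolute difference on hours and months would get wrong.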

Findings

01. Outperforms previous timestamp prediction methods.

02. Achieves competitive geo-localization results.

03. Enables compositional and text-based image retrieval.

Abstract

Timestamp prediction aims to determine when an image was captured using only visual information, supporting applications such as metadata correction, retrieval, and digital forensics. In outdoor scenarios, hourly estimates rely on cues like brightness, hue, and shadow positioning, while seasonal changes and weather inform date estimation. However, these visual cues significantly depend on geographic context, closely linking timestamp prediction to geo-localization. To address this interdependence, we introduce GT-Loc, a novel retrieval-based method that jointly predicts the capture time (hour and month) and geo-location (GPS coordinates) of an image. Our approach employs separate encoders for images, time, and location, aligning their embeddings within a shared high-dimensional feature space. Recognizing the cyclical nature of time, instead of conventional contrastive learning with hard…
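The retrieval step described in the abstract can be sketched roughly as follows, assuming the three encoders have already mapped an image, a set of candidate times, and a set of candidate locations into the same space. The toy vectors, the gallery contents, and the helper names (`cosine`, `retrieve`) are hypothetical; the real model uses learned, high-dimensional embeddings, and its similarity measure is not stated in the text above.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def retrieve(query, gallery):
    """Return the label whose embedding is most similar to the query.

    gallery is a list of (label, embedding) pairs.
    """
    best_label, _ = max(gallery, key=lambda item: cosine(query, item[1]))
    return best_label


# Toy 3-dimensional embeddings, invented purely for illustration.
image_embedding = [0.9, 0.1, 0.2]

time_gallery = [
    ((9, 7), [1.0, 0.0, 0.1]),    # (hour, month) label: 09:00 in July
    ((21, 1), [0.0, 1.0, 0.0]),   # 21:00 in January
]

location_gallery = [
    ((28.60, -81.20), [0.8, 0.2, 0.3]),   # (latitude, longitude) label
    ((48.86, 2.35), [-0.5, 0.9, 0.1]),
]

predicted_time = retrieve(image_embedding, time_gallery)
predicted_location = retrieve(image_embedding, location_gallery)

if __name__ == "__main__":
    print(predicted_time, predicted_location)
```

Because time and location are both retrieved against the same image embedding, one query can be answered from either gallery, which is what makes joint prediction possible in a shared space.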

Taxonomy

Topics: Advanced Vision and Imaging