Eyes Will Shut: A Vision-Based Next GPS Location Prediction Model by Reinforcement Learning from Visual Map Feed Back

Ruixing Zhang; Yang Zhang; Tongyu Zhu; Leilei Sun; Weifeng Lv

arXiv:2507.18661 · cs.CV · August 5, 2025

TL;DR

This paper introduces VLMLocPredictor, a vision-based model that predicts next locations in human mobility trajectories by leveraging visual reasoning and reinforcement learning on map images, achieving state-of-the-art results.

Contribution

It proposes a novel vision-based approach using VLMs with reinforcement learning for next location prediction, mimicking human reasoning over maps.

Findings

01. Achieves state-of-the-art performance on multiple city datasets.

02. Demonstrates superior cross-city generalization.

03. Enables models to reason over maps similarly to humans.

Abstract

Next Location Prediction is a fundamental task in the study of human mobility, with wide-ranging applications in transportation planning, urban governance, and epidemic forecasting. In practice, when humans attempt to predict the next location in a trajectory, they often visualize the trajectory on a map and reason based on road connectivity and movement trends. However, the vast majority of existing next-location prediction models do not reason over maps in the way that humans do. Fortunately, the recent development of Vision-Language Models (VLMs) has demonstrated strong capabilities in visual perception and even visual reasoning. This opens up a new possibility: by rendering both the road network and trajectory onto an image and leveraging the reasoning abilities of VLMs, we can enable models to perform trajectory inference in a human-like manner. To explore this idea, we…
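The rendering step mentioned in the abstract (drawing the road network and the trajectory onto one image) can be illustrated with a small sketch. This is not the paper's pipeline: the grid size, the pixel labels `ROAD` and `TRAJ`, and the helper names `to_pixel`, `draw_line`, and `render_map` are all hypothetical, and the sketch assumes every point lies inside the bounding box. It rasterizes road segments and a GPS trajectory into a small label grid that could later be turned into an image for a VLM.

```python
from typing import List, Tuple

Point = Tuple[float, float]                 # (lon, lat)
BBox = Tuple[float, float, float, float]    # (min_lon, min_lat, max_lon, max_lat)

# Hypothetical pixel labels; 0 is background.
ROAD, TRAJ = 1, 2


def to_pixel(p: Point, bbox: BBox, size: int) -> Tuple[int, int]:
    """Map (lon, lat) to (col, row); row 0 is the northern edge."""
    min_lon, min_lat, max_lon, max_lat = bbox
    col = round((p[0] - min_lon) / (max_lon - min_lon) * (size - 1))
    row = round((max_lat - p[1]) / (max_lat - min_lat) * (size - 1))
    return col, row


def draw_line(grid: List[List[int]], a: Tuple[int, int],
              b: Tuple[int, int], value: int) -> None:
    """Bresenham line between pixel coordinates a and b (inclusive)."""
    (x0, y0), (x1, y1) = a, b
    dx, dy = abs(x1 - x0), -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy
    while True:
        grid[y0][x0] = value
        if (x0, y0) == (x1, y1):
            break
        e2 = 2 * err
        if e2 >= dy:
            err += dy
            x0 += sx
        if e2 <= dx:
            err += dx
            y0 += sy


def render_map(roads: List[Tuple[Point, Point]], trajectory: List[Point],
               bbox: BBox, size: int = 8) -> List[List[int]]:
    """Rasterize roads first, then the trajectory on top of them."""
    grid = [[0] * size for _ in range(size)]
    for start, end in roads:
        draw_line(grid, to_pixel(start, bbox, size),
                  to_pixel(end, bbox, size), ROAD)
    pixels = [to_pixel(p, bbox, size) for p in trajectory]
    for col, row in pixels:                 # also covers single-point input
        grid[row][col] = TRAJ
    for a, b in zip(pixels, pixels[1:]):
        draw_line(grid, a, b, TRAJ)
    return grid


if __name__ == "__main__":
    demo = render_map(
        roads=[((0.0, 0.0), (1.0, 0.0)), ((0.0, 0.0), (0.0, 1.0))],
        trajectory=[(0.0, 0.0), (0.5, 0.0)],
        bbox=(0.0, 0.0, 1.0, 1.0),
    )
    for row in demo:
        print("".join(".#*"[v] for v in row))
```

Drawing the trajectory after the roads keeps the observed path visible where it overlaps a road, which is one plausible design choice; a real renderer would more likely use map tiles and an imaging library.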

Taxonomy

Topics: Human Mobility and Location-Based Analysis · Automated Road and Building Extraction · Data Management and Algorithms