MoveGPT: Scaling Mobility Foundation Models with Spatially-Aware Mixture of Experts
Chonghua Han, Yuan Yuan, Jingtao Ding, Jie Feng, Fanjin Meng, Yong Li

TL;DR
MoveGPT is a large-scale, spatially-aware foundation model for human mobility that captures diverse movement patterns and generalizes to cities not seen during pre-training.
Contribution
The paper presents MoveGPT, a novel mobility foundation model with a unified location encoder and spatially-aware mixture-of-experts, enabling scalable pre-training and improved pattern recognition.
Findings
Achieves up to 35% performance improvement on downstream tasks.
Demonstrates strong generalization to unseen cities.
Validates the scalability of foundation models in human mobility.
Abstract
The success of foundation models in language has inspired a new wave of general-purpose models for human mobility. However, existing approaches struggle to scale effectively due to two fundamental limitations: a failure to use meaningful basic units to represent movement, and an inability to capture the vast diversity of patterns found in large-scale data. In this work, we develop MoveGPT, a large-scale foundation model specifically architected to overcome these barriers. MoveGPT is built upon two key innovations: (1) a unified location encoder that maps geographically disjoint locations into a shared semantic space, enabling pre-training on a global scale; and (2) a Spatially-Aware Mixture-of-Experts Transformer that develops specialized experts to efficiently capture diverse mobility patterns. Pre-trained on billion-scale datasets, MoveGPT establishes a new state-of-the-art across a…
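To make the mixture-of-experts idea in the abstract concrete, here is a minimal sketch of generic top-k expert routing in plain Python. It is not MoveGPT's implementation: the abstract does not specify how the spatially-aware routing works, so the gate logits, the toy experts, and the function names `top_k_route` and `moe_forward` are all assumptions made for illustration. In a spatially-aware variant, the gate logits would presumably depend on location features as well as on the token itself.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_route(gate_logits, k=2):
    """Return (expert_index, weight) pairs for the k highest-scoring experts.

    Weights are the gate probabilities renormalized over the selected experts.
    """
    probs = softmax(gate_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]

def moe_forward(x, gate_logits, experts, k=2):
    """Weighted sum of the outputs of the k selected experts."""
    out = [0.0] * len(x)
    for idx, weight in top_k_route(gate_logits, k):
        y = experts[idx](x)
        out = [o + weight * yi for o, yi in zip(out, y)]
    return out

# Hypothetical toy experts standing in for feed-forward sub-networks.
experts = [
    lambda v: [2.0 * a for a in v],
    lambda v: [a + 1.0 for a in v],
    lambda v: [-a for a in v],
]

# Hypothetical gate logits; in practice a learned gating network produces them.
gate_logits = [0.1, 2.0, 1.0]
routes = top_k_route(gate_logits, k=2)
output = moe_forward([1.0, 2.0], gate_logits, experts, k=2)
```

Only the selected experts are evaluated for each input, which is why this kind of layer can add capacity without a proportional increase in per-token compute.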
Peer Reviews
Decision: Submitted to ICLR 2026
1. This paper proposes a novel location encoder that maps geographic locations from different cities into a shared semantic space, overcoming the limitation of previous models that are city-specific and do not scale across cities. 2. The proposed model employs a spatially-aware mixture-of-experts Transformer architecture to efficiently capture diverse movement patterns driven by different intentions. 3. MoveGPT achieves new SOTA results on a wide range of downstream tasks, including prediction
1. The paper claims that its model has global scale and universal capabilities, but its pre-training dataset consists entirely of 16 U.S. cities. Movement patterns in U.S. cities differ significantly from those in Europe or elsewhere, so the result is likely a “US model” with no proven ability to generalize to other diverse urban environments around the world. 2. The proposed model maps the entire city to a 500 m × 500 m grid, which is a coarse resolution and will inevitably lose information. Because
The three main strengths are as follows. Strength 1: The paper tackles problems that are extremely relevant for society. For instance, human mobility is known to be linked to disease diffusion, pollution, economic growth, and many other factors. Strength 2: The model is carefully designed, and the idea of having a SAMoE handle different aspects of a city and of human behavior is appealing and well presented. Strength 3: The model is evaluated on a diverse range of several different
The main weaknesses of the paper are as follows. Weakness 1: Much relevant information is missing from the main text and appears only in the appendix, making the paper inconsistent and difficult to evaluate from the main content alone (see review policies). For instance, the datasets used for training and evaluation are never presented in the main text. Additionally, datasets bring with them self-selection biases that, in cases like mobility, should not be under
1. The paper introduces the idea of applying Mixture-of-Experts (MoE) to the domain of mobility foundation models, which is relatively novel. 2. The authors conduct extensive experiments, demonstrating SOTA performance on multiple benchmarks.
1. Overclaiming novelty. The authors repeatedly emphasize their unified location encoder, but similar ideas have already appeared in prior works such as GeoCLIP [1], which also trained a unified location encoder and further extended it to downstream models like UrbanVLP [2]. MoveGPT’s encoder essentially encodes GPS coordinates with auxiliary features such as POIs, which is not a fundamentally new innovation. 2. Manual expert definition in MoE. In large language models, MoE architectures typica
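One review above notes that the model discretizes each city into a 500 m × 500 m grid and that this loses information. The sketch below illustrates that point under stated assumptions: it uses a simple equirectangular approximation anchored at a hypothetical south-west origin, and the function name `latlon_to_cell` and the constants are illustrative, not taken from the paper.

```python
import math

CELL_SIZE_M = 500.0
METERS_PER_DEG_LAT = 111_320.0  # rough average; an approximation

def latlon_to_cell(lat, lon, origin_lat, origin_lon, cell_size_m=CELL_SIZE_M):
    """Map a coordinate to the (row, col) of a square grid.

    The grid is anchored at (origin_lat, origin_lon), assumed to be the
    south-west corner of the city's bounding box. Longitude degrees are
    scaled by cos(latitude), which is adequate at city scale.
    """
    meters_per_deg_lon = METERS_PER_DEG_LAT * math.cos(math.radians(origin_lat))
    dy = (lat - origin_lat) * METERS_PER_DEG_LAT
    dx = (lon - origin_lon) * meters_per_deg_lon
    return math.floor(dy / cell_size_m), math.floor(dx / cell_size_m)

# Hypothetical origin and points, chosen only for illustration.
origin = (40.0, -74.0)
cell_a = latlon_to_cell(40.0, -74.0, *origin)
cell_b = latlon_to_cell(40.001, -73.999, *origin)  # roughly 140 m away
cell_c = latlon_to_cell(40.01, -73.99, *origin)    # roughly 1.4 km away

# Points a and b fall in the same cell, so the grid cannot tell them apart.
same_cell = cell_a == cell_b
```

Any two locations inside one cell become indistinguishable after this step, which is the information loss the reviewer describes; a smaller `cell_size_m` reduces it at the cost of a larger location vocabulary.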
Taxonomy
Topics: Human Mobility and Location-Based Analysis · Geographic Information Systems Studies · Data-Driven Disease Surveillance
Methods: Attention Is All You Need · Linear Layer · Layer Normalization · Byte Pair Encoding · Residual Connection · Dense Connections · Softmax · Position-Wise Feed-Forward Layer · Absolute Position Encodings · Label Smoothing
