TiMo: Spatiotemporal Foundation Model for Satellite Image Time Series

Xiaolei Qin; Di Wang; Jing Zhang; Fengxiang Wang; Xin Su; Bo Du; Liangpei Zhang

arXiv:2505.08723 · cs.CV · May 14, 2025

TL;DR

TiMo is a hierarchical vision transformer model designed for satellite image time series, capturing multiscale spatiotemporal patterns to improve various Earth observation tasks.

Contribution

We introduce TiMo, a hierarchical vision transformer built around a spatiotemporal gyroscope attention mechanism, together with MillionST, a large-scale pre-training dataset for satellite image time series (SITS) analysis.

Findings

01. TiMo outperforms state-of-the-art methods in multiple tasks.

02. Pre-training on MillionST improves generalization.

03. TiMo effectively models multiscale spatiotemporal patterns.

Abstract

Satellite image time series (SITS) provide continuous observations of the Earth's surface, making them essential for applications such as environmental management and disaster assessment. However, existing spatiotemporal foundation models rely on plain vision transformers, which encode entire temporal sequences without explicitly capturing multiscale spatiotemporal relationships between land objects. This limitation hinders their effectiveness in downstream tasks. To overcome this challenge, we propose TiMo, a novel hierarchical vision transformer foundation model tailored for SITS analysis. At its core, we introduce a spatiotemporal gyroscope attention mechanism that dynamically captures evolving multiscale patterns across both time and space. For pre-training, we curate MillionST, a large-scale dataset of one million images from 100,000 geographic locations, each captured across 10…
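To make the phrase "across both time and space" concrete, the following is a minimal sketch of generic axis-wise self-attention on a SITS feature tensor. It is not TiMo's gyroscope attention, whose details are not given in this abstract. The tensor sizes (T, H, W, D), the identity query/key/value projections, and the helper names `softmax` and `self_attention` are all assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(tokens):
    """Single-head scaled dot-product self-attention.

    Hypothetical simplification: queries, keys and values are the tokens
    themselves (identity projections), so there are no learned weights.
    tokens has shape (batch, N, D); attention runs over the N axis.
    """
    d = tokens.shape[-1]
    scores = tokens @ np.swapaxes(tokens, -1, -2) / np.sqrt(d)  # (batch, N, N)
    weights = softmax(scores, axis=-1)
    return weights @ tokens, weights


# Illustrative sizes only: T time steps, an H x W grid of patches, D channels.
T, H, W, D = 10, 4, 4, 8
rng = np.random.default_rng(0)
sits = rng.standard_normal((T, H, W, D))

# Temporal attention: each spatial position attends across its T time steps.
temporal_in = sits.transpose(1, 2, 0, 3).reshape(H * W, T, D)
temporal_out, temporal_w = self_attention(temporal_in)

# Spatial attention: each time step attends across its H*W spatial positions.
spatial_in = sits.reshape(T, H * W, D)
spatial_out, spatial_w = self_attention(spatial_in)

print(temporal_out.shape, spatial_out.shape)
```

The two calls differ only in which axis is treated as the token sequence. A hierarchical model would additionally change the spatial resolution between stages, which this sketch does not attempt.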

Code & Models

Repositories

mililab/timo (Official)

Taxonomy

Methods: Attention Is All You Need · Layer Normalization · Softmax · Residual Connection · Linear Layer · Multi-Head Attention · Dense Connections · Vision Transformer