TL;DR
TiMo is a hierarchical vision transformer model designed for satellite image time series, capturing multiscale spatiotemporal patterns to improve various Earth observation tasks.
Contribution
We introduce TiMo, a novel hierarchical transformer built around a spatiotemporal gyroscope attention mechanism, together with MillionST, a large-scale pre-training dataset, to strengthen satellite image time series (SITS) analysis.
Findings
TiMo outperforms state-of-the-art methods in multiple tasks.
Pre-training on MillionST improves generalization.
TiMo effectively models multiscale spatiotemporal patterns.
Abstract
Satellite image time series (SITS) provide continuous observations of the Earth's surface, making them essential for applications such as environmental management and disaster assessment. However, existing spatiotemporal foundation models rely on plain vision transformers, which encode entire temporal sequences without explicitly capturing multiscale spatiotemporal relationships between land objects. This limitation hinders their effectiveness in downstream tasks. To overcome this challenge, we propose TiMo, a novel hierarchical vision transformer foundation model tailored for SITS analysis. At its core, we introduce a spatiotemporal gyroscope attention mechanism that dynamically captures evolving multiscale patterns across both time and space. For pre-training, we curate MillionST, a large-scale dataset of one million images from 100,000 geographic locations, each captured across 10…
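To make the contrast in the abstract concrete, the following is a minimal NumPy sketch of the baseline it describes: a plain transformer-style layer that flattens an entire time series into one token sequence and applies a single, single-scale self-attention over it. This is not TiMo's gyroscope attention, whose details are not given here. The tensor sizes, patch size, projection width, and the helper names `patchify` and `self_attention` are all illustrative assumptions.

```python
import numpy as np


def patchify(sits, patch):
    """Split a (T, H, W, C) image series into flat patch tokens.

    Returns an array of shape (T * H/patch * W/patch, patch * patch * C).
    """
    t, h, w, c = sits.shape
    assert h % patch == 0 and w % patch == 0, "H and W must be divisible by patch"
    gh, gw = h // patch, w // patch
    x = sits.reshape(t, gh, patch, gw, patch, c)
    x = x.transpose(0, 1, 3, 2, 4, 5)  # (T, gh, gw, patch, patch, C)
    return x.reshape(t * gh * gw, patch * patch * c)


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over all tokens."""
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ v, weights


rng = np.random.default_rng(0)

# Hypothetical series: 6 time steps, 32x32 pixels, 4 spectral bands.
sits = rng.normal(size=(6, 32, 32, 4))

# Every patch from every time step becomes one token in a single sequence,
# so attention mixes space and time at one fixed scale.
tokens = patchify(sits, patch=8)

d = 16  # assumed projection width
w_q, w_k, w_v = (rng.normal(size=(tokens.shape[1], d)) * 0.05 for _ in range(3))
out, weights = self_attention(tokens, w_q, w_k, w_v)

print(tokens.shape, out.shape, weights.shape)
```

Because every token attends to every other token at one fixed patch size, nothing in this layer distinguishes fine from coarse structure, which is the limitation the abstract attributes to plain vision transformers.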
Taxonomy
Methods: Attention Is All You Need · Layer Normalization · Softmax · Residual Connection · Linear Layer · Multi-Head Attention · Dense Connections · Vision Transformer
