TerraMind: Large-Scale Generative Multimodality for Earth Observation
Johannes Jakubik, Felix Yang, Benedikt Blumenstiel, Erik Scheurer, Rocco Sedona, Stefano Maurogiovanni, Jente Bosmans, Nikolaos Dionelis, Valerio Marsocci, Niklas Kopp, Rahul Ramachandran, Paolo Fraccaro, Thomas Brunschwiler, Gabriele Cavallaro, Juan Bernabe-Moreno

TL;DR
TerraMind is the first any-to-any generative, multimodal foundation model for Earth observation (EO). Pretrained on both token-level and pixel-level representations, it enables zero-shot, few-shot, and data augmentation applications and achieves state-of-the-art results.
Contribution
It introduces a dual-scale early fusion approach and the 'Thinking-in-Modalities' (TiM) capability, which generates additional artificial data during finetuning and inference.
Findings
Enables zero-shot and few-shot EO applications
Exceeds state-of-the-art performance on benchmarks
Introduces 'Thinking-in-Modalities' for data augmentation
Abstract
We present TerraMind, the first any-to-any generative, multimodal foundation model for Earth observation (EO). Unlike other multimodal models, TerraMind is pretrained on dual-scale representations combining both token-level and pixel-level data across modalities. On a token level, TerraMind encodes high-level contextual information to learn cross-modal relationships, while on a pixel level, TerraMind leverages fine-grained representations to capture critical spatial nuances. We pretrained TerraMind on nine geospatial modalities of a global, large-scale dataset. In this paper, we demonstrate that (i) TerraMind's dual-scale early fusion approach unlocks a range of zero-shot and few-shot applications for Earth observation, (ii) TerraMind introduces "Thinking-in-Modalities" (TiM) -- the capability of generating additional artificial data during finetuning and inference to improve the model…
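The abstract describes early fusion of token-level and pixel-level inputs only at a high level. The toy NumPy sketch below illustrates the general idea of early fusion under loudly stated assumptions: the patch size, embedding width, codebook size, modality embeddings, choice of modalities, and the single random-weight attention layer are all hypothetical and are not TerraMind's actual architecture or configuration. Pixel-level inputs (raw bands) are cut into patches and linearly projected, token-level inputs (discrete codes from a modality tokenizer) are looked up in an embedding table, and both are concatenated into one sequence before any attention layer, so every position can attend across modalities from the first layer onward.

```python
import numpy as np

rng = np.random.default_rng(0)

# All sizes below are hypothetical toy values, not TerraMind's configuration.
D_MODEL = 64   # embedding width shared by every modality
PATCH = 16     # pixel patch size
VOCAB = 1024   # size of a hypothetical tokenizer codebook


def patchify(image: np.ndarray, patch: int) -> np.ndarray:
    """Split a (C, H, W) image into flattened, non-overlapping patches: (N, C * patch * patch)."""
    c, h, w = image.shape
    if h % patch or w % patch:
        raise ValueError("image height and width must be divisible by the patch size")
    x = image.reshape(c, h // patch, patch, w // patch, patch)
    x = x.transpose(1, 3, 0, 2, 4)  # (rows, cols, C, patch, patch)
    return x.reshape(-1, c * patch * patch)


class PixelEmbed:
    """Pixel-level input: linear projection of raw patches plus a modality embedding."""

    def __init__(self, channels: int, patch: int, d_model: int):
        self.patch = patch
        self.proj = rng.normal(0.0, 0.02, size=(channels * patch * patch, d_model))
        self.modality = rng.normal(0.0, 0.02, size=d_model)

    def __call__(self, image: np.ndarray) -> np.ndarray:
        return patchify(image, self.patch) @ self.proj + self.modality


class TokenEmbed:
    """Token-level input: table lookup of discrete tokenizer codes plus a modality embedding."""

    def __init__(self, vocab: int, d_model: int):
        self.table = rng.normal(0.0, 0.02, size=(vocab, d_model))
        self.modality = rng.normal(0.0, 0.02, size=d_model)

    def __call__(self, token_ids: np.ndarray) -> np.ndarray:
        return self.table[token_ids.reshape(-1)] + self.modality


def early_fuse(parts: list) -> np.ndarray:
    """Early fusion: concatenate all modality embeddings into one sequence (positional embeddings omitted)."""
    return np.concatenate(parts, axis=0)


def self_attention(x: np.ndarray):
    """One single-head self-attention layer with random weights; every position sees every modality."""
    d = x.shape[-1]
    wq, wk, wv = (rng.normal(0.0, d ** -0.5, size=(d, d)) for _ in range(3))
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)  # numerically stable softmax
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)
    return attn @ v, attn


# Toy inputs: two pixel-level modalities and one token-level modality on the same 64x64 footprint.
s2 = rng.normal(size=(12, 64, 64))              # hypothetical 12-band optical image
s1 = rng.normal(size=(2, 64, 64))               # hypothetical 2-band radar image
lulc_ids = rng.integers(0, VOCAB, size=(4, 4))  # hypothetical 4x4 grid of land-cover token ids

seq = early_fuse([
    PixelEmbed(12, PATCH, D_MODEL)(s2),
    PixelEmbed(2, PATCH, D_MODEL)(s1),
    TokenEmbed(VOCAB, D_MODEL)(lulc_ids),
])
out, attn = self_attention(seq)
print(seq.shape)   # (48, 64): 16 + 16 + 16 positions
print(attn.shape)  # (48, 48): attention spans all three modalities
```

The design point the sketch isolates is that fusion happens before the first attention layer (early fusion), rather than running separate encoders per modality and merging their outputs afterwards.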
Code & Models
- 🤗 ibm-esa-geospatial/TerraMind-1.0-large (model) · 2.0k dl · ♡ 15
- 🤗 ibm-esa-geospatial/TerraMind-1.0-base (model) · 9.1k dl · ♡ 46
- 🤗 ibm-esa-geospatial/TerraMind-1.0-Tokenizer-S2L2A (model) · 743 dl
- 🤗 ibm-esa-geospatial/TerraMind-1.0-Tokenizer-S1GRD (model) · 418 dl
- 🤗 ibm-esa-geospatial/TerraMind-1.0-Tokenizer-S1RTC (model) · 281 dl
- 🤗 ibm-esa-geospatial/TerraMind-1.0-Tokenizer-DEM (model) · 579 dl · ♡ 1
- 🤗 ibm-esa-geospatial/TerraMind-1.0-Tokenizer-NDVI (model) · 364 dl · ♡ 2
- 🤗 ibm-esa-geospatial/TerraMind-1.0-Tokenizer-LULC (model) · 1.2k dl
- 🤗 ibm-esa-geospatial/TerraMind-1.0-tiny (model) · 961 dl · ♡ 3
- 🤗 ibm-esa-geospatial/TerraMind-1.0-small (model) · 1.6k dl · ♡ 3
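The repositories listed above can be fetched with the generic `huggingface_hub.snapshot_download` function. The minimal sketch below assumes only that the repository ids are exactly as listed; the helper names (`backbone_repo`, `tokenizer_repo`, `fetch`) are hypothetical, and the code downloads files without building the model.

```python
# Hypothetical helper names; only the repository ids come from the list above.
ORG = "ibm-esa-geospatial"
BACKBONE_SIZES = ("tiny", "small", "base", "large")
TOKENIZER_MODALITIES = ("S2L2A", "S1GRD", "S1RTC", "DEM", "NDVI", "LULC")


def backbone_repo(size: str) -> str:
    """Return the Hugging Face repo id of a TerraMind 1.0 backbone listed above."""
    if size not in BACKBONE_SIZES:
        raise ValueError(f"unknown size {size!r}; expected one of {BACKBONE_SIZES}")
    return f"{ORG}/TerraMind-1.0-{size}"


def tokenizer_repo(modality: str) -> str:
    """Return the Hugging Face repo id of a TerraMind 1.0 modality tokenizer listed above."""
    if modality not in TOKENIZER_MODALITIES:
        raise ValueError(f"unknown modality {modality!r}; expected one of {TOKENIZER_MODALITIES}")
    return f"{ORG}/TerraMind-1.0-Tokenizer-{modality}"


def fetch(repo_id: str) -> str:
    """Download every file of a repository into the local Hugging Face cache and return its path.

    This only fetches files; it does not build the model.
    """
    # Lazy import: needs `pip install huggingface_hub` and network access.
    from huggingface_hub import snapshot_download
    return snapshot_download(repo_id=repo_id)


if __name__ == "__main__":
    print(backbone_repo("base"))    # ibm-esa-geospatial/TerraMind-1.0-base
    print(tokenizer_repo("S2L2A"))  # ibm-esa-geospatial/TerraMind-1.0-Tokenizer-S2L2A
```

With network access, `fetch(backbone_repo("base"))` returns the local directory holding the downloaded files. Building the network from those files follows the loading instructions on the respective model card, which are not reproduced here.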
Taxonomy
Topics: Multimodal Machine Learning Applications · Geographic Information Systems Studies · Data Visualization and Analytics
