MetaEarth-MM: Unified Multimodal Remote Sensing Image Generation with Scene-centered Joint Modeling

Zhiping Yu; Chenyang Liu; Jinqi Cao; Qinzhe Yang; Siwei Yu; Zhengxia Zou; Zhenwei Shi

arXiv:2605.20090 · cs.CV · May 20, 2026

TL;DR

MetaEarth-MM is a unified foundation model for multi-modal remote sensing image generation. It supports paired joint generation and any-to-any translation across five modalities through scene-centered joint modeling.

Contribution

The paper introduces a scene-centered joint modeling paradigm and a decoupled architecture for multi-modal remote sensing image generation, supported by a large-scale dataset.

Findings

01. Strong generative capability across diverse tasks
02. Robust generalization to unseen modalities
03. Supports downstream Earth observation tasks

Abstract

Multi-modal remote sensing images are vital for Earth observation, yet complete paired observations are often scarce in practice. Existing generative methods commonly address this problem through isolated pairwise modality translation, but their versatility and scalability remain limited as the number of modalities and generation tasks increases. Here, we develop a generative foundation model, MetaEarth-MM, for multi-modal remote sensing imagery, enabling paired joint generation and any-to-any translation across five modalities within a unified model. Recognizing the intrinsic scene consistency underlying multi-modal observations, we introduce a scene-centered joint modeling paradigm in MetaEarth-MM. Unlike previous methods that rely on direct appearance-level cross-modal mapping, our model organizes the generation around the underlying scene content. Specifically, MetaEarth-MM adopts a…
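
The scalability concern raised in the abstract can be made concrete with a simple count. The sketch below is an illustration only, not part of MetaEarth-MM: it assumes the pairwise baseline needs one dedicated translator per directed source-to-target pair and considers single-source translation only. The function name `pairwise_translators` and the `modality_i` labels are hypothetical placeholders, since the excerpt does not name the five modalities.

```python
from itertools import permutations


def pairwise_translators(num_modalities: int) -> int:
    """Count directed source->target pairs among num_modalities modalities.

    Assumption: an isolated pairwise approach trains one model per directed pair.
    """
    return num_modalities * (num_modalities - 1)


# Placeholder labels; the abstract excerpt does not list the actual modalities.
modalities = [f"modality_{i}" for i in range(1, 6)]

# Enumerate every ordered (source, target) pair with source != target.
directed_pairs = list(permutations(modalities, 2))

print(len(directed_pairs))      # 20 directed pairs for five modalities
print(pairwise_translators(6))  # 30 if a sixth modality were added
```

Under these assumptions, the number of separate translators grows quadratically with the number of modalities, whereas a unified any-to-any model keeps the model count at one.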

Code & Models

Repositories

YZPioneer/MetaEarth-MM (GitHub)
