Uni-RS: A Spatially Faithful Unified Understanding and Generation Model for Remote Sensing

Weiyu Zhang; Yuan Hu; Yong Li; Yu Liu

arXiv:2601.17673 · cs.CV · January 27, 2026

Open Access

TL;DR

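Uni-RS is a unified multimodal model for remote sensing that improves spatial faithfulness in text-to-image generation by explicitly planning spatial layouts and supervising spatial relations. It targets the gap between how well such models describe object locations and how poorly they reproduce those locations when generating images.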

Contribution

The paper introduces Uni-RS, a unified multimodal model with explicit spatial layout planning and spatial-aware supervision to improve spatial accuracy in remote sensing image generation.

Findings

01 Spatial faithfulness in text-to-image generation improves significantly.

02 Performance stays strong on understanding tasks such as captioning and visual grounding.

03 Systematic spatial transformations improve model robustness.

Abstract

Unified remote sensing multimodal models exhibit a pronounced spatial reversal curse: although they can accurately recognize and describe object locations in images, they often fail to faithfully execute the same spatial relations during text-to-image generation, even though such relations constitute core semantic information in remote sensing. Motivated by this observation, we propose Uni-RS, the first unified multimodal model tailored for remote sensing, to explicitly address the spatial asymmetry between understanding and generation. Specifically, we first introduce explicit Spatial-Layout Planning to transform textual instructions into spatial layout plans, decoupling geometric planning from visual synthesis. We then impose Spatial-Aware Query Supervision to bias learnable queries toward spatial relations explicitly specified in the instruction. Finally, we develop Image-Caption Spatial…
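To make the notion of a spatial layout plan more concrete, here is a minimal sketch. It is not the paper's implementation. It assumes a layout plan can be represented as a list of labeled axis-aligned boxes in normalized image coordinates, and it checks whether a relation named in the instruction holds between two planned objects. The names `Box` and `relation_holds`, the relation vocabulary, and the example objects are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Hypothetical layout-plan entry: a labeled box in normalized [0, 1] coordinates."""
    label: str
    x0: float
    y0: float
    x1: float
    y1: float

    @property
    def center(self):
        # Center point of the box as (x, y).
        return ((self.x0 + self.x1) / 2, (self.y0 + self.y1) / 2)


def relation_holds(a, b, relation):
    """Return True if box `a` stands in `relation` to box `b`.

    Assumption: relations are judged by comparing box centers, and the
    y axis grows downward as in standard image coordinates.
    """
    ax, ay = a.center
    bx, by = b.center
    if relation == "left of":
        return ax < bx
    if relation == "right of":
        return ax > bx
    if relation == "above":
        return ay < by
    if relation == "below":
        return ay > by
    raise ValueError(f"unsupported relation: {relation!r}")


# Hypothetical plan for an instruction like "an airplane to the left of a hangar".
plan = [
    Box("airplane", 0.10, 0.40, 0.30, 0.60),
    Box("hangar", 0.60, 0.35, 0.90, 0.65),
]
airplane, hangar = plan
print(relation_holds(airplane, hangar, "left of"))
```

A check of this kind could be applied either to a planned layout or to boxes detected in a generated image, which is one simple way to think about whether a generation is spatially faithful to its instruction.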

Taxonomy

Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Constraint Satisfaction and Optimization