Uni-RS: A Spatially Faithful Unified Understanding and Generation Model for Remote Sensing
Weiyu Zhang, Yuan Hu, Yong Li, Yu Liu

TL;DR
Uni-RS is a unified multimodal model for remote sensing that improves spatial faithfulness in text-to-image generation by explicitly modeling the spatial relations stated in the prompt.
Contribution
The paper introduces Uni-RS, a unified multimodal model with explicit spatial layout planning and spatial-aware supervision to improve spatial accuracy in remote sensing image generation.
Findings
Significant improvement in spatial faithfulness during text-to-image generation.
Maintains strong performance on understanding tasks like captioning and visual grounding.
Systematic spatial transformations improve model robustness.
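To make the last finding concrete, here is a minimal sketch of one possible spatial transformation: a horizontal flip applied to an image together with its caption, so that the text still describes the image after the flip. The summary does not say which transformations Uni-RS uses or how captions are updated, so the function names (`hflip_image`, `hflip_caption`, `hflip_pair`) and the word-swap rule are illustrative assumptions, not the authors' method.

```python
import re

# Assumption: a horizontal flip mirrors left/right and leaves above/below unchanged.
_HFLIP_SWAP = {"left": "right", "right": "left"}
_LR_PATTERN = re.compile(r"\b(left|right)\b", flags=re.IGNORECASE)


def hflip_image(image):
    """Mirror a row-major image (a list of rows) left-to-right."""
    return [list(reversed(row)) for row in image]


def hflip_caption(caption):
    """Swap the words 'left' and 'right' so the caption matches the flipped image."""

    def swap(match):
        word = match.group(0)
        swapped = _HFLIP_SWAP[word.lower()]
        # Preserve an initial capital letter if the original word had one.
        return swapped.capitalize() if word[0].isupper() else swapped

    return _LR_PATTERN.sub(swap, caption)


def hflip_pair(image, caption):
    """Apply the same horizontal flip to an image and to its caption."""
    return hflip_image(image), hflip_caption(caption)


image = [[1, 2, 3], [4, 5, 6]]
flipped, caption = hflip_pair(image, "a storage tank to the left of the runway")
```

A plain word swap like this only covers explicit "left"/"right" wording; compass directions or rotations would need their own mapping.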
Abstract
Unified remote sensing multimodal models exhibit a pronounced spatial reversal curse: Although they can accurately recognize and describe object locations in images, they often fail to faithfully execute the same spatial relations during text-to-image generation, where such relations constitute core semantic information in remote sensing. Motivated by this observation, we propose Uni-RS, the first unified multimodal model tailored for remote sensing, to explicitly address the spatial asymmetry between understanding and generation. Specifically, we first introduce explicit Spatial-Layout Planning to transform textual instructions into spatial layout plans, decoupling geometric planning from visual synthesis. We then impose Spatial-Aware Query Supervision to bias learnable queries toward spatial relations explicitly specified in the instruction. Finally, we develop Image-Caption Spatial…
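As a rough illustration of what it means to turn a textual instruction into a spatial layout plan before any pixels are synthesized, the sketch below maps a (subject, relation, reference) triple to two normalized bounding boxes. The abstract does not describe the actual plan format, so the `Box` type, the `plan_layout` function, the relation vocabulary, and the fixed slot coordinates are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned box in normalized image coordinates (y grows downward)."""
    label: str
    x0: float
    y0: float
    x1: float
    y1: float


# Hypothetical fixed slots (x0, y0, x1, y1) for the two halves of the canvas.
_LEFT = (0.10, 0.35, 0.40, 0.65)
_RIGHT = (0.60, 0.35, 0.90, 0.65)
_TOP = (0.35, 0.10, 0.65, 0.40)
_BOTTOM = (0.35, 0.60, 0.65, 0.90)

# relation -> (slot for the subject, slot for the reference object)
_SLOTS = {
    "left of": (_LEFT, _RIGHT),
    "right of": (_RIGHT, _LEFT),
    "above": (_TOP, _BOTTOM),
    "below": (_BOTTOM, _TOP),
}


def plan_layout(subject, relation, reference):
    """Return a two-box layout plan that satisfies the stated spatial relation."""
    if relation not in _SLOTS:
        raise ValueError(f"unsupported relation: {relation!r}")
    subject_slot, reference_slot = _SLOTS[relation]
    return [Box(subject, *subject_slot), Box(reference, *reference_slot)]


plan = plan_layout("storage tank", "left of", "runway")
```

The point of the toy is the separation of concerns: the geometric decision is made and can be checked on its own, and a generator would then only need to render objects inside the planned boxes.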
Taxonomy
Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Constraint Satisfaction and Optimization
