TerraGen: A Unified Multi-Task Layout Generation Framework for Remote Sensing Data Augmentation

Datao Tang; Hao Wang; Yudeng Xin; Hui Qiao; Dongsheng Jiang; Yin Li; Zhiheng Yu; and Xiangyong Cao

arXiv:2510.21391·cs.CV·October 27, 2025

4 Reviews

TL;DR

TerraGen is a unified framework for generating spatially controlled remote sensing images to improve multiple vision tasks, addressing the limitations of task-specific models and incorporating geographical constraints.

Contribution

It introduces a multi-task layout-to-image generation framework with a novel spatial encoding scheme and provides the first large-scale dataset for remote sensing layout generation.

Findings

01

Achieves superior image quality across tasks

02

Enhances downstream task performance significantly

03

Demonstrates robust cross-task generalization

Abstract

Remote sensing vision tasks require extensive labeled data across multiple, interconnected domains. However, current generative data augmentation frameworks are task-isolated, i.e., each vision task requires training an independent generative model, and they ignore the modeling of geographical information and spatial constraints. To address these issues, we propose TerraGen, a unified layout-to-image generation framework that enables flexible, spatially controllable synthesis of remote sensing imagery for various high-level vision tasks, e.g., detection, segmentation, and extraction. Specifically, TerraGen introduces a geographic-spatial layout encoder that unifies bounding box and segmentation mask inputs, combined with a multi-scale injection scheme and mask-weighted loss to explicitly encode spatial constraints, from global structures to fine details. Also, we construct the…
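The abstract does not spell out the form of the mask-weighted loss, so the following is only an illustrative sketch of the general idea: a squared-error term in which pixels covered by the layout mask count more than background pixels. The function name `mask_weighted_mse` and the `fg_weight` parameter are assumptions made for this example, not the paper's formulation.

```python
def mask_weighted_mse(pred, target, mask, fg_weight=2.0):
    """Mean squared error in which masked (foreground) pixels count more.

    pred, target, mask: 2-D lists of equal shape; mask entries are 0 or 1.
    fg_weight is a hypothetical hyperparameter, not a value from the paper.
    """
    total = 0.0
    weight_sum = 0.0
    for p_row, t_row, m_row in zip(pred, target, mask):
        for p, t, m in zip(p_row, t_row, m_row):
            w = fg_weight if m else 1.0   # heavier weight inside the layout mask
            total += w * (p - t) ** 2
            weight_sum += w
    return total / weight_sum             # normalize by the total weight


# The same unit error costs more when it falls inside the mask.
inside = mask_weighted_mse([[1.0, 0.0]], [[0.0, 0.0]], [[1, 0]], fg_weight=3.0)
outside = mask_weighted_mse([[1.0, 0.0]], [[0.0, 0.0]], [[0, 1]], fg_weight=3.0)
```

Normalizing by the summed weights keeps the loss on a comparable scale regardless of how much of the image the mask covers; other normalizations are equally plausible.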

Peer Reviews

Decision·ICLR 2026 Conference Withdrawn Submission

Reviewer 01·Rating 4·Confidence 5

Strengths

1. TerraGen can handle multiple remote sensing tasks (object detection, segmentation, etc.) within a single model.
2. The authors constructed a dataset of 45k images with layout annotations to train and evaluate their model.
3. Extensive experiments show that TerraGen can serve as a data augmentation engine, boosting performance on downstream tasks in both full-data and few-shot settings.

Weaknesses

While the task of unified multi-task generation for remote sensing is valuable and the constructed dataset is a potential contribution, the paper has significant flaws that preclude its acceptance in its current form. My primary concerns are as follows:
1. The core technical components, i.e., a layout encoder, multi-scale feature injection, and a mask-weighted loss, are well-established adaptations of techniques from the natural image domain (e.g., GLIGEN, ControlNet, IP-Adapter). The paper pre

Reviewer 02·Rating 6·Confidence 3

Strengths

- Authors propose a multi-task unified architecture. Multiple tasks are mainly handled by converting the input conditions (bbox, segmentation map, ...) into a common format. To differentiate between tasks, authors use a task encoder that generates task-specific embeddings.
- Authors introduce a hierarchical mechanism to inject spatial information at multiple resolutions.
- Authors carry out extensive experiments, showing how TerraGen improves generation metrics compared to other models for satellite images.
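As a rough illustration of what converting input conditions into a common format could look like, the sketch below rasterizes bounding boxes into a per-pixel class map, the same representation a segmentation mask already uses. The function `boxes_to_class_map`, its box tuple layout, and the overwrite rule for overlapping boxes are assumptions for this example; the review does not describe the paper's actual encoding.

```python
def boxes_to_class_map(boxes, height, width, background=0):
    """Rasterize boxes into a 2-D class-ID map (hypothetical common format).

    boxes: iterable of (class_id, x0, y0, x1, y1) in pixel coordinates,
    with x1 and y1 exclusive. Later boxes overwrite earlier ones.
    """
    grid = [[background] * width for _ in range(height)]
    for class_id, x0, y0, x1, y1 in boxes:
        # Clip each box to the image bounds before filling.
        for y in range(max(0, y0), min(height, y1)):
            for x in range(max(0, x0), min(width, x1)):
                grid[y][x] = class_id
    return grid


class_map = boxes_to_class_map([(1, 0, 0, 2, 1), (2, 1, 0, 3, 2)], height=2, width=3)
# class_map == [[1, 2, 2], [0, 2, 2]]
```

Once boxes and masks share this grid form, a single encoder can consume either one, and a separate task embedding can tell it which task the layout came from.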

Weaknesses

- Spelling mistakes in Figure 2: Dncoder
- Image generation is constrained to RGB images. It is worth noting that in remote sensing, satellite images have additional channel bands and wavelength frequencies. Authors should consider satellite image generation that supports the physical satellite spectrum/channel range, resulting in more physically-plausible reconstructions.

Reviewer 03·Rating 4·Confidence 4

Strengths

This paper presents a well-executed and timely study on a novel problem: unified multi-task layout generation for remote sensing data. The experimental validation is thorough and compelling, convincingly demonstrating the framework's state-of-the-art performance and its significant utility as a data augmentation engine across multiple tasks and data regimes.

Weaknesses

The primary weaknesses of this paper concern the technical depth and clarity of its methodological contributions.
1. Limited Technical Innovation: While the concept of a unified multi-task framework is valuable, its core technical components, such as the layout encoder and multi-scale injection, appear to be straightforward adaptations of existing mechanisms (e.g., cross-attention, ControlNet) rather than fundamental innovations. The paper does not sufficiently justify why these specific composit

Reviewer 04·Rating 4·Confidence 5

Strengths

1. A layout-to-image generation framework for remote sensing imagery is proposed.
2. A remote sensing layout generation dataset is constructed.

Weaknesses

1. This paper lacks novelty. Throughout the paper, the so-called “first unified framework” essentially combines and fine-tunes existing methods (such as GLIGEN and ControlNet) for remote sensing data, lacking fundamental innovation. Most importantly, the proposed method relies solely on attention mechanisms without introducing any explicit geographical rules or knowledge, offering no fundamental improvement over existing approaches.
2. The geographic-spatial layout encoder is essentially a conca
