Exploring Text-Guided Single Image Editing for Remote Sensing Images
Fangzhou Han, Lingyu Si, Zhizhuo Jiang, Hongwei Dong, Lamei Zhang, Yu Liu, Hao Chen, Bo Du

TL;DR
This paper introduces a novel text-guided single-image editing method for remote sensing images that operates with minimal training data, leveraging pre-trained vision-language models and prompt ensembling to improve accuracy and controllability.
Contribution
The paper proposes a new remote sensing image (RSI) editing approach that trains on only one image, employing multi-scale training and prompt ensembling to overcome dataset limitations and semantic ambiguity.
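Many single-image generative models (SinGAN, for example) learn from one image by training across an image pyramid, from a heavily downsampled copy up to full resolution. The sketch below only computes the sizes of such a pyramid. It assumes a SinGAN-style geometric schedule, and the helper name `pyramid_sizes` and its `min_side` and `scale_factor` defaults are hypothetical, not values taken from the paper.

```python
def pyramid_sizes(height, width, min_side=32, scale_factor=0.75):
    """Return (height, width) pairs for an image pyramid, coarsest first.

    Hypothetical helper: the defaults are illustrative, not from the paper.
    The full-resolution size is always included as the finest scale.
    """
    if not 0.0 < scale_factor < 1.0:
        raise ValueError("scale_factor must lie strictly between 0 and 1")
    if height <= 0 or width <= 0:
        raise ValueError("image dimensions must be positive")

    sizes = [(height, width)]
    h, w = height * scale_factor, width * scale_factor
    # Keep shrinking geometrically until the shorter side drops below min_side.
    while min(h, w) >= min_side:
        sizes.append((round(h), round(w)))
        h *= scale_factor
        w *= scale_factor
    return sizes[::-1]  # coarsest scale first, as in coarse-to-fine training


if __name__ == "__main__":
    for level, (h, w) in enumerate(pyramid_sizes(256, 256, min_side=64, scale_factor=0.5)):
        print(f"scale {level}: {h}x{w}")
```

Coarse-to-fine training would then fit the model at each of these sizes in order, so the coarse scales capture global layout and the fine scales capture texture.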
Findings
Outperforms existing methods in CLIP scores
Achieves high subjective quality in edits
Supports practical disaster assessment tasks
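CLIP-based scores measure how well an edited image matches the target text by comparing their embeddings in CLIP's joint image-text space. Below is a minimal sketch of one common form, weight · max(cos(image, text), 0), assuming the image and text embeddings were already produced by a CLIP model. The function names are hypothetical, and `weight` is left as a parameter because conventions differ (100 in TorchMetrics' `CLIPScore`, 2.5 in the original CLIPScore formulation), so this may not match the exact metric used in the paper.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def clip_style_score(image_emb: Sequence[float], text_emb: Sequence[float],
                     weight: float = 100.0) -> float:
    """weight * max(cos(image, text), 0); negative similarity is clipped to zero."""
    return weight * max(cosine_similarity(image_emb, text_emb), 0.0)
```

Higher scores mean the edited image sits closer to the target text in embedding space; averaging this score over a set of edits gives a single number for comparing methods.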
Abstract
Artificial intelligence-generated content (AIGC) has significantly impacted image generation in the field of remote sensing. However, the equally important area of remote sensing image (RSI) editing has not received sufficient attention. Deep-learning-based editing methods generally involve two sequential stages: generation and editing. For natural images, these stages rely primarily on generative backbones pre-trained on large-scale benchmark datasets and on text guidance provided by vision-language models (VLMs). However, this approach becomes less viable for RSIs. First, existing generative RSI benchmark datasets do not fully capture the diversity of RSIs and are often inadequate for universal editing tasks. Second, a single text semantic can correspond to multiple image semantics, which introduces incorrect semantics. To solve the above problems, this paper proposes a text-guided RSI…
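One common way to reduce the ambiguity of a single text prompt is CLIP-style prompt ensembling: encode several phrasings of the same edit, L2-normalize each text embedding, average them, and renormalize. The sketch below assumes ensembling happens at the text-embedding level and that the embeddings have already been computed by a text encoder; the function names, prompt wordings, and toy 3-D vectors are hypothetical, and the paper's exact ensembling scheme may differ.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def ensemble_text_embedding(prompt_embeddings: Sequence[Sequence[float]]) -> List[float]:
    """Average several prompt embeddings on the unit sphere (CLIP-style prompt ensembling)."""
    if not prompt_embeddings:
        raise ValueError("need at least one prompt embedding")
    dim = len(prompt_embeddings[0])
    if any(len(e) != dim for e in prompt_embeddings):
        raise ValueError("all embeddings must have the same dimension")
    # Normalize each prompt first so no single phrasing dominates the average.
    normed = [l2_normalize(e) for e in prompt_embeddings]
    mean = [sum(e[i] for e in normed) / len(normed) for i in range(dim)]
    # Renormalize so the ensemble can be compared by cosine similarity.
    return l2_normalize(mean)


if __name__ == "__main__":
    # Hypothetical toy embeddings for differently worded prompts describing one edit.
    prompts = {
        "a satellite image of a flooded area": [0.9, 0.1, 0.0],
        "an aerial view of flooded farmland": [0.7, 0.0, 0.3],
        "a remote sensing image after a flood": [0.8, 0.2, 0.1],
    }
    print(ensemble_text_embedding(list(prompts.values())))
```

The ensembled vector can then stand in for a single prompt's embedding wherever a text-guidance or scoring step expects one, so that no single wording decides which image semantics are targeted.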
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Geological Modeling and Analysis
Methods: Diffusion · Contrastive Language-Image Pre-training
