Task-Driven Prompt Learning: A Joint Framework for Multi-modal Cloud Removal and Segmentation

Zaiyan Zhang; Jie Li; Shaowei Shi; Qiangqiang Yuan

arXiv:2601.12052 · cs.CV · April 29, 2026

TL;DR

This paper introduces TDP-CR, a task-driven multimodal framework that jointly performs cloud removal and land-cover segmentation in remote sensing imagery, improving data utility for Earth observation.

Contribution

The paper proposes a Prompt-Guided Fusion (PGF) mechanism and a two-phase training strategy for joint cloud removal and segmentation, and reports gains in both restoration quality and parameter efficiency over prior methods.

Findings

01  TDP-CR outperforms state-of-the-art methods by 0.18 dB in PSNR.

02  TDP-CR achieves 1.4% higher mIoU than multi-task competitors.

03  TDP-CR uses only 15% of the parameters of existing models.
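
For readers unfamiliar with the two metrics quoted in the findings, the following is a minimal sketch of how PSNR and mIoU are conventionally computed. It is not taken from the paper: the function names `psnr` and `mean_iou`, the `data_range` parameter, and the choice to skip classes absent from both label and prediction are assumptions made for illustration, and the paper's evaluation protocol may differ.

```python
import numpy as np


def psnr(reference, estimate, data_range=1.0):
    """Peak signal-to-noise ratio in dB for two images of equal shape.

    data_range is the assumed maximum pixel value (1.0 for normalized images).
    """
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(data_range ** 2 / mse))


def mean_iou(labels, preds, num_classes):
    """Mean intersection-over-union across classes.

    Classes that appear in neither labels nor preds are skipped
    (an illustrative convention, not necessarily the paper's).
    """
    labels = np.asarray(labels).ravel()
    preds = np.asarray(preds).ravel()
    ious = []
    for c in range(num_classes):
        intersection = np.sum((labels == c) & (preds == c))
        union = np.sum((labels == c) | (preds == c))
        if union > 0:
            ious.append(intersection / union)
    return float(np.mean(ious)) if ious else float("nan")
```

Because PSNR is logarithmic in the mean squared error, a 0.18 dB gain at a fixed data range corresponds to an MSE ratio of 10^(-0.018), roughly a 4% reduction in MSE.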

Abstract

Optical remote sensing imagery is indispensable for Earth observation, yet persistent cloud occlusion limits its downstream utility. Most cloud removal (CR) methods are optimized for low-level fidelity and can over-smooth textures and boundaries that are critical for analysis-ready data (ARD), leading to a mismatch between visually plausible restoration and semantic utility. To bridge this gap, we propose TDP-CR, a task-driven multimodal framework that jointly performs cloud removal and land-cover segmentation. Central to our approach is a Prompt-Guided Fusion (PGF) mechanism, which utilizes a learnable degradation prompt to encode cloud thickness and spatial uncertainty. By combining global channel context with local prompt-conditioned spatial bias, PGF adaptively integrates Synthetic Aperture Radar (SAR) information only where optical data is corrupted. We further introduce a…
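
The fusion idea described in the abstract (a global channel context combined with a local, prompt-conditioned spatial bias that decides where SAR information is injected) can be illustrated with a small gating sketch. This is not the authors' PGF implementation, which the abstract does not specify: the function name `prompt_guided_fusion`, the tensor shapes, the linear projections `w_channel` and `w_spatial`, and the sigmoid-gated convex blend are all hypothetical choices made only to show the general pattern.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def prompt_guided_fusion(optical, sar, prompt, w_channel, w_spatial):
    """Illustrative gated fusion (hypothetical, not the paper's architecture).

    optical, sar : (C, H, W) feature maps from each modality
    prompt       : (P, H, W) degradation prompt, assumed to encode cloud thickness
    w_channel    : (C, 2C) weights producing a per-channel logit
    w_spatial    : (P,) weights projecting the prompt to a per-pixel bias
    """
    # Global channel context: average-pool both modalities and project.
    pooled = np.concatenate([optical.mean(axis=(1, 2)), sar.mean(axis=(1, 2))])
    channel_logit = w_channel @ pooled  # shape (C,)

    # Local spatial bias conditioned on the prompt.
    spatial_bias = np.tensordot(w_spatial, prompt, axes=([0], [0]))  # shape (H, W)

    # Gate near 1 favors SAR features, gate near 0 keeps optical features.
    gate = sigmoid(channel_logit[:, None, None] + spatial_bias[None, :, :])
    return (1.0 - gate) * optical + gate * sar
```

In this sketch, pixels where the prompt signals heavy cloud receive a large positive bias and draw mostly on SAR features, while clear pixels keep the optical features nearly unchanged.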
