CLIPPan: Adapting CLIP as A Supervisor for Unsupervised Pansharpening
Lihua Jian, Jiabo Liu, Shaowu Wu, Lihui Chen

TL;DR
CLIPPan introduces an unsupervised full-resolution pansharpening method using a fine-tuned CLIP model with semantic language constraints, improving spectral and spatial fidelity without ground truth supervision.
Contribution
The paper presents a novel framework that adapts CLIP for unsupervised pansharpening, integrating language constraints to guide fusion learning at full resolution.
Findings
Outperforms existing methods on real-world datasets
Enhances spectral and spatial fidelity in pansharpening
Sets new state-of-the-art results for unsupervised pansharpening
Abstract
Despite remarkable advancements in supervised pansharpening neural networks, these methods face a resolution domain gap caused by the intrinsic disparity between simulated reduced-resolution training data and real-world full-resolution scenarios. To bridge this gap, we propose an unsupervised pansharpening framework, CLIPPan, that enables model training directly at full resolution by using CLIP, a visual-language model, as a supervisor. However, directly applying CLIP to supervise pansharpening remains challenging due to its inherent bias toward natural images and its limited understanding of pansharpening tasks. Therefore, we first introduce a lightweight fine-tuning pipeline that adapts CLIP to recognize low-resolution multispectral, panchromatic, and high-resolution multispectral images, as well as to understand the pansharpening process. Then, building on the adapted…
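The abstract is cut off before it describes the actual loss, so the following is only an illustrative sketch of how a CLIP-style model can act as a supervisor in general, not the paper's method. It assumes a contrastive setup in which an image embedding is scored against text prompt embeddings by cosine similarity, and the loss rewards agreement with a target prompt. The function name `prompt_alignment_loss`, the three-prompt layout (low-resolution multispectral, panchromatic, high-resolution multispectral), the temperature value, and the random vectors standing in for encoder outputs are all hypothetical.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products equal cosine similarity."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def prompt_alignment_loss(image_emb, text_embs, target_idx, temperature=0.07):
    """Cross-entropy over cosine similarities between image and prompt embeddings.

    image_emb: (B, D) image embeddings (stand-ins for an image encoder's output).
    text_embs: (K, D) prompt embeddings (stand-ins for a text encoder's output).
    target_idx: index of the prompt the images should match.
    """
    img = l2_normalize(image_emb)
    txt = l2_normalize(text_embs)
    logits = img @ txt.T / temperature                 # (B, K) scaled similarities
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-log_probs[:, target_idx].mean())

# Hypothetical prompt order: 0 = LRMS, 1 = PAN, 2 = HRMS.
rng = np.random.default_rng(0)
text_embs = rng.normal(size=(3, 16))

# Embeddings near the HRMS prompt versus embeddings near the LRMS prompt.
aligned = text_embs[2] + 0.01 * rng.normal(size=(4, 16))
misaligned = text_embs[0] + 0.01 * rng.normal(size=(4, 16))

loss_aligned = prompt_alignment_loss(aligned, text_embs, target_idx=2)
loss_misaligned = prompt_alignment_loss(misaligned, text_embs, target_idx=2)
print(loss_aligned < loss_misaligned)
```

In a real training loop the embeddings would come from the (fine-tuned) CLIP encoders and the loss would be backpropagated into the fusion network; here plain NumPy arrays are used only to show the scoring logic.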
Taxonomy
Topics: Advanced Image Fusion Techniques · Advanced Image Processing Techniques · Remote-Sensing Image Classification
