TL;DR
LLaRS is a unified foundation model for multi-modal remote sensing image restoration and fusion, leveraging language prompts and a large-scale multi-task dataset to outperform existing models.
Contribution
The paper introduces LLaRS, the first unified multi-task model for remote sensing low-level vision, using novel alignment, mixture-of-experts, and dynamic training techniques.
Findings
LLaRS outperforms seven competitive models across tasks.
The authors construct LLaRS1M, a dataset of over a million multi-task samples.
Parameter-efficient fine-tuning shows strong transfer and adaptation capabilities.
Abstract
Remote sensing imagery suffers from clouds, haze, noise, resolution limits, and sensor heterogeneity. Existing restoration and fusion approaches train separate models per degradation type. In this work, we present the Language-conditioned Large-scale Remote Sensing restoration model (LLaRS), the first unified foundation model for multi-modal and multi-task remote sensing low-level vision. LLaRS employs Sinkhorn-Knopp optimal transport to align heterogeneous bands into semantically matched slots, routes features through three complementary mixture-of-experts layers (convolutional experts for spatial patterns, channel-mixing experts for spectral fidelity, and attention experts with low-rank adapters for global context), and stabilizes joint training via step-level dynamic weight adjustment. To train LLaRS, we construct LLaRS1M, a million-scale multi-task dataset spanning eleven restoration…
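To make the band-to-slot alignment step more concrete, here is a minimal NumPy sketch of generic Sinkhorn-Knopp iterations. It is not the authors' implementation. The cosine-distance cost, the uniform marginals, the regularization strength `epsilon`, the band and slot counts, and every name (`sinkhorn_knopp`, `band_emb`, `slot_emb`) are assumptions made for illustration only.

```python
import numpy as np


def sinkhorn_knopp(cost, epsilon=0.5, n_iters=200):
    """Entropy-regularized transport plan between uniform marginals.

    cost: (n_bands, n_slots) matrix of assignment costs (assumed input).
    Returns a non-negative plan whose rows sum to 1/n_bands and whose
    columns sum to 1/n_slots (rows only approximately, up to convergence).
    """
    n_bands, n_slots = cost.shape
    r = np.full(n_bands, 1.0 / n_bands)   # mass leaving each band
    c = np.full(n_slots, 1.0 / n_slots)   # mass arriving at each slot
    K = np.exp(-cost / epsilon)           # Gibbs kernel
    u = np.ones(n_bands)
    v = np.ones(n_slots)
    for _ in range(n_iters):
        u = r / (K @ v)                   # rescale rows toward r
        v = c / (K.T @ u)                 # rescale columns toward c
    return u[:, None] * K * v[None, :]


# Hypothetical setup: 6 sensor bands, 4 shared slots, 16-dim embeddings.
rng = np.random.default_rng(0)
band_emb = rng.normal(size=(6, 16))
slot_emb = rng.normal(size=(4, 16))

# Cosine distance as the cost (an assumption, not taken from the paper).
band_unit = band_emb / np.linalg.norm(band_emb, axis=1, keepdims=True)
slot_unit = slot_emb / np.linalg.norm(slot_emb, axis=1, keepdims=True)
cost = 1.0 - band_unit @ slot_unit.T

plan = sinkhorn_knopp(cost)

# Each slot becomes a convex combination of the band embeddings,
# weighted by the mass that the plan sends to that slot.
slot_weights = plan / plan.sum(axis=0, keepdims=True)
slot_features = slot_weights.T @ band_emb
```

The resulting `plan` is a soft assignment: every band spreads its mass over the slots, and every slot receives the same total mass, which is one way a variable set of input bands could be mapped to a fixed number of slots. How LLaRS defines its costs, marginals, and slot semantics is not stated in the abstract.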
