From Clouds to Hallucinations: Atmospheric Retrieval Hijacking in Remote Sensing Vision-Language RAG

Jiaju Han; Chao Li; Chengyin Hu; Qike Zhang; Xuemeng Sun; Xin Wang; Fengyu Zhang; Xiang Chen; Yiwei Wei; Jiahuan Long; Jiujiang Guo

arXiv:2605.07273·cs.CV·May 11, 2026

TL;DR

This paper introduces CloudWeb, a novel attack method that modifies remote sensing images with atmospheric patterns to hijack retrieval results in multimodal RAG systems, revealing vulnerabilities in evidence retrieval.

Contribution

It is the first study to explore atmospheric retrieval hijacking in remote sensing multimodal RAG, demonstrating effective manipulation of retrieval results through input image modifications.

Findings

01. CloudWeb significantly increases weather-related evidence in top retrieval results.

02. Retrieval hijacking propagates to influence downstream vision-language generation.

03. Natural-looking atmospheric modifications can undermine evidence retrieval in remote sensing RAG.
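One way to quantify the first finding is the share of weather-related items among the top-k retrieved documents, compared between a clean query and a modified one. The sketch below is illustrative only: the function name `weather_share_at_k`, the boolean `is_weather` labels, and the use of cosine similarity are assumptions, not the paper's stated evaluation protocol.

```python
import numpy as np

def weather_share_at_k(query_emb, doc_embs, is_weather, k=5):
    """Fraction of the top-k retrieved documents flagged as weather-related.

    query_emb:  (d,) image-query embedding (hypothetical retriever output)
    doc_embs:   (n, d) text-evidence embeddings
    is_weather: (n,) boolean labels, assumed to be available for the corpus
    """
    # Cosine similarity between the query and every document.
    q = query_emb / np.linalg.norm(query_emb)
    d = doc_embs / np.linalg.norm(doc_embs, axis=1, keepdims=True)
    scores = d @ q
    # Indices of the k highest-scoring documents.
    top_k = np.argsort(-scores)[:k]
    return float(np.mean(is_weather[top_k]))

# Toy corpus: two scene-like documents and two weather-like documents.
docs = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
labels = np.array([False, False, True, True])
clean_share = weather_share_at_k(np.array([1.0, 0.0]), docs, labels, k=2)
hijacked_share = weather_share_at_k(np.array([0.0, 1.0]), docs, labels, k=2)
```

In this toy setup the clean query retrieves only scene documents and the shifted query retrieves only weather documents, so the gap between the two shares is the quantity of interest.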

Abstract

Multimodal RAG systems increasingly rely on vision-language retrievers to ground visual queries in external textual evidence. Existing adversarial studies on RAG mainly manipulate the retrieval corpus or memory, while attacks on vision-language and remote sensing models typically target end-task predictions. Input-space threats to the evidence retrieval stage of remote sensing multimodal RAG remain underexplored. To address this gap, we introduce CloudWeb, an atmospheric retrieval hijacking attack that modifies only the input image while keeping the retriever, generator, and knowledge base fixed at deployment. CloudWeb overlays parameterized cloud- and haze-like patterns on remote sensing images and optimizes them with a retrieval-oriented objective that pulls adversarial image embeddings toward target atmospheric evidence, suppresses source-scene evidence, enforces rank separation, and…
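The abstract names three components of the objective (pull toward target atmospheric evidence, suppress source-scene evidence, enforce rank separation) but is truncated before giving the full formulation. The following is a minimal sketch of how those three named terms could be combined, under assumptions: the cosine-based terms, the hinge margin, the weights, and the names `apply_haze` and `hijack_loss` are all hypothetical, and the elided remaining terms are not reconstructed here.

```python
import numpy as np

def _cos(a, b):
    """Cosine similarity between two 1-D embeddings."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def apply_haze(image, mask, opacity):
    """Alpha-blend a white haze layer onto an image with values in [0, 1].

    mask has the same shape as image (or is broadcastable to it);
    opacity is a scalar controlling how dense the overlay is.
    """
    alpha = np.clip(opacity * mask, 0.0, 1.0)
    return (1.0 - alpha) * image + alpha * 1.0

def hijack_loss(img_emb, target_emb, source_emb,
                margin=0.2, w_pull=1.0, w_push=1.0, w_rank=1.0):
    """Illustrative retrieval-oriented loss (assumed form, not the paper's)."""
    s_t = _cos(img_emb, target_emb)   # similarity to target atmospheric evidence
    s_s = _cos(img_emb, source_emb)   # similarity to source-scene evidence
    pull = 1.0 - s_t                          # pull toward the target evidence
    push = max(0.0, s_s)                      # suppress the source evidence
    rank = max(0.0, margin - (s_t - s_s))     # hinge: target should outrank source
    return w_pull * pull + w_push * push + w_rank * rank

target = np.array([1.0, 0.0])
source = np.array([0.0, 1.0])
loss_hijacked = hijack_loss(np.array([1.0, 0.0]), target, source)
loss_clean = hijack_loss(np.array([0.0, 1.0]), target, source)
hazy = apply_haze(np.zeros((2, 2)), np.ones((2, 2)), 0.5)
```

In an actual attack the overlay parameters would be optimized through the retriever's image encoder with a gradient-based or black-box optimizer; this sketch only evaluates the loss on given embeddings and does not model that step.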
