Enabling Near-realtime Remote Sensing via Satellite-Ground Collaboration of Large Vision-Language Models
Zihan Li, Jiahao Yang, Yuxin Zhang, Zhe Chen, Yue Gao

TL;DR
Grace enables near-realtime remote sensing by deploying compact vision-language models on satellites and larger models on ground stations, using asynchronous retrieval and confidence-based task offloading to reduce average latency by 76-95% relative to existing methods.
Contribution
The paper introduces Grace, a novel satellite-ground collaborative system that combines adaptive knowledge updates and confidence-based task dispatching for efficient LVLM inference in remote sensing.
Findings
Reduces average latency by 76-95% compared to existing methods.
Maintains inference accuracy while enabling near-realtime processing.
Demonstrates effectiveness with real-world satellite orbital data.
Abstract
Large vision-language models (LVLMs) have recently demonstrated great potential in remote sensing (RS) tasks (e.g., disaster monitoring) conducted by low Earth orbit (LEO) satellites. However, their deployment in real-world LEO satellite systems remains largely unexplored, hindered by limited onboard computing resources and brief satellite-ground contacts. We propose Grace, a satellite-ground collaborative system designed for near-realtime LVLM inference in RS tasks. Grace deploys a compact LVLM on satellites for realtime inference and larger ones on ground stations (GSs) to guarantee end-to-end performance. Grace comprises two main phases: asynchronous satellite-GS Retrieval-Augmented Generation (RAG) and a task dispatch algorithm. First, we distill the knowledge archive of the GS RAG into the satellite archive with a tailored adaptive update algorithm during limited…
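The confidence-based dispatch idea can be illustrated with a minimal sketch. The abstract shown here is truncated before it describes the dispatch algorithm, so everything below is an assumption made for illustration: the confidence score (geometric-mean token probability), the `threshold` value, and the names `OnboardResult`, `confidence`, and `dispatch` are hypothetical and are not taken from the paper.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class OnboardResult:
    """Output of the compact onboard LVLM (hypothetical structure)."""
    answer: str
    token_logprobs: List[float]  # log-probability of each generated token


def confidence(result: OnboardResult) -> float:
    """Geometric-mean token probability, an assumed stand-in confidence score."""
    if not result.token_logprobs:
        return 0.0  # no tokens generated: treat as zero confidence
    mean_logprob = sum(result.token_logprobs) / len(result.token_logprobs)
    return math.exp(mean_logprob)


def dispatch(result: OnboardResult, threshold: float = 0.8) -> str:
    """Keep the onboard answer if confident enough, otherwise offload to the GS.

    The threshold of 0.8 is an arbitrary illustrative value.
    """
    return "onboard" if confidence(result) >= threshold else "offload"


if __name__ == "__main__":
    sure = OnboardResult("flooded area", [math.log(0.9), math.log(0.9)])
    unsure = OnboardResult("unclear", [math.log(0.5)])
    print(dispatch(sure), dispatch(unsure))
```

In this sketch a task answered with high confidence stays on the satellite, while a low-confidence task is queued for the larger ground-station model at the next contact window; the real system may score and route tasks differently.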
