Few-Shot Vision-Language Reasoning for Satellite Imagery via Verifiable Rewards
Aybora Koksal, A. Aydin Alatan

TL;DR
This paper introduces a few-shot reinforcement learning framework for satellite imagery reasoning that relies on verifiable rewards, enabling effective model training with minimal annotated data across multiple remote sensing tasks.
Contribution
The authors adapt the RLVR paradigm to vision-language models for satellite imagery, eliminating the need for caption supervision and demonstrating strong performance with as few as one example.
Findings
Training on a single example yields substantial improvements over the base model.
Training on 128 examples matches or surpasses models trained on thousands of examples.
The method generalizes robustly across tasks.
Abstract
Recent advances in large language and vision-language models have enabled strong reasoning capabilities, yet they remain impractical for specialized domains like remote sensing, where annotated data is scarce and expensive. We present the first few-shot reinforcement learning with verifiable reward (RLVR) framework for satellite imagery that eliminates the need for caption supervision--relying solely on lightweight, rule-based binary or IoU-based rewards. Adapting the "1-shot RLVR" paradigm from language models to vision-language models, we employ policy-gradient optimization with as few as one curated example to align model outputs for satellite reasoning tasks. Comprehensive experiments across multiple remote sensing benchmarks--including classification, visual question answering, and grounding--show that even a single example yields substantial improvements over the base model.…
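The rule-based rewards named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `binary_reward` and `iou_reward`, the case-insensitive string matching, and the `(x_min, y_min, x_max, y_max)` box format are all assumptions made for illustration.

```python
from typing import Sequence

# Assumed box format: (x_min, y_min, x_max, y_max) in pixel coordinates.
Box = Sequence[float]


def binary_reward(predicted: str, target: str) -> float:
    """Return 1.0 when the normalized answers match exactly, else 0.0.

    Hypothetical rule for classification / VQA style answers; the
    normalization (strip + lowercase) is an assumption.
    """
    return 1.0 if predicted.strip().lower() == target.strip().lower() else 0.0


def iou_reward(pred: Box, gt: Box) -> float:
    """Return the intersection-over-union of two axis-aligned boxes.

    Hypothetical rule for grounding outputs; degenerate boxes yield 0.0.
    """
    # Corners of the overlapping rectangle (may be empty).
    ix1, iy1 = max(pred[0], gt[0]), max(pred[1], gt[1])
    ix2, iy2 = min(pred[2], gt[2]), min(pred[3], gt[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    # Clamp areas at zero so malformed boxes cannot produce negative values.
    area_pred = max(0.0, pred[2] - pred[0]) * max(0.0, pred[3] - pred[1])
    area_gt = max(0.0, gt[2] - gt[0]) * max(0.0, gt[3] - gt[1])
    union = area_pred + area_gt - inter

    return inter / union if union > 0 else 0.0
```

Both functions need only the model output and the ground-truth label, which is what makes such rewards checkable by rule without caption supervision. For example, `iou_reward((0, 0, 2, 2), (1, 1, 3, 3))` has an overlap of area 1 and a union of area 7, giving a reward of 1/7.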
