SAMChat: Introducing Chain of Thought Reasoning and GRPO to a Multimodal Small Language Model for Small Scale Remote Sensing
Aybora Koksal, A. Aydin Alatan

TL;DR
SAMChat is a lightweight multimodal model tailored for remote sensing imagery analysis, utilizing chain-of-thought reasoning and GRPO to improve detection of military sites with high accuracy and interpretability.
Contribution
The paper introduces SAMChat, a resource-efficient multimodal model that combines a specialized dataset (SAMData), chain-of-thought reasoning, and GRPO to advance remote sensing analysis in resource-constrained settings.
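As a rough illustration of the idea behind GRPO (Group Relative Policy Optimization), the sketch below computes group-relative advantages: several responses are sampled for the same prompt, each is scored, and every score is normalized against the mean and standard deviation of its own group. The function name, the `eps` constant, and the example rewards are assumptions made for illustration; the paper's actual reward design and training setup are not described in this summary.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own sampling group.

    `rewards` holds one scalar score per response sampled for a single
    prompt. Hypothetical helper, not the paper's implementation.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    # eps avoids division by zero when every response gets the same reward
    return [(r - mu) / (sigma + eps) for r in rewards]


# Illustrative rewards for four sampled responses (made-up values).
example_advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Responses that score above their group's average receive a positive advantage and are reinforced; those below the average receive a negative one. When all responses in a group score the same, every advantage is zero and the group contributes no learning signal.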
Findings
Achieved over 80% recall and 98% precision on the SAMData benchmark.
Outperformed larger general-purpose models in captioning and classification tasks.
Demonstrated effectiveness of fine-tuning and reinforcement learning in domain-specific applications.
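For readers less familiar with the recall and precision figures quoted above, the following minimal sketch shows how the two metrics are computed from detection counts. The function name and the counts in the example are hypothetical and chosen only to illustrate the arithmetic; they are not taken from the paper.

```python
def precision_recall(true_pos, false_pos, false_neg):
    """Return (precision, recall) from raw detection counts.

    precision = TP / (TP + FP): share of reported detections that are correct
    recall    = TP / (TP + FN): share of real targets that were found
    """
    predicted = true_pos + false_pos
    actual = true_pos + false_neg
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    return precision, recall


# Made-up counts: 80 sites found correctly, 1 false alarm, 20 sites missed.
precision, recall = precision_recall(true_pos=80, false_pos=1, false_neg=20)
```

A model with high precision but lower recall, as in this illustration, rarely raises false alarms but still misses some true sites.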
Abstract
Remarkable capabilities in understanding and generating text-image content have been demonstrated by recent advancements in multimodal large language models (MLLMs). However, their effectiveness in specialized domains, particularly those requiring resource-efficient and domain-specific adaptations, has remained limited. In this work, a lightweight multimodal language model termed SAMChat is introduced, specifically adapted to analyze remote sensing imagery in secluded areas, including challenging missile launch sites. A new dataset, SAMData, was compiled by verifying hundreds of aerial images through expert review, and subtle military installations were highlighted via detailed captions. Supervised fine-tuning on a 2B parameter open-source MLLM with chain-of-thought (CoT) reasoning annotations was performed, enabling more accurate and interpretable explanations. Additionally, Group…
