Reframing SAR Target Recognition as Visual Reasoning: A Chain-of-Thought Dataset with Multimodal LLMs
Chaoran Li, Xingguo Xu, Siyuan Mu

TL;DR
This paper reformulates SAR target recognition as a multimodal reasoning task for multimodal large language models (MLLMs), and presents a new dataset with Chain-of-Thought (CoT) reasoning chains for evaluating the approach.
Contribution
The paper pioneers the use of multimodal large language models for SAR target recognition and builds a new dataset with reasoning chains to support this line of research.
Findings
MLLMs generate logically coherent, interpretable inferences on SAR data in most scenarios
The approach improves interpretability of SAR recognition
Limitations and failure cases of MLLMs are analyzed
Abstract
In the context of Synthetic Aperture Radar (SAR) image recognition, traditional methods often struggle with the intrinsic limitations of SAR data, such as weak texture, high noise, and ambiguous object boundaries. This work explores a novel perspective by reformulating SAR target recognition as a multimodal reasoning task. We leverage multimodal large language models (MLLMs), specifically GPT-4o, to perform target classification based on SAR imagery, guided by candidate categories and enhanced with Chain-of-Thought (CoT) reasoning. A new dataset is constructed based on the FAIR-CSAR benchmark, comprising raw SAR images, structured target annotations, candidate label sets, and GPT-generated CoT reasoning chains. Experimental results show that the MLLMs are capable of generating logically coherent and interpretable inferences in most scenarios. Our analysis highlights both the strengths…
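The abstract lists four components per dataset entry (raw SAR image, structured target annotation, candidate label set, and a generated CoT chain) and describes classification guided by candidate categories. A minimal Python sketch of how such an entry and its prompt might be organized follows. Every name here (`SarCotRecord`, `build_prompt`, `parse_answer`, the field names, the bounding-box layout, the prompt wording, and the example labels) is a hypothetical illustration, not the authors' schema or prompt, and no model call is made.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SarCotRecord:
    """One hypothetical dataset entry; all field names are assumptions."""
    image_path: str              # path to the raw SAR image
    bbox: List[int]              # assumed layout: [x_min, y_min, x_max, y_max]
    candidate_labels: List[str]  # candidate categories shown to the model
    cot_reasoning: str = ""      # model-generated reasoning chain, filled in later
    predicted_label: str = ""    # label parsed from the model response


def build_prompt(record: SarCotRecord) -> str:
    """Assemble a candidate-guided, step-by-step classification prompt."""
    options = "\n".join(
        f"{i + 1}. {label}" for i, label in enumerate(record.candidate_labels)
    )
    return (
        "You are given a SAR image containing one annotated target.\n"
        f"Target bounding box (pixels): {record.bbox}\n"
        "Candidate categories:\n"
        f"{options}\n"
        "Reason step by step about the visible evidence, then finish with "
        "a line of the form 'Answer: <category>'."
    )


def parse_answer(response: str, candidates: List[str]) -> str:
    """Return the candidate named on the last 'Answer:' line, or '' if none matches."""
    for line in reversed(response.splitlines()):
        if line.strip().lower().startswith("answer:"):
            guess = line.split(":", 1)[1].strip().lower()
            for label in candidates:
                if label.lower() == guess:
                    return label
            return ""
    return ""


# Placeholder labels for illustration only; not the benchmark's category list.
record = SarCotRecord("chip_0001.png", [12, 30, 88, 140], ["ship", "airplane", "bridge"])
prompt = build_prompt(record)
record.cot_reasoning = "The target is elongated and surrounded by dark water.\nAnswer: Ship"
record.predicted_label = parse_answer(record.cot_reasoning, record.candidate_labels)
```

Restricting the parsed answer to the candidate set means a response that names a category outside the list is recorded as empty rather than silently accepted, which keeps failure cases visible for later analysis.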
