Reframing SAR Target Recognition as Visual Reasoning: A Chain-of-Thought Dataset with Multimodal LLMs

Chaoran Li; Xingguo Xu; Siyuan Mu

arXiv:2507.09535 · eess.SP · July 15, 2025

TL;DR

This paper reformulates SAR target recognition as a multimodal reasoning task solved with multimodal large language models (MLLMs), and presents a new dataset of Chain-of-Thought (CoT) reasoning chains for evaluating this approach.

Contribution

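It applies multimodal large language models to SAR target recognition framed as a reasoning task, and builds a new dataset, derived from the FAIR-CSAR benchmark, that pairs SAR targets with candidate labels and reasoning chains to support this line of research.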

Findings

1. MLLMs generate logically coherent, interpretable inferences on SAR imagery in most scenarios.
2. The reasoning-based approach improves the interpretability of SAR target recognition.
3. The limitations and failure cases of MLLMs on SAR data are analyzed.

Abstract

In the context of Synthetic Aperture Radar (SAR) image recognition, traditional methods often struggle with the intrinsic limitations of SAR data, such as weak texture, high noise, and ambiguous object boundaries. This work explores a novel perspective by reformulating SAR target recognition as a multimodal reasoning task. We leverage multimodal large language models (MLLMs), specifically GPT-4o, to perform target classification based on SAR imagery, guided by candidate categories and enhanced with Chain-of-Thought (CoT) reasoning. A new dataset is constructed based on the FAIR-CSAR benchmark, comprising raw SAR images, structured target annotations, candidate label sets, and GPT-generated CoT reasoning chains. Experimental results show that the MLLMs are capable of generating logically coherent and interpretable inferences in most scenarios. Our analysis highlights both the strengths…
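The abstract describes each sample as a raw SAR image, a structured target annotation, a candidate label set, and a GPT-generated CoT chain, with the MLLM asked to classify the target by reasoning over the candidates. The following is a minimal sketch of that setup under stated assumptions: the class `SarCotRecord`, its field names, the bounding-box convention, the prompt wording, the `Answer:` line convention, and `parse_answer` are all hypothetical illustrations, not the authors' released data format or prompt. No model is called here; the prompt would be sent together with the image chip to an MLLM such as GPT-4o.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


# Hypothetical per-target record; field names are illustrative assumptions,
# not the released FAIR-CSAR CoT format.
@dataclass
class SarCotRecord:
    image_path: str                          # raw SAR image chip
    target_label: str                        # annotated ground-truth class
    bbox: tuple[float, float, float, float]  # assumed (x_min, y_min, x_max, y_max)
    candidates: list[str]                    # candidate label set shown to the model
    cot: str = ""                            # GPT-generated reasoning chain, if available


def build_prompt(record: SarCotRecord) -> str:
    """Assemble a text prompt that lists the candidates and asks for step-by-step reasoning."""
    options = "\n".join(f"- {c}" for c in record.candidates)
    return (
        "You are given a Synthetic Aperture Radar (SAR) image chip.\n"
        "Select the target category from these candidates:\n"
        f"{options}\n"
        "Reason step by step about the scattering pattern, shape and surroundings, "
        "then end with a line of the form 'Answer: <category>'."
    )


# Assumed output convention: the final decision appears on an 'Answer:' line.
_ANSWER_RE = re.compile(r"Answer:\s*(.+)")


def parse_answer(response: str, candidates: list[str]) -> str | None:
    """Return the candidate named on the last 'Answer:' line, or None if absent or off-list."""
    matches = _ANSWER_RE.findall(response)
    if not matches:
        return None
    # Case-insensitive exact match after trimming whitespace and a trailing period.
    guess = matches[-1].strip().rstrip(".").lower()
    for c in candidates:
        if c.lower() == guess:
            return c
    return None
```

In this sketch, a response naming a class outside the candidate set is treated as unparseable rather than fuzzily matched, which keeps any downstream scoring strict; the abstract does not say how the authors handled such answers. The class names used when trying it out (for example "ship" or "aircraft") would be placeholders, not the FAIR-CSAR category list.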
