SAR-RAG: ATR Visual Question Answering by Semantic Search, Retrieval, and MLLM Generation

David F. Ramirez; Tim Overman; Kristen Jaskie; Joe Marvin; Andreas Spanias

arXiv:2602.04712·cs.CV·May 12, 2026


TL;DR

This paper introduces SAR-RAG, a novel retrieval-augmented generation system that enhances synthetic aperture radar target recognition by combining multimodal language models with semantic image retrieval.

Contribution

The paper presents a method that pairs a multimodal large language model with a semantic database of example images to improve ATR accuracy in SAR imagery.

Findings

01 SAR-RAG improves classification accuracy over baseline methods.

02 The system enhances vehicle dimension regression performance.

03 Semantic search retrieval aids in distinguishing similar vehicle types.

Abstract

We present a visual-context image-retrieval-augmented generation (ImageRAG)-assisted AI agent for automatic target recognition (ATR) of synthetic aperture radar (SAR) imagery. SAR is a remote sensing method used in defense and security applications to detect and monitor the positions of military vehicles, which may appear indistinguishable in images. Researchers have extensively studied SAR ATR to improve the differentiation and identification of vehicle types, characteristics, and measurements. Test examples can be compared with known vehicle target types to improve recognition tasks. New methods enhance the capabilities of neural networks, transformer attention, and multimodal large language models. An agentic AI method may be developed to utilize a defined set of tools, such as searching through a library of similar examples. Our proposed method, SAR Retrieval-Augmented Generation…
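The abstract describes comparing a test example against a library of known examples. As a rough illustration of that retrieval step only, the sketch below ranks library entries by cosine similarity to a query embedding and summarizes the top matches. Everything in it is an assumption made for illustration: the `retrieve` and `summarize` helpers, the three-dimensional embeddings, the `type_a`/`type_b` labels, and the `length_m` values do not come from the paper. The majority vote and mean length stand in for the MLLM generation stage, which the paper handles differently and which is not reproduced here.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query, library, k=3):
    """Return the k library entries most similar to the query embedding."""
    ranked = sorted(
        library,
        key=lambda entry: cosine(query, entry["embedding"]),
        reverse=True,
    )
    return ranked[:k]


def summarize(neighbors):
    """Majority-vote label and mean length over the retrieved neighbors.

    This is a simple stand-in (an assumption) for passing the retrieved
    examples to a multimodal LLM as context.
    """
    label = Counter(n["label"] for n in neighbors).most_common(1)[0][0]
    mean_length = sum(n["length_m"] for n in neighbors) / len(neighbors)
    return label, mean_length


# Hypothetical toy library: embeddings, labels, and lengths are invented.
LIBRARY = [
    {"embedding": [1.0, 0.0, 0.0], "label": "type_a", "length_m": 7.0},
    {"embedding": [0.9, 0.1, 0.0], "label": "type_a", "length_m": 6.0},
    {"embedding": [0.0, 1.0, 0.0], "label": "type_b", "length_m": 8.0},
    {"embedding": [0.0, 0.9, 0.1], "label": "type_b", "length_m": 9.0},
]

query = [0.95, 0.05, 0.0]  # hypothetical embedding of a test image
neighbors = retrieve(query, LIBRARY, k=2)
label, mean_length = summarize(neighbors)
print(label, mean_length)
```

In a fuller pipeline, the retrieved entries would more plausibly be formatted into the prompt of a multimodal model rather than reduced by a vote; the reduction here only keeps the example self-contained.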
