A Multimodal Transformer Approach for UAV Detection and Aerial Object Recognition Using Radar, Audio, and Video Data

Mauro Larrat; Claudomiro Sales

arXiv:2511.15312 · cs.CV · November 20, 2025

TL;DR

This paper introduces a multimodal Transformer model that fuses radar, visible-band (RGB) video, infrared (IR) video, and audio data for UAV detection and aerial object recognition, achieving a macro-averaged accuracy of 0.9812 and real-time inference at 41.11 FPS.

Contribution

The study presents a novel multimodal Transformer architecture that uses self-attention to fuse features from radar, RGB video, IR video, and audio streams, aiming to overcome the limitations of single-modality UAV detection and classification.
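The abstract describes fusing per-modality features through Transformer self-attention. Below is a minimal NumPy sketch of that general idea, not the authors' implementation: the feature dimensions, token counts, number of classes, the single attention head, mean pooling, and the class name `ToyMultimodalFusion` are all hypothetical, and the weights are random rather than trained.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


class ToyMultimodalFusion:
    """Hypothetical single-layer, single-head attention fusion with random (untrained) weights."""

    def __init__(self, input_dims, d_model=32, n_classes=4, seed=0):
        rng = np.random.default_rng(seed)
        self.d_model = d_model
        # One linear projection per modality maps its features into a shared token space
        self.proj = {m: rng.normal(0.0, 1.0 / np.sqrt(d), size=(d, d_model))
                     for m, d in input_dims.items()}
        # A per-modality embedding added to each token so attention can tell modalities apart
        self.mod_emb = {m: rng.normal(0.0, 0.02, size=d_model) for m in input_dims}
        scale = 1.0 / np.sqrt(d_model)
        self.w_q = rng.normal(0.0, scale, size=(d_model, d_model))
        self.w_k = rng.normal(0.0, scale, size=(d_model, d_model))
        self.w_v = rng.normal(0.0, scale, size=(d_model, d_model))
        self.w_out = rng.normal(0.0, scale, size=(d_model, n_classes))

    def forward(self, features):
        """features maps modality name -> array of shape (n_tokens, feature_dim)."""
        tokens = [features[m] @ self.proj[m] + self.mod_emb[m] for m in self.proj]
        x = np.concatenate(tokens, axis=0)  # (total_tokens, d_model)
        q, k, v = x @ self.w_q, x @ self.w_k, x @ self.w_v
        # Scaled dot-product self-attention over the joint sequence:
        # every token can attend to tokens from every modality
        attn = softmax(q @ k.T / np.sqrt(self.d_model), axis=-1)
        x = x + attn @ v  # residual connection
        pooled = x.mean(axis=0)  # mean-pool the fused tokens into one vector
        return softmax(pooled @ self.w_out), attn


# Usage with hypothetical feature sizes and token counts per modality
dims = {"radar": 16, "rgb": 64, "ir": 64, "audio": 40}
rng = np.random.default_rng(1)
feats = {"radar": rng.normal(size=(8, 16)), "rgb": rng.normal(size=(10, 64)),
         "ir": rng.normal(size=(10, 64)), "audio": rng.normal(size=(12, 40))}
model = ToyMultimodalFusion(dims, n_classes=4)
probs, attn = model.forward(feats)
print(probs.shape, attn.shape)  # (4,) (40, 40)
```

Concatenating all modality tokens into one sequence lets every radar token attend to RGB, IR, and audio tokens and vice versa, which is one common way to realize attention-based fusion. Alternatives such as cross-attention between modality pairs are equally plausible, and this excerpt does not say which design the paper uses.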

Findings

1. Achieved a macro-averaged accuracy of 0.9812 on the independent test set
2. Demonstrated high precision and recall in distinguishing drones
3. Validated a real-time inference speed of 41.11 FPS
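A throughput of 41.11 FPS corresponds to a mean per-sample latency of 1000 / 41.11 ≈ 24.3 ms. The excerpt does not describe the benchmarking protocol, so the following is only a hypothetical sketch of how such a figure is commonly measured: a few untimed warm-up calls, then wall-clock timing of sequential single-sample inference with `time.perf_counter`. The helper names and the dummy workload are assumptions.

```python
import time


def fps_to_latency_ms(fps):
    # Mean time per frame, in milliseconds, implied by a throughput figure
    return 1000.0 / fps


def measure_fps(infer, inputs, warmup=5):
    """Hypothetical benchmark: run infer(x) once per input sequentially and return frames per second."""
    if not inputs:
        raise ValueError("inputs must be non-empty")
    for x in inputs[:warmup]:
        infer(x)  # untimed warm-up calls absorb one-off costs (lazy init, caches)
    start = time.perf_counter()
    for x in inputs:
        infer(x)
    elapsed = time.perf_counter() - start
    return len(inputs) / elapsed


print(f"{fps_to_latency_ms(41.11):.2f} ms per frame")  # prints "24.32 ms per frame"

# Dummy workload standing in for a real model (hypothetical)
fps = measure_fps(lambda x: sum(i * i for i in range(x)), [2000] * 50)
print(f"measured {fps:.1f} FPS")
```

For a model running asynchronously on a GPU, the device must also be synchronized before reading the clock; whether the reported 41.11 FPS refers to sequential or batched inference is not stated in this excerpt.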

Abstract

Unmanned aerial vehicle (UAV) detection and aerial object recognition are critical for modern surveillance and security, prompting a need for robust systems that overcome limitations of single-modality approaches. This research addresses these challenges by designing and rigorously evaluating a novel multimodal Transformer model that integrates diverse data streams: radar, visual band video (RGB), infrared (IR) video, and audio. The architecture effectively fuses distinct features from each modality, leveraging the Transformer's self-attention mechanisms to learn comprehensive, complementary, and highly discriminative representations for classification. The model demonstrated exceptional performance on an independent test set, achieving macro-averaged metrics of 0.9812 accuracy, 0.9873 recall, 0.9787 precision, 0.9826 F1-score, and 0.9954 specificity. Notably, it exhibited particularly…
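The abstract reports macro-averaged accuracy, recall, precision, F1-score, and specificity, but this excerpt does not define how they were computed. The sketch below assumes the common one-vs-rest convention: each class's TP, FP, FN, and TN are counted against all other classes, each metric is computed per class, and the unweighted mean is taken. The function name `macro_metrics` and the toy confusion matrix are hypothetical.

```python
import numpy as np


def _safe_div(num, den):
    # Return 0 where the denominator is 0 (e.g. a class that is never predicted)
    return np.divide(num, den, out=np.zeros_like(num), where=den > 0)


def macro_metrics(cm):
    """Macro-averaged one-vs-rest metrics from a K x K confusion matrix.

    cm[i, j] counts samples whose true class is i and predicted class is j.
    """
    cm = np.asarray(cm, dtype=float)
    n = cm.sum()
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp  # predicted as class c but truly another class
    fn = cm.sum(axis=1) - tp  # truly class c but predicted as another class
    tn = n - tp - fp - fn
    precision = _safe_div(tp, tp + fp)
    recall = _safe_div(tp, tp + fn)
    per_class = {
        "accuracy": (tp + tn) / n,
        "recall": recall,
        "precision": precision,
        "f1": _safe_div(2 * precision * recall, precision + recall),
        "specificity": _safe_div(tn, tn + fp),
    }
    # Macro averaging: compute each metric per class, then take the unweighted mean
    return {name: float(values.mean()) for name, values in per_class.items()}


# Toy 3-class confusion matrix (made-up counts, not from the paper)
cm = [[8, 1, 1],
      [0, 9, 1],
      [1, 0, 9]]
print(macro_metrics(cm))
```

For the toy matrix, macro recall is (0.8 + 0.9 + 0.9) / 3 ≈ 0.867. Because every error counts once as a false positive and once as a false negative, one-vs-rest macro accuracy equals 1 - 2E/(KN) for E errors, K classes, and N samples, so it runs higher than plain top-1 accuracy: here 1 - 8/90 ≈ 0.911 versus 26/30 ≈ 0.867.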

Taxonomy

Topics: UAV Applications and Optimization · Advanced SAR Imaging Techniques · Fire Detection and Safety Systems