Small Target Detection Based on Mask-Enhanced Attention Fusion of Visible and Infrared Remote Sensing Images
Qianqian Zhang, Xiaolong Jia, Ahmed M. Abdelmoniem, Li Zhou, and Junshe An

TL;DR
This paper introduces ESM-YOLO+, a lightweight visible-infrared fusion network that combines a Mask-Enhanced Attention Fusion (MEAF) module with training-time Structural Representation (SR) enhancement to improve small target detection in remote sensing images, reaching 84.71% mAP on the VEDAI dataset.
Contribution
The paper presents a lightweight fusion network in which the MEAF module strengthens small-target representation and the training-time SR supervision improves feature discriminability without adding inference cost.
Findings
Achieves 84.71% mAP on the VEDAI dataset
Reduces model parameters by 93.6% compared to the baseline
Provides real-time detection with high accuracy
Abstract
Targets in remote sensing images are usually small, weakly textured, and easily disturbed by complex backgrounds, which makes high-precision detection difficult for general-purpose algorithms. Building on our earlier ESM-YOLO, this work presents ESM-YOLO+, a lightweight visible-infrared fusion network. To enhance detection, ESM-YOLO+ includes two key innovations. (1) A Mask-Enhanced Attention Fusion (MEAF) module fuses features at the pixel level via learnable spatial masks and spatial attention, effectively aligning RGB and infrared features, enhancing small-target representation, and alleviating cross-modal misalignment and scale heterogeneity. (2) Training-time Structural Representation (SR) enhancement provides auxiliary supervision to preserve fine-grained spatial structures during training, boosting feature discriminability without extra inference cost. Extensive experiments on the VEDAI and…
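The abstract describes MEAF only at a high level, so the following is a minimal sketch of what pixel-level fusion with a spatial mask followed by spatial attention could look like, not the authors' implementation. Everything named here is an assumption for illustration: the function `mask_attention_fuse`, the convex mask blend, the mean/max channel pooling, and the two scalar weights in `attn_weights` that stand in for a learned convolution. In a real network `mask_logits` and `attn_weights` would be trained parameters; here they are plain NumPy arrays.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def mask_attention_fuse(f_rgb, f_ir, mask_logits, attn_weights, attn_bias=0.0):
    """Hypothetical mask-then-attention fusion (illustrative only).

    f_rgb, f_ir  : feature maps of shape (C, H, W) from the two modalities
    mask_logits  : array of shape (H, W); stands in for a learnable spatial mask
    attn_weights : array of shape (2,); weights for the mean and max maps,
                   a simplification of a learned spatial-attention convolution
    """
    if f_rgb.shape != f_ir.shape:
        raise ValueError("RGB and infrared features must have the same shape")

    # Per-pixel mask in (0, 1): how much each location trusts the RGB branch.
    mask = sigmoid(mask_logits)

    # Pixel-level blend; the (H, W) mask broadcasts over the channel axis.
    fused = mask * f_rgb + (1.0 - mask) * f_ir

    # Spatial attention built from channel-wise mean and max statistics.
    avg_map = fused.mean(axis=0)
    max_map = fused.max(axis=0)
    attn = sigmoid(attn_weights[0] * avg_map + attn_weights[1] * max_map + attn_bias)

    # Re-weight every spatial location of the fused features.
    return fused * attn, mask, attn


rng = np.random.default_rng(0)
f_rgb = rng.normal(size=(8, 16, 16))  # toy RGB features
f_ir = rng.normal(size=(8, 16, 16))   # toy infrared features
mask_logits = np.zeros((16, 16))      # zero logits give an even 0.5 blend

out, mask, attn = mask_attention_fuse(
    f_rgb, f_ir, mask_logits, attn_weights=np.array([0.5, 0.5])
)
print(out.shape)  # (8, 16, 16)
```

With zero mask logits the two modalities are averaged at every pixel; pushing a logit strongly positive or negative makes that location rely almost entirely on the RGB or the infrared features, which is the intuition behind a per-pixel fusion mask.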
Taxonomy
Topics: Advanced Neural Network Applications · Infrared Target Detection Methodologies · Advanced Image Fusion Techniques
